Compare commits

..

193 Commits

Author SHA1 Message Date
Bernhard Beschow
d13b9e4a54 Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"
Specifying the property `-M pflash0` results in a regression:
  qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found
Revert the change for now until a solution is found.

This reverts commit 6f6ad2b245.

Reported-by: Volker Rümelin <vr_qemu@t-online.de>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240226215909.30884-3-shentey@gmail.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f2cb9f34ad)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-27 09:23:41 +01:00
Bernhard Beschow
9b0eb5104f Revert "hw/i386/pc_sysfw: Inline pc_system_flash_create() and remove it"
Commit 6f6ad2b245 "hw/i386/pc: Confine system flash handling to pc_sysfw"
causes a regression when specifying the property `-M pflash0` in the PCI PC
machines:
  qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found
In order to revert the commit, the commit below must be reverted first.

This reverts commit cb05cc1602.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240226215909.30884-2-shentey@gmail.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0fbe8d7c4c)
Referecens: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-27 09:23:02 +01:00
Xiaoyao Li
1b231e8f26 docs: Add TDX documentation
Add docs/system/i386/tdx.rst for TDX support, and add tdx in
confidential-guest-support.rst

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:17:00 +01:00
Sean Christopherson
72db2d374f i386/tdx: Don't get/put guest state for TDX VMs
Don't get/put state of TDX VMs since accessing/mutating guest state of
production TDs is not supported.

Note, it will be allowed for a debug TD. Corresponding support will be
introduced when debug TD support is implemented in the future.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:17:00 +01:00
Xiaoyao Li
4a81d67680 i386/tdx: Skip kvm_put_apicbase() for TDs
KVM doesn't allow wirting to MSR_IA32_APICBASE for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:17:00 +01:00
Xiaoyao Li
820895b892 i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
by VMM, while the features enumerated/controlled by other MSRs except
MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.

Only configure MSR_IA32_UCODE_REV for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:17:00 +01:00
Isaku Yamahata
fa977aad67 i386/tdx: Don't synchronize guest tsc for TDs
TSC of TDs is not accessible and KVM doesn't allow access of
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
kvm_synchronize_all_tsc() noop for TDs,

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:17:00 +01:00
Isaku Yamahata
e12b096972 hw/i386: add option to forcibly report edge trigger in acpi tables
When level trigger isn't supported on x86 platform,
forcibly report edge trigger in acpi tables.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:17:00 +01:00
Xiaoyao Li
321af9d3fa hw/i386: add eoi_intercept_unsupported member to X86MachineState
Add a new bool member, eoi_intercept_unsupported, to X86MachineState
with default value false. Set true for TDX VM.

Inability to intercept eoi causes impossibility to emulate level
triggered interrupt to be re-injected when level is still kept active.
which affects interrupt controller emulation.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:17:00 +01:00
Xiaoyao Li
437417dac6 i386/tdx: LMCE is not supported for TDX
LMCE is not supported TDX since KVM doesn't provide emulation for
MSR_IA32_FEAT_CTL.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:16:59 +01:00
Xiaoyao Li
1c882e8210 i386/tdx: Don't allow system reset for TDX VMs
TDX CPU state is protected and thus vcpu state cann't be reset by VMM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:16:59 +01:00
Xiaoyao Li
56ee0ce804 i386/tdx: Disable PIC for TDX VMs
Legacy PIC (8259) cannot be supported for TDX VMs since TDX module
doesn't allow directly interrupt injection.  Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.

Hence disable PIC for TDX VMs and error out if user wants PIC.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:16:59 +01:00
Xiaoyao Li
36fe16995c i386/tdx: Disable SMM for TDX VMs
TDX doesn't support SMM and VMM cannot emulate SMM for TDX VMs because
VMM cannot manipulate TDX VM's memory.

Disable SMM for TDX VMs and error out if user requests to enable SMM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:16:59 +01:00
Isaku Yamahata
03d11c6dca q35: Introduce smm_ranges property for q35-pci-host
Add a q35 property to check whether or not SMM ranges, e.g. SMRAM, TSEG,
etc... exist for the target platform.  TDX doesn't support SMM and doesn't
play nice with QEMU modifying related guest memory ranges.

Signed-off-by: Isaku Yamahata <isaku.yamahata@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:16:59 +01:00
Isaku Yamahata
860a8b50a3 pci-host/q35: Move PAM initialization above SMRAM initialization
In mch_realize(), process PAM initialization before SMRAM initialization so
that later patch can skill all the SMRAM related with a single check.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:16:59 +01:00
Xiaoyao Li
e03a166f4e i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility
Integrate TDX's TDX_REPORT_FATAL_ERROR into QEMU GuestPanic facility

Originated-from: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:16:59 +01:00
Xiaoyao Li
819540ef60 i386/tdx: Handle TDG.VP.VMCALL<REPORT_FATAL_ERROR>
TD guest can use TDG.VP.VMCALL<REPORT_FATAL_ERROR> to request termination
with error message encoded in GPRs.

Parse and print the error message, and terminate the TD guest in the
handler.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:16:59 +01:00
Isaku Yamahata
ac2eeaf4a2 i386/tdx: handle TDG.VP.VMCALL<MapGPA> hypercall
MapGPA is a hypercall to convert GPA from/to private GPA to/from shared GPA.
As the conversion function is already implemented as kvm_convert_memory,
wire it to TDX hypercall exit.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:16:59 +01:00
Isaku Yamahata
6f0f362e95 i386/tdx: handle TDG.VP.VMCALL<GetQuote>
Add property "quote-generation-socket" to tdx-guest, which is a property
of type SocketAddress to specify Quote Generation Service(QGS).

On request of GetQuote, it connects to the QGS socket, read request
data from shared guest memory, send the request data to the QGS,
and store the response into shared guest memory, at last notify
TD guest by interrupt.

command line example:
  qemu-system-x86_64 \
    -object '{"qom-type":"tdx-guest","id":"tdx0","quote-generation-socket":{"type": "vsock", "cid":"1","port":"1234"}}' \
    -machine confidential-guest-support=tdx0

Note, above example uses vsock type socket because the QGS we used
implements the vsock socket. It can be other types, like UNIX socket,
which depends on the implementation of QGS.

To avoid no response from QGS server, setup a timer for the transaction.
If timeout, make it an error and interrupt guest. Define the threshold of
time to 30s at present, maybe change to other value if not appropriate.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Codeveloped-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Codeveloped-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:16:57 +01:00
Isaku Yamahata
df99750b0d i386/tdx: handle TDG.VP.VMCALL<SetupEventNotifyInterrupt>
For SetupEventNotifyInterrupt, record interrupt vector and the apic id
of the vcpu that received this TDVMCALL.

Later it can inject interrupt with given vector to the specific vcpu
that received SetupEventNotifyInterrupt.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:14:57 +01:00
Xiaoyao Li
ae1705eda4 i386/tdx: Finalize TDX VM
Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:57 +01:00
Xiaoyao Li
2c9915e371 i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu
TDX vcpu needs to be initialized by SEAMCALL(TDH.VP.INIT) and KVM
provides vcpu level IOCTL KVM_TDX_INIT_VCPU for it.

KVM_TDX_INIT_VCPU needs the address of the HOB as input. Invoke it for
each vcpu after HOB list is created.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:57 +01:00
Isaku Yamahata
b0886d6973 i386/tdx: Populate TDVF private memory via KVM_MEMORY_MAPPING
TDVF firmware (CODE and VARS) needs to be copied to TD's private
memory, as well as TD HOB and TEMP memory.

If the TDVF section has TDVF_SECTION_ATTRIBUTES_MR_EXTEND set in the
flag, calling KVM_TDX_EXTEND_MEMORY to extend the measurement.

After populating the TDVF memory, the original image located in shared
ramblock can be discarded.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:57 +01:00
Xiaoyao Li
3da0f0e494 i386/tdx: Setup the TD HOB list
The TD HOB list is used to pass the information from VMM to TDVF. The TD
HOB must include PHIT HOB and Resource Descriptor HOB. More details can
be found in TDVF specification and PI specification.

Build the TD HOB in TDX's machine_init_done callback.

Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:57 +01:00
Xiaoyao Li
7813aa24ad headers: Add definitions from UEFI spec for volumes, resources, etc...
Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
to the Trusted Domain Virtual Firmware (TDVF).

All values come from the UEFI specification [1], PI spec [2] and TDVF
design guide[3].

[1] UEFI Specification v2.1.0 https://uefi.org/sites/default/files/resources/UEFI_Spec_2_10_Aug29.pdf
[2] UEFI PI spec v1.8 https://uefi.org/sites/default/files/resources/UEFI_PI_Spec_1_8_March3.pdf
[3] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
9fd4ec0cb2 i386/tdx: Track RAM entries for TDX VM
The RAM of TDX VM can be classified into two types:

 - TDX_RAM_UNACCEPTED: default type of TDX memory, which needs to be
   accepted by TDX guest before it can be used and will be all-zeros
   after being accepted.

 - TDX_RAM_ADDED: the RAM that is ADD'ed to TD guest before running, and
   can be used directly. E.g., TD HOB and TEMP MEM that needed by TDVF.

Maintain TdxRamEntries[] which grabs the initial RAM info from e820 table
and mark each RAM range as default type TDX_RAM_UNACCEPTED.

Then turn the range of TD HOB and TEMP MEM to TDX_RAM_ADDED since these
ranges will be ADD'ed before TD runs and no need to be accepted runtime.

The TdxRamEntries[] are later used to setup the memory TD resource HOB
that passes memory info from QEMU to TDVF.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
11319132f6 i386/tdx: Track mem_ptr for each firmware entry of TDVF
For each TDVF sections, QEMU needs to copy the content to guest
private memory via KVM API (KVM_TDX_INIT_MEM_REGION).

Introduce a field @mem_ptr for TdxFirmwareEntry to track the memory
pointer of each TDVF sections. So that QEMU can add/copy them to guest
private memory later.

TDVF sections can be classified into two groups:
 - Firmware itself, e.g., TDVF BFV and CFV, that located separately from
   guest RAM. Its memory pointer is the bios pointer.

 - Sections located at guest RAM, e.g., TEMP_MEM and TD_HOB.
   mmap a new memory range for them.

Register a machine_init_done callback to do the stuff.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
118a3e46df i386/tdx: Don't initialize pc.rom for TDX VMs
For TDX, the address below 1MB are entirely general RAM. No need to
initialize pc.rom memory region for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
a1d39edf05 i386/tdx: Skip BIOS shadowing setup
TDX doesn't support map different GPAs to same private memory. Thus,
aliasing top 128KB of BIOS as isa-bios is not supported.

On the other hand, TDX guest cannot go to real mode, it can work fine
without isa-bios.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
3c8db2995b i386/tdx: Parse TDVF metadata for TDX VM
After TDVF is loaded to bios MemoryRegion, it needs parse TDVF metadata.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Isaku Yamahata
94b122086c i386/tdvf: Introduce function to parse TDVF metadata
TDX VM needs to boot with its specialized firmware, Trusted Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it in TD
guest memory prior to running the TDX VM.

A TDVF Metadata in TDVF image describes the structure of firmware.
QEMU refers to it to setup memory for TDVF. Introduce function
tdvf_parse_metadata() to parse the metadata from TDVF image and store
the info of each TDVF section.

TDX metadata is located by a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only an 4-byte field that is the offset of TDX metadata to the end
of firmware file.

Select X86_FW_OVMF when TDX is enable to leverage existing functions
to parse and search OVMF's GUID-ed structures.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Chao Peng
3a8358b856 i386/tdx: load TDVF for TD guest
TDVF(OVMF) needs to run at private memory for TD guest. TDX cannot
support pflash device since it doesn't support read-only private memory.
Thus load TDVF(OVMF) with -bios option for TDs.

Use memory_region_init_ram_guest_memfd() to allocate the MemoryRegion
for TDVF because it needs to be located at private memory.

Also store the MemoryRegion pointer of TDVF since the shared ramblock of
it can be discared after it gets copied to private ramblock.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
a0a25c2263 memory: Introduce memory_region_init_ram_guest_memfd()
Introduce memory_region_init_ram_guest_memfd() to allocate private
guset memfd on the MemoryRegion initialization. It's for the use case of
TDVF, which must be private on TDX case.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:14:56 +01:00
Isaku Yamahata
433562d5e6 kvm/tdx: Ignore memory conversion to shared of unassigned region
TDX requires vMMIO region to be shared.  For KVM, MMIO region is the region
which kvm memslot isn't assigned to (except in-kernel emulation).
qemu has the memory region for vMMIO at each device level.

While OVMF issues MapGPA(to-shared) conservatively on 32bit PCI MMIO
region, qemu doesn't find corresponding vMMIO region because it's before
PCI device allocation and memory_region_find() finds the device region, not
PCI bus region.  It's safe to ignore MapGPA(to-shared) because when guest
accesses those region they use GPA with shared bit set for vMMIO.  Ignore
memory conversion request of non-assigned region to shared and return
success.  Otherwise OVMF is confused and panics there.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:14:56 +01:00
Isaku Yamahata
02d624f780 kvm/tdx: Don't complain when converting vMMIO region to shared
Because vMMIO region needs to be shared region, guest TD may explicitly
convert such region from private to shared.  Don't complain such
conversion.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
18b319791d i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
TDX only supports readonly for shared memory but not for private memory.

In the view of QEMU, it has no idea whether a memslot is used as shared
memory of private. Thus just mark kvm_readonly_mem_enabled to false to
TDX VM for simplicity.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
ae1391d1da i386/tdx: Implement user specified tsc frequency
Reuse "-cpu,tsc-frequency=" to get user wanted tsc frequency and call VM
scope VM_SET_TSC_KHZ to set the tsc frequency of TD before KVM_TDX_INIT_VM.

Besides, sanity check the tsc frequency to be in the legal range and
legal granularity (required by TDX module).

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Isaku Yamahata
b2c919bcdc i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig
Three sha384 hash values, mrconfigid, mrowner and mrownerconfig, of a TD
can be provided for TDX attestation. Detailed meaning of them can be
found: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/

Allow user to specify those values via property mrconfigid, mrowner and
mrownerconfig. They are all in base64 format.

example
-object tdx-guest, \
  mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
  mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
  mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
aa7c69b590 i386/tdx: Validate TD attributes
Validate TD attributes with tdx_caps that fixed-0 bits must be zero and
fixed-1 bits must be set.

Besides, sanity check the attribute bits that have not been supported by
QEMU yet. e.g., debug bit, it will be allowed in the future when debug
TD support lands in QEMU.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
2410afb463 i386/tdx: Disable pmu for TD guest
Current KVM doesn't support PMU for TD guest. It returns error if TD is
created with PMU bit being set in attributes.

Disable PMU for TD guest on QEMU side.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
ab9bc0b91a i386/tdx: Wire CPU features up with attributes of TD guest
For QEMU VMs, PKS is configured via CPUID_7_0_ECX_PKS and PMU is
configured by x86cpu->enable_pmu. Reuse the existing configuration
interface for TDX VMs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:14:56 +01:00
Isaku Yamahata
45a5878924 i386/tdx: Make sept_ve_disable set by default
For TDX KVM use case, Linux guest is the most major one.  It requires
sept_ve_disable set.  Make it default for the main use case.  For other use
case, it can be enabled/disabled via qemu command line.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
05d9e2102f i386/tdx: Add property sept-ve-disable for tdx-guest object
Bit 28 of TD attribute, named SEPT_VE_DISABLE. When set to 1, it disables
EPT violation conversion to #VE on guest TD access of PENDING pages.

Some guest OS (e.g., Linux TD guest) may require this bit as 1.
Otherwise refuse to boot.

Add sept-ve-disable property for tdx-guest object, for user to configure
this bit.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
2024-03-26 17:14:56 +01:00
Xiaoyao Li
5e74fb7a2c i386/tdx: Initialize TDX before creating TD vcpus
Invoke KVM_TDX_INIT in kvm_arch_pre_create_vcpu() that KVM_TDX_INIT
configures global TD configurations, e.g. the canonical CPUID config,
and must be executed prior to creating vCPUs.

Use kvm_x86_arch_cpuid() to setup the CPUID settings for TDX VM.

Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
2024-03-26 17:14:54 +01:00
Xiaoyao Li
5ca74db70f kvm: Introduce kvm_arch_pre_create_vcpu()
Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
work prior to create any vcpu. This is for i386 TDX because it needs
call TDX_INIT_VM before creating any vcpu.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:13:49 +01:00
Sean Christopherson
796bcd9463 i386/kvm: Move architectural CPUID leaf generation to separate helper
Move the architectural (for lack of a better term) CPUID leaf generation
to a separate helper so that the generation code can be reused by TDX,
which needs to generate a canonical VM-scoped configuration.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
2f3ccbe5b5 i386/tdx: Integrate tdx_caps->attrs_fixed0/1 to tdx_cpuid_lookup
Some bits in TD attributes have corresponding CPUID feature bits. Reflect
the fixed0/1 restriction on TD attributes to their corresponding CPUID
bits in tdx_cpuid_lookup[] as well.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
83c0f0e38c i386/tdx: Integrate tdx_caps->xfam_fixed0/1 into tdx_cpuid_lookup
KVM requires userspace to pass XFAM configuration via CPUID 0xD leaves.

Convert tdx_caps->xfam_fixed0/1 into corresponding
tdx_cpuid_lookup[].tdx_fixed0/1 field of CPUID 0xD leaves. Thus the
requirement can be applied naturally.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
6b412efd3a i386/tdx: Update tdx_cpuid_lookup[].tdx_fixed0/1 by tdx_caps.cpuid_config[]
tdx_cpuid_lookup[].tdx_fixed0/1 is QEMU maintained data which reflects
TDX restrictions regrading what bits are fixed by TDX module.

It's retrieved from TDX spec and static. However, TDX may evolve and
change some fixed fields to configurable in the future. Update
tdx_cpuid.lookup[].tdx_fixed0/1 fields by removing the bits that
reported from TDX module as configurable. This can adapt with the
updated TDX (module) automatically.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
9357a3d329 i386/tdx: Make Intel-PT unsupported for TD guest
Due to the fact that Intel-PT virtualization support has been broken in
QEMU since Sapphire Rapids generation[1], below warning is triggered when
luanching TD guest:

  warning: host doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]

Before Intel-pt is fixed in QEMU, just make Intel-PT unsupported for TD
guest, to avoid the confusing warning.

[1] https://lore.kernel.org/qemu-devel/20230531084311.3807277-1-xiaoyao.li@intel.com/

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
13d5ea1836 i386/tdx: Adjust the supported CPUID based on TDX restrictions
According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
bits of TD can be classified into 6 types:

------------------------------------------------------------------------
1 | As configured | configurable by VMM, independent of native value;
------------------------------------------------------------------------
2 | As configured | configurable by VMM if the bit is supported natively
    (if native)   | Otherwise it equals as native(0).
------------------------------------------------------------------------
3 | Fixed         | fixed to 0/1
------------------------------------------------------------------------
4 | Native        | reflect the native value
------------------------------------------------------------------------
5 | Calculated    | calculated by TDX module.
------------------------------------------------------------------------
6 | Inducing #VE  | get #VE exception
------------------------------------------------------------------------

Note:
1. All the configurable XFAM related features and TD attributes related
   features fall into type #2. And fixed0/1 bits of XFAM and TD
   attributes fall into type #3.

2. For CPUID leaves not listed in "CPUID virtualization Overview" table
   in TDX module spec, TDX module injects #VE to TDs when those are
   queried. For this case, TDs can request CPUID emulation from VMM via
   TDVMCALL and the values are fully controlled by VMM.

Due to TDX module has its own virtualization policy on CPUID bits, it leads
to what reported via KVM_GET_SUPPORTED_CPUID diverges from the supported
CPUID bits for TDs. In order to keep a consistent CPUID configuration
between VMM and TDs. Adjust supported CPUID for TDs based on TDX
restrictions.

Currently only focus on the CPUID leaves recognized by QEMU's
feature_word_info[] that are indexed by a FeatureWord.

Introduce a TDX CPUID lookup table, which maintains 1 entry for each
FeatureWord. Each entry has below fields:

 - tdx_fixed0/1: The bits that are fixed as 0/1;

 - depends_on_vmm_cap: The bits that are configurable from the view of
		       TDX module. But they requires emulation of VMM
		       when configured as enabled. For those, they are
		       not supported if VMM doesn't report them as
		       supported. So they need be fixed up by checking
		       if VMM supports them.

 - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
                totally configurable by VMM.

 - supported_on_ve: It's valid only when @inducing_ve is true. It represents
		    the maximum feature set supported that be emulated
		    for TDs.

By applying TDX CPUID lookup table and TDX capabilities reported from
TDX module, the supported CPUID for TDs can be obtained from following
steps:

- get the base of VMM supported feature set;

- if the leaf is not a FeatureWord just return VMM's value without
  modification;

- if the leaf is an inducing_ve type, applying supported_on_ve mask and
  return;

- include all native bits, it covers type #2, #4, and parts of type #1.
  (it also includes some unsupported bits. The following step will
   correct it.)

- apply fixed0/1 to it (it covers #3, and rectifies the previous step);

- add configurable bits (it covers the other part of type #1);

- fix the ones in vmm_fixup;

(Calculated type is ignored since it's determined at runtime).

Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
0b0493be25 i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
It will need special handling for TDX VMs all around the QEMU.
Introduce is_tdx_vm() helper to query if it's a TDX VM.

Cache tdx_guest object thus no need to cast from ms->cgs every time.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
1f15e4c9f0 i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
TDX context. It will be used to validate user's setting later.

Since there is no interface reporting how many cpuid configs contains in
KVM_TDX_CAPABILITIES, QEMU chooses to try starting with a known number
and abort when it exceeds KVM_MAX_CPUID_ENTRIES.

Besides, introduce the interfaces to invoke TDX "ioctls" at different
scope (KVM, VM and VCPU) in preparation.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
a14b9df0cc i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context
Implement TDX specific ConfidentialGuestSupportClass::kvm_init()
callback, tdx_kvm_init().

Set ms->require_guest_memfd to true to require private guest memfd
allocation for any memory backend.

More TDX specific initialization will be added later.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
baab233ea4 target/i386: Implement mc->kvm_type() to get VM type
TDX VM requires VM type KVM_X86_TDX_VM to be passed to
kvm_ioctl(KVM_CREATE_VM). Hence implement mc->kvm_type() for i386
architecture.

If tdx-guest object is specified to confidential-guest-support, like,

  qemu -machine ...,confidential-guest-support=tdx0 \
       -object tdx-guest,id=tdx0,...

it parses VM type as KVM_X86_TDX_VM. Otherwise, it's KVM_X86_DEFAULT_VM.

Also store the vm_type in MachineState for other code to query what the
VM type is.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2024-03-26 17:13:49 +01:00
Xiaoyao Li
27e07ada8b i386: Introduce tdx-guest object
Introduce tdx-guest object which inherits CONFIDENTIAL_GUEST_SUPPORT,
and will be used to create TDX VMs (TDs) by

  qemu -machine ...,confidential-guest-support=tdx0	\
       -object tdx-guest,id=tdx0

So far, it has no QAPI member/properety decleared and only one internal
member 'attributes' with fixed value 0 that not configurable.

QAPI properties will be added later.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
2024-03-26 17:13:44 +01:00
Xiaoyao Li
d8e31efb54 *** HACK *** linux-headers: Update headers to pull in TDX API changes
Pull in recent TDX updates, which are not backwards compatible.

It's just to make this series runnable. It will be updated by script

	scripts/update-linux-headers.sh

once TDX support is upstreamed in linux kernel

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:03:16 +01:00
Xiaoyao Li
3152a855e2 kvm/memory: Make memory type private by default if it has guest memfd backend
KVM side leaves the memory to shared by default, while may incur the
overhead of paging conversion on the first visit of each page. Because
the expectation is that page is likely to private for the VMs that
require private memory (has guest memfd).

Explicitly set the memory to private when memory region has valid
guest memfd backend.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:03:16 +01:00
Isaku Yamahata
5e897fcf5f trace/kvm: Add trace for page convertion between shared and private
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:03:16 +01:00
Chao Peng
363c43d251 kvm: handle KVM_EXIT_MEMORY_FAULT
When geeting KVM_EXIT_MEMORY_FAULT exit, it indicates userspace needs to
do the memory conversion on the RAMBlock to turn the memory into desired
attribute, i.e., private/shared.

Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens.

Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has
guest_memfd memory backend.

Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is
added.

When page is converted from shared to private, the original shared
memory can be discarded via ram_block_discard_range(). Note, shared
memory can be discarded only when it's not back'ed by hugetlb because
hugetlb is supposed to be pre-allocated and no need for discarding.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
2024-03-26 17:03:16 +01:00
Xiaoyao Li
b359aee12a physmem: Introduce ram_block_discard_guest_memfd_range()
When memory page is converted from private to shared, the original
private memory is back'ed by guest_memfd. Introduce
ram_block_discard_guest_memfd_range() for discarding memory in
guest_memfd.

Originally-from: Isaku Yamahata <isaku.yamahata@intel.com>
Codeveloped-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2024-03-26 17:03:16 +01:00
Xiaoyao Li
e1f1caf9d4 kvm: Introduce support for memory_attributes
Introduce the helper functions to set the attributes of a range of
memory to private or shared.

This is necessary to notify KVM the private/shared attribute of each gpa
range. KVM needs the information to decide the GPA needs to be mapped at
hva-based shared memory or guest_memfd based private memory.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:03:16 +01:00
Chao Peng
1b8cbfe6ee kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.

With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
backend'ed both by hva-based shared memory and guest memfd based private
memory.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:03:16 +01:00
Xiaoyao Li
28cf6e1fe0 trace/kvm: Split address space and slot id in trace_kvm_set_user_memory()
The upper 16 bits of kvm_userspace_memory_region::slot are
address space id. Parse it separately in trace_kvm_set_user_memory().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
2024-03-26 17:03:16 +01:00
Xiaoyao Li
cdbcc2d68c HostMem: Add mechanism to opt in kvm guest memfd via MachineState
Add a new member "guest_memfd" to memory backends. When it's set
to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
guest_memfd will be allocated during RAMBlock allocation.

Memory backend's @guest_memfd is wired with @require_guest_memfd
field of MachineState. It avoid looking up the machine in phymem.c.

MachineState::require_guest_memfd is supposed to be set by any VMs
that requires KVM guest memfd as private memory, e.g., TDX VM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2024-03-26 17:03:16 +01:00
Xiaoyao Li
e3da9ef694 RAMBlock: Add support of KVM private guest memfd
Add KVM guest_memfd support to RAMBlock so both normal hva based memory
and kvm guest memfd based private memory can be associated in one RAMBlock.

Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to
create private guest_memfd during RAMBlock setup.

Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd
is more flexible and extensible than simply relying on the VM type because
in the future we may have the case that not all the memory of a VM need
guest memfd. As a benefit, it also avoid getting MachineState in memory
subsystem.

Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
confidential guests, such as TDX VM. How and when to set it for memory
backends will be implemented in the following patches.

Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
KVM guest_memfd allocated.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2024-03-26 17:03:16 +01:00
Xiaoyao Li
2e638a41c1 linux-headers: Update to Linux v6.8-rc5
Guest memfd support in QEMU requires corresponding KVM guest memfd APIs,
which lands in Linux from v6.8-rc1.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
2024-03-26 17:03:16 +01:00
Xiaoyao Li
2dd11a66a3 s390: Switch to use confidential_guest_kvm_init()
Use unified confidential_guest_kvm_init(), to avoid exposing specific
functions.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes from rfc v1:
 - check machine->cgs not NULL before calling confidential_guest_kvm_init();
2024-03-26 17:03:12 +01:00
Xiaoyao Li
6c0cc4975e ppc/pef: switch to use confidential_guest_kvm_init/reset()
Use the unified interface to call confidential guest related kvm_init()
and kvm_reset(), to avoid exposing pef specific functions.

remove perf.h since it is now blank..

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes from rfc v1:
 - check machine->cgs not NULL before callling
   confidential_guest_kvm_init/reset();
2024-03-26 17:03:05 +01:00
Xiaoyao Li
88f5e47bdc i386/sev: Switch to use confidential_guest_kvm_init()
Use confidential_guest_kvm_init() instead of calling SEV specific
sev_kvm_init(). As a bouns, it fits to future TDX when TDX implements
its own confidential_guest_support and .kvm_init().

Move the "TypeInfo sev_guest_info" definition and related functions to
the end of the file, to avoid declaring the sev_kvm_init() ahead.

Delete the sve-stub.c since it's not needed anymore.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes from rfc v1:
- check ms->cgs not NULL before calling confidential_guest_kvm_init();
- delete the sev-stub.c;
2024-03-26 17:02:54 +01:00
Xiaoyao Li
b3f4d68667 confidential guest support: Add kvm_init() and kvm_reset() in class
Different confidential VMs in different architectures all have the same
needs to do their specific initialization (and maybe resetting) stuffs
with KVM. Currently each of them exposes individual *_kvm_init()
functions and let machine code or kvm code to call it.

To make it more object oriented, add two virtual functions, kvm_init()
and kvm_reset() in ConfidentialGuestSupportClass, and expose two helpers
functions for invodking them.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes since rfc v1:
- Drop the NULL check and rely the check from the caller;
2024-03-26 17:02:34 +01:00
Bernhard Beschow
cddbb95e8d hw/i386/pc_q35: Populate interrupt handlers before realizing LPC PCI function
The interrupt handlers need to be populated before the device is realized since
internal devices such as the RTC are wired during realize(). If the interrupt
handlers aren't populated, devices such as the RTC will be wired with a NULL
interrupt handler, i.e. MC146818RtcState::irq is NULL.

Fixes: fc11ca08bc "hw/i386/q35: Realize LPC PCI function before accessing it"

Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-ID: <20240217104644.19755-1-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 143f3fd3d8)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:53:50 +01:00
Philippe Mathieu-Daudé
20a4ef0b4b hw/i386/pc_sysfw: Use qdev_is_realized() instead of QOM API
Prefer QDev API for QDev objects, avoid the underlying QOM layer.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20240216110313.17039-3-philmd@linaro.org>
(cherry picked from commit 58183abfe7)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:53:50 +01:00
Philippe Mathieu-Daudé
4e70c6cffd hw/i386/q35: Realize LPC PCI function before accessing it
We should not wire IRQs on unrealized device.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Damien Hedde <dhedde@kalrayinc.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240213130341.1793-5-philmd@linaro.org>
(cherry picked from commit fc11ca08bc)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:51:06 +01:00
Bernhard Beschow
0ad7dd2a3b hw/i386/pc_sysfw: Inline pc_system_flash_create() and remove it
pc_system_flash_create() checked for pcmc->pci_enabled which is redundant since
its caller already checked it. The method can be turned into just two lines, so
inline and remove it.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-8-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit cb05cc1602)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:47:55 +01:00
Bernhard Beschow
a2be8f46c2 hw/i386/pc: Confine system flash handling to pc_sysfw
Rather than distributing PC system flash handling across three files, let's
confine it to one. Now, pc_system_firmware_init() creates, configures and cleans
up the system flash which makes the code easier to understand. It also avoids
the extra call to pc_system_flash_cleanup_unused() in the Xen case.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-7-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 6f6ad2b245)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:46:39 +01:00
Bernhard Beschow
f2858dc109 hw/i386/pc: Defer smbios_set_defaults() to machine_done
Handling most of smbios data generation in the machine_done notifier is similar
to how the ARM virt machine handles it which also calls smbios_set_defaults()
there. The result is that all pc machines are freed from explicitly worrying
about smbios setup.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-6-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit a0204a5ed0)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:46:21 +01:00
Bernhard Beschow
0e50e2590d hw/i386/pc: Merge pc_guest_info_init() into pc_machine_initfn()
Resolves redundant code in the piix and q35 machines.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-5-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 4d3457fef9)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:46:01 +01:00
Bernhard Beschow
cf15fc547f hw/i386/x86: Turn apic_xrupt_override into class attribute
The attribute isn't user-changeable and only true for pc-based machines. Turn it
into a class attribute which allows for inlining pc_guest_info_init() into
pc_machine_initfn().

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-4-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 6e6d59a94d)
Referencex: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:45:37 +01:00
Bernhard Beschow
4faecddc93 hw/i386/pc_piix: Share pc_cmos_init() invocation between pc and isapc machines
Both invocations are the same and either one is always executed. Avoid this
redundancy.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-3-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 16bd024bb4)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:44:05 +01:00
Bernhard Beschow
603f0584ab hw/i386/x86: Let ioapic_init_gsi() take parent as pointer
Rather than taking a QOM name which has to be resolved, let's pass the parent
directly as pointer. This simplifies the code.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240224135851.100361-2-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 9b0c44334c)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:43:18 +01:00
Philippe Mathieu-Daudé
0cc701c26d hw/i386/q35: Simplify pc_q35_init() since PCI is always enabled
We can not create the Q35 machine without PCI, so simplify
pc_q35_init() removing pointless checks.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240213041952.58840-1-philmd@linaro.org>
(cherry picked from commit 88ad980c0f)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:42:47 +01:00
Xiaoyao Li
2e8a084f49 i386/cpuid: Remove subleaf constraint on CPUID leaf 1F
No such constraint that subleaf index needs to be less than 64.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by:Yang Weijiang <weijiang.yang@intel.com>
Message-ID: <20240125024016.2521244-3-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a3b5376521)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 10:17:47 +01:00
Jai Arora
ee127024c2 accel/kvm: Turn DPRINTF macro use into tracepoints
Patch removes DPRINTF macro and adds multiple tracepoints
to capture different kvm events.

We also drop the DPRINTFs that don't add any additional
information than trace_kvm_run_exit already does.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827

Signed-off-by: Jai Arora <arorajai2798@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 9cdfb1e3a5)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-26 09:44:01 +01:00
Philippe Mathieu-Daudé
12c0f00dc9 hw/pci-host/raven: Propagate error in raven_realize()
When an Error** reference is available, it is better to
propagate local errors, rather then using generic ones,
which might terminate the whole QEMU process.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-26-philmd@linaro.org>
(cherry picked from commit cb50fc6842)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:24:09 +01:00
Philippe Mathieu-Daudé
b9014ea6e7 hw/nvram: Simplify memory_region_init_rom_device() calls
Mechanical change using the following coccinelle script:

@@
expression mr, owner, arg3, arg4, arg5, arg6, errp;
@@
-   memory_region_init_rom_device(mr, owner, arg3, arg4, arg5, arg6, &errp);
    if (
-       errp
+       !memory_region_init_rom_device(mr, owner, arg3, arg4, arg5, arg6, &errp)
    ) {
        ...
        return;
    }

and removing the local Error variable.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-25-philmd@linaro.org>
(cherry picked from commit ca1b876292)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:23:48 +01:00
Philippe Mathieu-Daudé
9fbb99a0ce hw/misc: Simplify memory_region_init_ram_from_fd() calls
Mechanical change using the following coccinelle script:

@@
expression mr, owner, arg3, arg4, arg5, arg6, arg7, errp;
@@
-   memory_region_init_ram_from_fd(mr, owner, arg3, arg4, arg5, arg6, arg7, &errp);
    if (
-       errp
+       !memory_region_init_ram_from_fd(mr, owner, arg3, arg4, arg5, arg6, arg7, &errp)
    ) {
        ...
        return;
    }

and removing the local Error variable.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-24-philmd@linaro.org>
(cherry picked from commit 7493bd184e)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:23:25 +01:00
Philippe Mathieu-Daudé
8246d13768 hw/sparc: Simplify memory_region_init_ram_nomigrate() calls
Mechanical change using the following coccinelle script:

@@
expression mr, owner, arg3, arg4, errp;
@@
-   memory_region_init_ram_nomigrate(mr, owner, arg3, arg4, &errp);
    if (
-       errp
+       !memory_region_init_ram_nomigrate(mr, owner, arg3, arg4, &errp)
    ) {
        ...
        return;
    }

and removing the local Error variable.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-23-philmd@linaro.org>
(cherry picked from commit 02e0ecb42c)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:23:08 +01:00
Philippe Mathieu-Daudé
7fd1a517b6 hw/arm: Simplify memory_region_init_rom() calls
Mechanical change using the following coccinelle script:

@@
expression mr, owner, arg3, arg4, errp;
@@
-   memory_region_init_rom(mr, owner, arg3, arg4, &errp);
    if (
-       errp
+       !memory_region_init_rom(mr, owner, arg3, arg4, &errp)
    ) {
        ...
        return;
    }

and removing the local Error variable.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-22-philmd@linaro.org>
(cherry picked from commit 419d8524a2)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:22:48 +01:00
Philippe Mathieu-Daudé
f49a1505e9 hw: Simplify memory_region_init_ram() calls
Mechanical change using the following coccinelle script:

@@
expression mr, owner, arg3, arg4, errp;
@@
-   memory_region_init_ram(mr, owner, arg3, arg4, &errp);
    if (
-       errp
+       !memory_region_init_ram(mr, owner, arg3, arg4, &errp)
    ) {
        ...
        return;
    }

and removing the local Error variable.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au> # aspeed
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-21-philmd@linaro.org>
(cherry picked from commit 2198f5f0f2)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:22:27 +01:00
Philippe Mathieu-Daudé
c7d60cec4a misc: Simplify qemu_prealloc_mem() calls
Since qemu_prealloc_mem() returns whether or not an error
occured, we don't need to check the @errp pointer. Remove
local_err uses when we can return directly.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-20-philmd@linaro.org>
(cherry picked from commit 9c878ad6fb)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:22:10 +01:00
Philippe Mathieu-Daudé
d816ca84fc util/oslib: Have qemu_prealloc_mem() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have qemu_prealloc_mem()
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-19-philmd@linaro.org>
(cherry picked from commit b622ee98bf)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:21:33 +01:00
Philippe Mathieu-Daudé
ea8f948b5b backends: Reduce variable scope in host_memory_backend_memory_complete
Reduce the &local_err variable use and remove the 'out:' label.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-18-philmd@linaro.org>
(cherry picked from commit 3961613a76)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:21:10 +01:00
Philippe Mathieu-Daudé
21eadb015d backends: Have HostMemoryBackendClass::alloc() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have HostMemoryBackendClass::alloc
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-17-philmd@linaro.org>
(cherry picked from commit fdb63cf3b5)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:20:42 +01:00
Philippe Mathieu-Daudé
455ea0ddd0 backends: Simplify host_memory_backend_memory_complete()
Return early if bc->alloc is NULL. De-indent the if() ladder.

Note, this avoids a pointless call to error_propagate() with
errp=NULL at the 'out:' label.

Change trivial when reviewed with 'git-diff --ignore-all-space'.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-16-philmd@linaro.org>
(cherry picked from commit e199f7ad4d)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:20:27 +01:00
Philippe Mathieu-Daudé
21c997995e backends: Use g_autofree in HostMemoryBackendClass::alloc() handlers
In preparation of having HostMemoryBackendClass::alloc() handlers
return a boolean, have them use g_autofree.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-15-philmd@linaro.org>
(cherry picked from commit 2d7a1eb6e6)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:20:10 +01:00
Philippe Mathieu-Daudé
6a1f916145 memory: Have memory_region_init_ram_from_fd() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_ram_from_fd
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-14-philmd@linaro.org>
(cherry picked from commit 9583a90579)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:19:51 +01:00
Philippe Mathieu-Daudé
5c723c4d4f memory: Have memory_region_init_ram_from_file() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_ram_from_file
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-13-philmd@linaro.org>
(cherry picked from commit 9b9d11ac03)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:19:35 +01:00
Philippe Mathieu-Daudé
f354f3391d memory: Have memory_region_init_resizeable_ram() return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_resizeable_ram
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-12-philmd@linaro.org>
(cherry picked from commit f25a9fbb64)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:19:14 +01:00
Philippe Mathieu-Daudé
12bc0f50e7 memory: Have memory_region_init_rom_device() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_rom_device
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-11-philmd@linaro.org>
(cherry picked from commit 62f5c1b234)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:18:56 +01:00
Philippe Mathieu-Daudé
2917525f2a memory: Simplify memory_region_init_rom_device_nomigrate() calls
Mechanical change using the following coccinelle script:

@@
expression mr, owner, arg3, arg4, arg5, arg6, errp;
@@
-   memory_region_init_rom_device_nomigrate(mr, owner, arg3, arg4, arg5, arg6, &errp);
    if (
-       errp
+       !memory_region_init_rom_device_nomigrate(mr, owner, arg3, arg4, arg5, arg6, &errp)
    ) {
        ...
        return;
    }

and removing the local Error variable.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-10-philmd@linaro.org>
(cherry picked from commit bd3aa06950)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:18:37 +01:00
Philippe Mathieu-Daudé
35d3129006 memory: Have memory_region_init_rom_device_nomigrate() return a boolean
Following the example documented since commit e3fe3988d7
("error: Document Error API usage rules"), have
memory_region_init_rom_device_nomigrate() return a boolean
indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-9-philmd@linaro.org>
(cherry picked from commit ae076b6c39)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:18:18 +01:00
Philippe Mathieu-Daudé
df6b2acc8b memory: Have memory_region_init_rom() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_rom()
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-8-philmd@linaro.org>
(cherry picked from commit b9159451d3)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:17:58 +01:00
Philippe Mathieu-Daudé
7899521f1c memory: Have memory_region_init_ram() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_ram()
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-7-philmd@linaro.org>
(cherry picked from commit fe5f33d6b0)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:17:39 +01:00
Philippe Mathieu-Daudé
ac7d60bdd8 memory: Simplify memory_region_init_ram_from_fd() calls
Mechanical change using the following coccinelle script:

@@
expression mr, owner, arg3, arg4, arg5, arg6, arg7, errp;
@@
-   memory_region_init_ram_from_fd(mr, owner, arg3, arg4, arg5, arg6, arg7, &errp);
    if (
-       errp
+       !memory_region_init_ram_from_fd(mr, owner, arg3, arg4, arg5, arg6, arg7, &errp)
    ) {
        ...
        return;
    }

and removing the local Error variable.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-6-philmd@linaro.org>
(cherry picked from commit d3143bd531)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:17:21 +01:00
Philippe Mathieu-Daudé
6aaba1df5d memory: Simplify memory_region_init_rom_nomigrate() calls
Mechanical change using the following coccinelle script:

@@
expression mr, owner, arg3, arg4, errp;
@@
-   memory_region_init_rom_nomigrate(mr, owner, arg3, arg4, &errp);
    if (
-       errp
+       !memory_region_init_rom_nomigrate(mr, owner, arg3, arg4, &errp)
    ) {
        ...
        return;
    }

and removing the local Error variable.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-5-philmd@linaro.org>
(cherry picked from commit fd7549ee13)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:16:58 +01:00
Philippe Mathieu-Daudé
7d33a9126a memory: Have memory_region_init_rom_nomigrate() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_rom_nomigrate
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-4-philmd@linaro.org>
[PMD: Only update 'readonly' field on success (Manos Pitsidianakis)]
Message-Id: <af352e7d-3346-4705-be77-6eed86858d18@linaro.org>
(cherry picked from commit 197faa7006)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:16:37 +01:00
Philippe Mathieu-Daudé
40e2695f58 memory: Have memory_region_init_ram_nomigrate() handler return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_ram_nomigrate
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-3-philmd@linaro.org>
(cherry picked from commit 62c19b72c7)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:16:18 +01:00
Philippe Mathieu-Daudé
e10e0a7eb4 memory: Have memory_region_init_ram_flags_nomigrate() return a boolean
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have memory_region_init_ram_nomigrate
return a boolean indicating whether an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-2-philmd@linaro.org>
(cherry picked from commit cbbc434023)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 22:15:59 +01:00
William Roche
476810b8e9 migration: prevent migration when VM has poisoned memory
A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.

To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.

Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 06152b89db)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 18:11:59 +01:00
Tianrui Zhao
e3df38cf80 linux-headers: Synchronize linux headers from linux v6.7.0-rc8
Use the scripts/update-linux-headers.sh to synchronize linux
headers from linux v6.7.0-rc8. We mainly want to add the
loongarch linux headers and then add the loongarch kvm support
based on it.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240105075804.1228596-2-zhaotianrui@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
(cherry picked from commit 5817db6890)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 17:55:41 +01:00
Daniel Henrique Barboza
625bf5ceef linux-headers: riscv: add ptrace.h
KVM vector support for RISC-V requires the linux-header ptrace.h.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20231218204321.75757-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 1583ca8aa6)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 17:54:52 +01:00
Daniel Henrique Barboza
18e4e0be1f linux-headers: Update to Linux v6.7-rc5
We'll add a new RISC-V linux-header file, but first let's update all
headers.

Headers for 'asm-loongarch' were added in this update.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20231218204321.75757-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit efb91426af)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 17:52:56 +01:00
Akihiko Odaki
158c1626b7 hw/nvme: Use pcie_sriov_num_vfs() (bsc#1220065, CVE-2024-26328)
nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
configurations to know the number of VFs being disabled due to SR-IOV
configuration writes, but the logic was flawed and resulted in
out-of-bound memory access.

It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
VFs, but it actually doesn't in the following cases:
- PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
- PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
- VFs were only partially enabled because of realization failure.

It is a responsibility of pcie_sriov to interpret SR-IOV configurations
and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
provides, to get the number of enabled VFs before and after SR-IOV
configuration writes.

Cc: qemu-stable@nongnu.org
Fixes: CVE-2024-26328
Fixes: 11871f53ef ("hw/nvme: Add support for the Virtualization Management command")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-1-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 91bb64a8d2)
References: bsc#1220065
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-25 17:40:04 +01:00
aad9841219 [openSUSE] Update version to 8.2.2
Update to latest stable release (8.2.2).

Full changelog here:
 https://lore.kernel.org/qemu-devel/1709577077.783602.1474596.nullmailer@tls.msk.ru/

Upstream backports:
 chardev/char-socket: Fix TLS io channels sending too much data to the backend
 tests/unit/test-util-sockets: Remove temporary file after test
 hw/usb/bus.c: PCAP adding 0xA in Windows version
 hw/intc/Kconfig: Fix GIC settings when using "--without-default-devices"
 gitlab: force allow use of pip in Cirrus jobs
 tests/vm: avoid re-building the VM images all the time
 tests/vm: update openbsd image to 7.4
 target/i386: leave the A20 bit set in the final NPT walk
 target/i386: remove unnecessary/wrong application of the A20 mask
 target/i386: Fix physical address truncation
 target/i386: check validity of VMCB addresses
 target/i386: mask high bits of CR3 in 32-bit mode
 pl031: Update last RTCLR value on write in case it's read back
 hw/nvme: fix invalid endian conversion
 update edk2 binaries to edk2-stable202402
 update edk2 submodule to edk2-stable202402
 target/ppc: Fix crash on machine check caused by ifetch
 target/ppc: Fix lxv/stxv MSR facility check
 .gitlab-ci.d/windows.yml: Drop msys2-32bit job
 system/vl: Update description for input grab key
 docs/system: Update description for input grab key
 hw/hppa/Kconfig: Fix building with "configure --without-default-devices"
 tests/qtest: Depend on dbus_display1_dep
 meson: Explicitly specify dbus-display1.h dependency
 audio: Depend on dbus_display1_dep
 ui/console: Fix console resize with placeholder surface
 ui/clipboard: add asserts for update and request
 ui/clipboard: mark type as not available when there is no data
 ui: reject extended clipboard message if not activated
 target/i386: Generate an illegal opcode exception on cmp instructions with lock prefix
 i386/cpuid: Move leaf 7 to correct group
 i386/cpuid: Decrease cpuid_i when skipping CPUID leaf 1F
 i386/cpu: Mask with XCR0/XSS mask for FEAT_XSAVE_XCR0_HI and FEAT_XSAVE_XSS_HI leafs
 i386/cpu: Clear FEAT_XSAVE_XSS_LO/HI leafs when CPUID_EXT_XSAVE is not available
 .gitlab-ci/windows.yml: Don't install libusb or spice packages on 32-bit
 iotests: Make 144 deterministic again
 target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
 target/arm: Fix SVE/SME gross MTE suppression checks
 target/arm: Handle mte in do_ldrq, do_ldro
 target/arm: Split out make_svemte_desc
 target/arm: Adjust and validate mtedesc sizem1
 target/arm: Fix nregs computation in do_{ld,st}_zpa
 linux-user/aarch64: Choose SYNC as the preferred MTE mode
 tests/acpi: Update DSDT.cxl to reflect change _STA return value.
 hw/i386: Fix _STA return value for ACPI0017
 tests/acpi: Allow update of DSDT.cxl
 smmu: Clear SMMUPciBus pointer cache when system reset
 virtio_iommu: Clear IOMMUPciBus pointer cache when system reset
 virtio-gpu: Correct virgl_renderer_resource_get_info() error check
 hw/cxl: Pass CXLComponentState to cache_mem_ops
 hw/cxl/device: read from register values in mdev_reg_read()
 cxl/cdat: Fix header sum value in CDAT checksum
 cxl/cdat: Handle cdat table build errors
 vhost-user.rst: Fix vring address description
 tcg/arm: Fix goto_tb for large translation blocks
 tcg: Increase width of temp_subindex
 hw/net/tulip: add chip status register values
 hw/smbios: Fix port connector option validation
 hw/smbios: Fix OEM strings table option validation
 configure: run plugin TCG tests again
 tests/docker: Add sqlite3 module to openSUSE Leap container
 hw/riscv/virt-acpi-build.c: fix leak in build_rhct()
 migration: Fix logic of channels and transport compatibility check
 virtio-blk: avoid using ioeventfd state in irqfd conditional
 virtio: Re-enable notifications after drain
 virtio-scsi: Attach event vq notifier with no_poll
 iotests: give tempdir an identifying name
 iotests: fix leak of tmpdir in dry-run mode
 hw/scsi/lsi53c895a: add missing decrement of reentrancy counter
 linux-user/aarch64: Add padding before __kernel_rt_sigreturn
 tcg/loongarch64: Set vector registers call clobbered
 pci-host: designware: Limit value range of iATU viewport register
 target/arm: Reinstate "vfp" property on AArch32 CPUs
 qemu-options.hx: Improve -serial option documentation
 system/vl.c: Fix handling of '-serial none -serial something'
 target/arm: fix exception syndrome for AArch32 bkpt insn
 block/blkio: Make s->mem_region_alignment be 64 bits
 qemu-docs: Update options for graphical frontends
 Make 'uri' optional for migrate QAPI
 vfio/pci: Clear MSI-X IRQ index always
 migration: Fix use-after-free of migration state object
 migration: Plug memory leak on HMP migrate error path

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 21:21:17 +01:00
Harsh Prateek Bora
921de6788a ppc/spapr: Initialize max_cpus limit to SPAPR_IRQ_NR_IPIS.
Initialize the machine specific max_cpus limit as per the maximum range
of CPU IPIs available. Keeping between 4096 to 8192 will throw IRQ not
free error due to XIVE/XICS limitation and keeping beyond 8192 will hit
assert in tcg_region_init or spapr_xive_claim_irq.

Logs:

Without patch fix:

[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
qemu-system-ppc64: IRQ 4096 is not free
[root@host build]#

On LPAR:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
**
ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
(region_size >= 2 * page_size)
Bail out! ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
(region_size >= 2 * page_size)
Aborted (core dumped)
[root@host build]#

On x86:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
qemu-system-ppc64: ../hw/intc/spapr_xive.c:596: spapr_xive_claim_irq:
Assertion `lisn < xive->nr_irqs' failed.
Aborted (core dumped)
[root@host build]#

With patch fix:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
qemu-system-ppc64: Invalid SMP CPUs 4097. The max CPUs supported by
machine 'pseries-8.2' is 4096
[root@host build]#

Reported-by: Kowshik Jois <kowsjois@linux.ibm.com>
Tested-by: Kowshik Jois <kowsjois@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit c4f91d7b7b)
References: bsc#1220310
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
Harsh Prateek Bora
278898483e ppc/spapr: Introduce SPAPR_IRQ_NR_IPIS to refer IRQ range for CPU IPIs.
spapr_irq_init currently uses existing macro SPAPR_XIRQ_BASE to refer to
the range of CPU IPIs during initialization of nr-irqs property.
It is more appropriate to have its own define which can be further
reused as appropriate for correct interpretation.

Suggested-by: Cedric Le Goater <clg@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Kowshik Jois <kowsjois@linux.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 2df5c1f5b0)
References: bsc#1220310
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
25e7bc99cc [openSUSE]: Increase default phys bits to 42, if host supports that
We wanted QEMU to support larger VMs (in therm of RAM size) by default
and we therefore introduced patch "[openSUSE] increase x86_64 physical
bits to 42". This, however, means that we create VMs with 42 bits of
physical address space even on hosts that only has, say, 40. And that
can't work.

In fact, it has been a problem since a long time (e.g., bsc#1205978) and
it's also the actual root cause of bsc#1219977.

Get rid of that old patch, in favor of a new one that still raise the
default number of address bits to 42, but only on hosts that supports
that.

This means that we can also use the proper SeaBIOS version, without
reverting commits that were only a problem due to our broken downstream
patch.

We probably aslo don't need to ship some of the custom ACPI tables (for
passing tests), but we'll actually remove them later, after double
checking properly that all the tests do work.

References: bsc#1205978
References: bsc#1219977
References: bsc#1220799
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
adabbbb494 [openSUSE][RPM] Cosmetic fixes to spec files (copyright, sorting, etc)
Update the copyright year to 2024, sort dependencies etc.

This way, 'osc' does not have to do these changes all the times (they're
automatic, so no big deal, but it's annoying to see them in the diffs of
all the requests).

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
c10c66077b [openSUSE] roms/seabios: Drop an old (and no longer necessary) downstream patch
Drop the patch "[openSUSE] build: be explicit about -mx86-used-note=no"
from SeaBIOS.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
4deb258e2e [openSUSE][RPM] Update to latest stable versio (8.2.1)
Backported commits:
 * Update version for 8.2.1 release
 * target/arm: Fix incorrect aa64_tidcp1 feature check
 * target/arm: Fix A64 scalar SQSHRN and SQRSHRN
 * target/xtensa: fix OOB TLB entry access
 * qtest: bump aspeed_smc-test timeout to 6 minutes
 * monitor: only run coroutine commands in qemu_aio_context
 * iotests: port 141 to Python for reliable QMP testing
 * iotests: add filter_qmp_generated_node_ids()
 * block/blklogwrites: Fix a bug when logging "write zeroes" operations.
 * virtio-net: correctly copy vnet header when flushing TX (bsc#1218484, CVE-2023-6693)
 * tcg/arm: Fix SIGILL in tcg_out_qemu_st_direct
 * linux-user/riscv: Adjust vdso signal frame cfa offsets
 * linux-user: Fixed cpu restore with pc 0 on SIGBUS
 * block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status
 * coroutine-ucontext: Save fake stack for pooled coroutine
 * tcg/s390x: Fix encoding of VRIc, VRSa, VRSc insns
 * accel/tcg: Revert mapping of PCREL translation block to multiple virtual addresses
 * acpi/tests/avocado/bits: wait for 200 seconds for SHUTDOWN event from bits VM
 * s390x/pci: drive ISM reset from subsystem reset
 * s390x/pci: refresh fh before disabling aif
 * s390x/pci: avoid double enable/disable of aif
 * hw/scsi/esp-pci: set DMA_STAT_BCMBLT when BLAST command issued
 * hw/scsi/esp-pci: synchronise setting of DMA_STAT_DONE with ESP completion interrupt
 * hw/scsi/esp-pci: generate PCI interrupt from separate ESP and PCI sources
 * hw/scsi/esp-pci: use correct address register for PCI DMA transfers
 * migration/rdma: define htonll/ntohll only if not predefined
 * hw/pflash: implement update buffer for block writes
 * hw/pflash: use ldn_{be,le}_p and stn_{be,le}_p
 * hw/pflash: refactor pflash_data_write()
 * backends/cryptodev: Do not ignore throttle/backends Errors
 * target/i386: pcrel: store low bits of physical address in data[0]
 * target/i386: fix incorrect EIP in PC-relative translation blocks
 * target/i386: Do not re-compute new pc with CF_PCREL
 * load_elf: fix iterator's type for elf file processing
 * target/hppa: Update SeaBIOS-hppa to version 15
 * target/hppa: Fix IOR and ISR on error in probe
 * target/hppa: Fix IOR and ISR on unaligned access trap
 * target/hppa: Export function hppa_set_ior_and_isr()
 * target/hppa: Avoid accessing %gr0 when raising exception
 * hw/hppa: Move software power button address back into PDC
 * target/hppa: Fix PDC address translation on PA2.0 with PSW.W=0
 * hw/pci-host/astro: Add missing astro & elroy registers for NetBSD
 * hw/hppa/machine: Disable default devices with --nodefaults option
 * hw/hppa/machine: Allow up to 3840 MB total memory
 * readthodocs: fully specify a build environment
 * .gitlab-ci.d/buildtest.yml: Work around htags bug when environment is large
 * target/s390x: Fix LAE setting a wrong access register
 * tests/qtest/virtio-ccw: Fix device presence checking
 * tests/acpi: disallow tests/data/acpi/virt/SSDT.memhp changes
 * tests/acpi: update expected data files
 * edk2: update binaries to git snapshot
 * edk2: update build config, set PcdUninstallMemAttrProtocol = TRUE.
 * edk2: update to git snapshot
 * tests/acpi: allow tests/data/acpi/virt/SSDT.memhp changes
 * util: fix build with musl libc on ppc64le
 * tcg/ppc: Use new registers for LQ destination
 * hw/intc/arm_gicv3_cpuif: handle LPIs in in the list registers
 * hw/vfio: fix iteration over global VFIODevice list
 * vfio/container: Replace basename with g_path_get_basename
 * edu: fix DMA range upper bound check
 * hw/net: cadence_gem: Fix MDIO_OP_xxx values
 * audio/audio.c: remove trailing newline in error_setg
 * chardev/char.c: fix "abstract device type" error message
 * target/riscv: Fix mcycle/minstret increment behavior
 * hw/net/can/sja1000: fix bug for single acceptance filter and standard frame
 * target/i386: the sgx_epc_get_section stub is reachable
 * configure: use a native non-cross compiler for linux-user
 * include/ui/rect.h: fix qemu_rect_init() mis-assignment
 * target/riscv/kvm: do not use non-portable strerrorname_np()
 * iotests: Basic tests for internal snapshots
 * vl: Improve error message for conflicting -incoming and -loadvm
 * block: Fix crash when loading snapshot on inactive node

References: bsc#1218484 (CVE-2023-6693)
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
7399ffaeab [openSUSE][RPM] factor common definitions between qemu and qemu-linux-user spec files
Simplify both the spec files, by factoring common definitions.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
942d785c74 [openSUSE][RPM] Install the VGA module "more often" (bsc#1219164)
Depending on the VM configuration (both at the VM definition level and
on the guest itself) a VGA console might be necessary, or weird lockup
will occur. Since the VGA module package is smalle enough, add a
dependency for it, from other display modules, to act as a workaround.

While there, make more explicit and precise the dependencies between all
the various modules, by specifying that they should all have the same
version and release.

References: bsc#1219164
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
d133e7e5fe [openSUSE][RPM] Create the legacy qemu-kvm symlink for all arches
Historically, KVM was available only for x86 and s390, and was invoked
via a binary called 'kvm' or 'qemu-kvm'. For a while, we've shipped a
package that was making it possible to invoke QEMU like that, but only
for these two arches. This, however, created a lot of confusion and
dependencies issues.

Fix them by creating a symlink from 'qemu-kvm' to the proper binary on
all arches and by making the main QEMU package Providing and Obsoleting
(also on all arches) the old qemu-kvm one.

Note that, for RISCV, the qemu-system-riscv64 binary, to which the symlink
should point, is in the qemu-extra package. However, if we are on RISCV,
qemu-extra is an hard dependency of qemu. Therefore, it's fine to ship
the link and also set the Provides: and Obsoletes: tag in the qemu
package itself. It'd be more correct to do that in the qemu-extra
package, of course, but this would complicate the spec file and it's not
worth it, considering this is all legacy and should very well go away
soon.

References: bsc#1218684
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
1a0440ded9 [openSUSE][RPM] spec: allow building without spice
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
39ccf1d7f0 [openSUSE] Update ipxe submodule reference (bsc#1219733, bsc#1219722)
Add to the ipxe submodule the commit (and all its dependencies) for
fixing building with binutils 2.42

References: bsc#1219733
References: bsc#1219722
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
5aec4c3b9e [openSUSE][RPM] Disable test-crypto-secret in linux-user build 2024-03-14 18:26:10 +01:00
Fabian Vogt
b19143deea [openSUSE][RPM] Fix enabling features on non-x86_64
The %endif was in the wrong place, so on non-x86_64, most features were
disabled.
2024-03-14 18:26:10 +01:00
5224af8952 [openSUSE] Update submodule references for 8.2.0
Point the submodules to the repositories that host our downstream
patches:

* roms/seabios
 - [openSUSE] switch to python3 as needed
 - [openSUSE] build: enable cross compilation on ARM
 - [openSUSE] build: be explicit about -mx86-used-note=no
* roms/SLOF
 - Allow to override build date with SOURCE_DATE_EPOCH
* roms/ipxe
 - [ath5k] Add missing AR5K_EEPROM_READ in ath5k_eeprom_read_turbo_modes
 - [openSUSE] [build] Makefile: fix issues of build reproducibility
 - [openSUSE] [test] help compiler out by initializing array[openSUSE]
 - [openSUSE] [build] Silence GCC 12 spurious warnings
 - [librm] Use explicit operand size when pushing a label address
* roms/skiboot
 - [openSUSE] Makefile: define endianess for cross-building on aarch64
 - [openSUSE] Make Sphinx build reproducible (boo#1102408)
* roms/qboot
 - [openSUSE] add cross.ini file to handle aarch64 based build

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
f1eb2880a5 [openSUSE][RPM] Update version to 8.2
Update to latest upstream release.

The full list of changes are available at:

  https://wiki.qemu.org/ChangeLog/8.2

Highlights include:
 * New virtio-sound device emulation
 * New virtio-gpu rutabaga device emulation used by Android emulator
 * New hv-balloon for dynamic memory protocol device for Hyper-V guests
 * New Universal Flash Storage device emulation
 * Network Block Device (NBD) 64-bit offsets for improved performance
 * dump-guest-memory now supports the standard kdump format
 * ARM: Xilinx Versal board now models the CFU/CFI, and the TRNG device
 * ARM: CPU emulation support for cortex-a710 and neoverse-n2
 * ARM: architectural feature support for PACQARMA3, EPAC, Pauth2, FPAC,
   FPACCOMBINE, TIDCP1, MOPS, HBC, and HPMN0
 * HPPA: CPU emulation support for 64-bit PA-RISC 2.0
 * HPPA: machine emulation support for C3700, including Astro memory
   controller and four Elroy PCI bridges
 * LoongArch: ISA support for LASX extension and PRELDX instruction
 * LoongArch: CPU emulation support for la132
 * RISC-V: ISA/extension support for AIA virtualization support via KVM,
   and vector cryptographic instructions
 * RISC-V: Numerous extension/instruction cleanups, fixes, and reworks
 * s390x: support for vfio-ap passthrough of crypto adapter for
   protected
   virtualization guests
 * Tricore: support for TC37x CPU which implements ISA v1.6.2
 * Tricore: support for CRCN, FTOU, FTOHP, and HPTOF instructions
 * x86: Zen support for PV console and network devices

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
Fabiano Rosas
51bf974b6a [openSUSE] Downstream note about the graph lock
In 8.2, the upstream added graph lock annotations that conflict with
our coroutine patches. The GRAPH_RDLOCK_GUARD_MAINLOOP is now asserted
at some places where we turned the code into a coroutine.

My understanding is that the annotation merely asserts that the code
is running in the main loop and serves effectively as a marker of
where the graph lock should be taken if the code were to be moved out
of the main loop. That's exactly what we're doing so it should be safe
to replace the main loop check with an actual rdlock/rdunlock pair.

Commits e7e801a2a7, 94d03a7425 and c89144c058 replace
GRAPH_RDLOCK_GUARD_MAINLOOP() with GRAPH_RDLOCK_GUARD() at the
bdrv_snapshot_list, bdrv_named_nodes_list and qmp_query_block
functions. The changes are split into those patches instead of here to
preserve bisectability.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
João Silva
a0aa455b00 [openSUSE] block: Add a thread-pool version of fstat (bsc#1211000)
The fstat call can take a long time to finish when running over
NFS. Add a version of it that runs in the thread pool.

Adapt one of its users, raw_co_get_allocated_file size to use the new
version. That function is called via QMP under the qemu_global_mutex
so it has a large chance of blocking VCPU threads in case it takes too
long to finish.

Signed-off-by: João Silva <jsilva@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
Fabiano Rosas
bee1f08f69 [openSUSE] block: Convert qmp_query_block() to coroutine_fn (bsc#1211000)
This is another caller of bdrv_get_allocated_file_size() that needs to
be converted to a coroutine because that function will be made
asynchronous when called (indirectly) from the QMP dispatcher.

This QMP command is a candidate because it calls bdrv_do_query_node_info(),
which in turn calls bdrv_get_allocated_file_size().

We've determined bdrv_do_query_node_info() to be coroutine-safe (see
previous commits), so we can just put this QMP command in a coroutine.

Since qmp_query_block() now expects to run in a coroutine, its callers
need to be converted as well. Convert hmp_info_block(), which calls
only coroutine-safe code, including qmp_query_named_block_nodes()
which has been converted to coroutine in the previous patches.

Now that all callers of bdrv_[co_]block_device_info() are using the
coroutine version, a few things happen:

 - we can return to using bdrv_block_device_info() without a wrapper;

 - bdrv_get_allocated_file_size() can stop being mixed;

 - bdrv_co_get_allocated_file_size() needs to be put under the graph
   lock because it is being called wthout the wrapper;

 - bdrv_do_query_node_info() doesn't need to acquire the AioContext
   because it doesn't call aio_poll anymore;

Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
[relax main loop requirement at qmp_query_block]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
Fabiano Rosas
fdf023d52d [openSUSE] block: Don't query all block devices at hmp_nbd_server_start (bsc#1211000)
We're currently doing a full query-block just to enumerate the devices
for qmp_nbd_server_add and then discarding the BlockInfoList
afterwards. Alter hmp_nbd_server_start to instead iterate explicitly
over the block_backends list.

This allows the removal of the dependency on qmp_query_block from
hmp_nbd_server_start. This is desirable because we're about to move
qmp_query_block into a coroutine and don't need to change the NBD code
at the same time.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
84e049d7ae [openSUSE] block: Convert qmp_query_named_block_nodes to coroutine (bsc#1211000)
We're converting callers of bdrv_get_allocated_file_size() to run in
coroutines because that function will be made asynchronous when called
(indirectly) from the QMP dispatcher.

This QMP command is a candidate because it indirectly calls
bdrv_get_allocated_file_size() through bdrv_block_device_info() ->
bdrv_query_image_info() -> bdrv_query_image_info().

The previous patches have determined that bdrv_query_image_info() and
bdrv_do_query_node_info() are coroutine-safe so we can just make the
QMP command run in a coroutine.

Signed-off-by: Lin Ma <lma@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
[relax the main loop requirement at bdrv_named_nodes_list]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
Fabiano Rosas
52015d0078 [openSUSE] block: Convert bdrv_block_device_info into co_wrapper (bsc#1211000)
We're converting callers of bdrv_get_allocated_file_size() to run in
coroutines because that function will be made asynchronous when called
(indirectly) from the QMP dispatcher.

This function is a candidate because it calls bdrv_query_image_info()
-> bdrv_do_query_node_info() -> bdrv_get_allocated_file_size().

It is safe to turn this is a coroutine because the code it calls is
made up of either simple accessors and string manipulation functions
[1] or it has already been determined to be safe [2].

1) bdrv_refresh_filename(), bdrv_is_read_only(),
   blk_enable_write_cache(), bdrv_cow_bs(), blk_get_public(),
   throttle_group_get_name(), bdrv_write_threshold_get(),
   bdrv_query_dirty_bitmaps(), throttle_group_get_config(),
   bdrv_filter_or_cow_bs(), bdrv_skip_implicit_filters()

2) bdrv_do_query_node_info() (see previous commit);

Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
Fabiano Rosas
9eef8908d2 [openSUSE] block: Convert bdrv_query_block_graph_info to coroutine (bsc#1211000)
We're converting callers of bdrv_get_allocated_file_size() to run in
coroutines because that function will be made asynchronous when called
(indirectly) from the QMP dispatcher.

This function is a candidate because it calls bdrv_do_query_node_info(),
which in turn calls bdrv_get_allocated_file_size().

All the functions called from bdrv_do_query_node_info() onwards are
coroutine-safe, either have a coroutine version themselves[1] or are
mostly simple code/string manipulation[2].

1) bdrv_getlength(), bdrv_get_allocated_file_size(), bdrv_get_info(),
   bdrv_get_specific_info();

2) bdrv_refresh_filename(), bdrv_get_format_name(),
   bdrv_get_full_backing_filename(), bdrv_query_snapshot_info_list();

Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
[relax the main loop requirement at bdrv_snapshot_list]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
Fabiano Rosas
ef1a23f0a5 [openSUSE] block: Temporarily mark bdrv_co_get_allocated_file_size as mixed (bsc#1211000)
Some callers of this function are about to be converted to run in
coroutines, so allow it to be executed both inside and outside a
coroutine while we convert all the callers.

This will be reverted once all callers of bdrv_do_query_node_info run
in a coroutine.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
Fabiano Rosas
b29e7f9aeb [openSUSE] block: Allow the wrapper script to see functions declared in qapi.h (bsc#1211000)
The following patches will add co_wrapper annotations to functions
declared in qapi.h. Add that header to the set of files used by
block-coroutine-wrapper.py.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-03-14 18:26:10 +01:00
83389eed7e [openSUSE][RPM] Restrict canokey to openSUSE only
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:10 +01:00
86ffcc5e3a [openSUSE][RPM] Fix virtiofsd dependency on 32 bit systems
And make the switch more general, as we now have multiple
instances of it.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Ludwig Nussel
bfa9615104 [openSUSE][RPM] Add support for canokeys (boo#1217520) 2024-03-14 18:26:09 +01:00
ab15e51a01 [openSUSE][RPM] Disable Xen support in ALP-based distros
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
06877cdc3d [openSUSE][RPM] Some more refinements of inter-subpackage dependencies
Add some block drivers and virtiofsd as hard dependencies of the
qemu-headless package, to make sure it's really useful for headless
server environments (even when recommended packages are not installed).

Singed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
cd664a9abb [openSUSE][RPM] Normalize hostname, for reproducible builds
Use a fixed USER value (in case someone builds outside of OBS/osc).

References: boo#1084909
Signed-off-by: Bernhard M. Wiedemann <githubbmwprimary@lsmod.de>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
33d19cf4e1 [openSUSE][RPM] New subpackage, for SPICE
Define a new sub-(meta-)package that can be installed for having
all the other modules and packages necessary for SPICE to work.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
37cc99737a [openSUSE] Update version to 8.1.3
Align to upstream stable release. It includes many of the patches we had
backported ourself, to fix bugs and issues, plus more.

See here for details:
- https://lore.kernel.org/qemu-devel/1700589639.257680.3420728.nullmailer@tls.msk.ru/
- https://gitlab.com/qemu-project/qemu/-/commits/stable-8.1?ref_type=heads

An (incomplete!) list of such backports is:
 * Update version for 8.1.3 release
 * hw/mips: LOONGSON3V depends on UNIMP device
 * target/arm: HVC at EL3 should go to EL3, not EL2
 * s390x/pci: only limit DMA aperture if vfio DMA limit reported
 * target/riscv/kvm: support KVM_GET_REG_LIST
 * target/riscv/kvm: improve 'init_multiext_cfg' error msg
 * tracetool: avoid invalid escape in Python string
 * tests/tcg/s390x: Test LAALG with negative cc_src
 * target/s390x: Fix LAALG not updating cc_src
 * tests/tcg/s390x: Test CLC with inaccessible second operand
 * target/s390x: Fix CLC corrupting cc_src
 * tests/qtest: ahci-test: add test exposing reset issue with pending callback
 * hw/ide: reset: cancel async DMA operation before resetting state
 * target/mips: Fix TX79 LQ/SQ opcodes
 * target/mips: Fix MSA BZ/BNZ opcodes displacement
 * ui/gtk-egl: apply scale factor when calculating window's dimension
 * ui/gtk: force realization of drawing area
 * ati-vga: Implement fallback for pixman routines
 * ...

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
31572a8b57 [openSUSE] Make Sphinx build reproducible (boo#1102408)
Avoid parallel processing in sphinx because that causes variations in
generated files

This is addressed here, with a downstream patch, until a proper solution
is found upstream.

Signed-off-by: Bernhard Wiedemann <bwiedemann@suse.com>
References: boo#1102408
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
caaba8677b [openSUSE] supportconfig: Adapt plugin to modern supportconfig
The supportconfig 'scplugin.rc' file is deprecated in favor of
supportconfig.rc'. Adapt the qemu plugin to the new scheme.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
5d3d60ccc5 [openSUSE] Add -p1 to autosetup in spec files
Our workflow does not include patches in the spec files. Still, it could
be useful to add some there, during development and/or debugging issues.

Make sure that they are applied properly, by adding -p1 to the
%autosetup directive (it's a nop if there are no patches, so both cases
are ok).

Suggested-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
71c72823d0 [openSUSE] Update version to 8.1.2
This fixes the following upstream issues:
 * https://gitlab.com/qemu-project/qemu/-/issues/1826
 * https://gitlab.com/qemu-project/qemu/-/issues/1834
 * https://gitlab.com/qemu-project/qemu/-/issues/1846

It also contains a fix for:
 * CVE-2023-42467 (bsc#1215192)

As well as several upstream backports:
 * target/riscv: Fix vfwmaccbf16.vf
 * disas/riscv: Fix the typo of inverted order of pmpaddr13 and pmpaddr14
 * roms: use PYTHON to invoke python
 * hw/audio/es1370: reset current sample counter
 * migration/qmp: Fix crash on setting tls-authz with null
 * util/log: re-allow switching away from stderr log file
 * vfio/display: Fix missing update to set backing fields
 * amd_iommu: Fix APIC address check
 * vdpa net: follow VirtIO initialization properly at cvq isolation probing
 * vdpa net: stop probing if cannot set features
 * vdpa net: fix error message setting virtio status
 * vdpa net: zero vhost_vdpa iova_tree pointer at cleanup
 * linux-user/hppa: Fix struct target_sigcontext layout
 * chardev/char-pty: Avoid losing bytes when the other side just (re-)connected
 * hw/display/ramfb: plug slight guest-triggerable leak on mode setting
 * win32: avoid discarding the exception handler
 * target/i386: fix memory operand size for CVTPS2PD
 * target/i386: generalize operand size "ph" for use in CVTPS2PD
 * subprojects/berkeley-testfloat-3: Update to fix a problem with compiler warnings
 * scsi-disk: ensure that FORMAT UNIT commands are terminated
 * esp: restrict non-DMA transfer length to that of available data
 * esp: use correct type for esp_dma_enable() in sysbus_esp_gpio_demux()
 * optionrom: Remove build-id section
 * target/tricore: Fix RCPW/RRPW_INSERT insns for width = 0
 * accel/tcg: Always require can_do_io
 * accel/tcg: Always set CF_LAST_IO with CF_NOIRQ
 * accel/tcg: Improve setting of can_do_io at start of TB
 * accel/tcg: Track current value of can_do_io in the TB
 * accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop
 * accel/tcg: Avoid load of icount_decr if unused
 * softmmu: Use async_run_on_cpu in tcg_commit
 * migration: Move return path cleanup to main migration thread
 * migration: Replace the return path retry logic
 * migration: Consolidate return path closing code
 * migration: Remove redundant cleanup of postcopy_qemufile_src
 * migration: Fix possible race when shutting down to_dst_file
 * migration: Fix possible races when shutting down the return path
 * migration: Fix possible race when setting rp_state.error
 * migration: Fix race that dest preempt thread close too early
 * ui/vnc: fix handling of VNC_FEATURE_XVP
 * ui/vnc: fix debug output for invalid audio message
 * hw/scsi/scsi-disk: Disallow block sizes smaller than 512 [CVE-2023-42467]
 * accel/tcg: mttcg remove false-negative halted assertion
 * meson.build: Make keyutils independent from keyring
 * target/arm: Don't skip MTE checks for LDRT/STRT at EL0
 * hw/arm/boot: Set SCR_EL3.FGTEn when booting kernel
 * include/exec: Widen tlb_hit/tlb_hit_page()
 * tests/file-io-error: New test
 * file-posix: Simplify raw_co_prw's 'out' zone code
 * file-posix: Fix zone update in I/O error path
 * file-posix: Check bs->bl.zoned for zone info
 * file-posix: Clear bs->bl.zoned on error
 * hw/cxl: Fix out of bound array access
 * hw/cxl: Fix CFMW config memory leak
 * linux-user/hppa: lock both words of function descriptor
 * linux-user/hppa: clear the PSW 'N' bit when delivering signals
 * hw/ppc: Read time only once to perform decrementer write
 * hw/ppc: Reset timebase facilities on machine reset
 * hw/ppc: Always store the decrementer value
 * target/ppc: Sign-extend large decrementer to 64-bits
 * hw/ppc: Avoid decrementer rounding errors
 * hw/ppc: Round up the decrementer interval when converting to ns
 * host-utils: Add muldiv64_round_up

Signed-of-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
ed0d90bb5f [openSUSE] Update to version 8.1.1
This includes the following commits:

 * tpm: fix crash when FD >= 1024 and unnecessary errors due to EINTR (Marc-André Lureau)
 * meson: Fix targetos match for illumos and Solaris. (Jonathan Perkin)
 * s390x/ap: fix missing subsystem reset registration (Janosch Frank)
 * ui: fix crash when there are no active_console (Marc-André Lureau)
 * virtio-gpu/win32: set the destroy function on load (Marc-André Lureau)
 * target/riscv: Allocate itrigger timers only once (Akihiko Odaki)
 * target/riscv/pmp.c: respect mseccfg.RLB for pmpaddrX changes (Leon Schuermann)
 * target/riscv: fix satp_mode_finalize() when satp_mode.supported = 0 (Daniel Henrique Barboza)
 * hw/riscv: virt: Fix riscv,pmu DT node path (Conor Dooley)
 * linux-user/riscv: Use abi type for target_ucontext (LIU Zhiwei)
 * hw/intc: Make rtc variable names consistent (Jason Chien)
 * hw/intc: Fix upper/lower mtime write calculation (Jason Chien)
 * target/riscv: Fix zfa fleq.d and fltq.d (LIU Zhiwei)
 * target/riscv: Fix page_check_range use in fault-only-first (LIU Zhiwei)
 * target/riscv/cpu.c: add zmmul isa string (Daniel Henrique Barboza)
 * hw/char/riscv_htif: Fix the console syscall on big endian hosts (Thomas Huth)
 * hw/char/riscv_htif: Fix printing of console characters on big endian hosts (Thomas Huth)
 * arm64: Restore trapless ptimer access (Colton Lewis)
 * virtio: Drop out of coroutine context in virtio_load() (Kevin Wolf)
 * qxl: don't assert() if device isn't yet initialized (Marc-André Lureau)
 * hw/net/vmxnet3: Fix guest-triggerable assert() (Thomas Huth)
 * docs tests: Fix use of migrate_set_parameter (Markus Armbruster)
 * qemu-options.hx: Rephrase the descriptions of the -hd* and -cdrom options (Thomas Huth)
 * hw/i2c/aspeed: Fix TXBUF transmission start position error (Hang Yu)
 * hw/i2c/aspeed: Fix Tx count and Rx size error in buffer pool mode (Hang Yu)
 * hw/ide/ahci: fix broken SError handling (Niklas Cassel)
 * hw/ide/ahci: fix ahci_write_fis_sdb() (Niklas Cassel)
 * hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set (Niklas Cassel)
 * hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared (Niklas Cassel)
 * hw/ide/ahci: simplify and document PxCI handling (Niklas Cassel)
 * hw/ide/ahci: write D2H FIS when processing NCQ command (Niklas Cassel)
 * hw/ide/core: set ERR_STAT in unsupported command completion (Niklas Cassel)
 * target/ppc: Fix LQ, STQ register-pair order for big-endian (Nicholas Piggin)
 * target/ppc: Flush inputs to zero with NJ in ppc_store_vscr (Richard Henderson)
 * hw/ppc/e500: fix broken snapshot replay (Maksim Kostin)
 * ppc/vof: Fix missed fields in VOF cleanup (Nicholas Piggin)
 * ui/dbus: Properly dispose touch/mouse dbus objects (Bilal Elmoussaoui)
 * target/i386: raise FERR interrupt with iothread locked (Paolo Bonzini)
 * linux-user: Adjust brk for load_bias (Richard Henderson)
 * target/arm: properly document FEAT_CRC32 (Alex Bennée)
 * block-migration: Ensure we don't crash during migration cleanup (Fabiano Rosas)
 * softmmu: Assert data in bounds in iotlb_to_section (Richard Henderson)
 * docs/about/license: Update LICENSE URL (Philippe Mathieu-Daudé)
 * target/arm: Fix 64-bit SSRA (Richard Henderson)
 * target/arm: Fix SME ST1Q (Richard Henderson)
 * accel/kvm: Specify default IPA size for arm64 (Akihiko Odaki)
 * kvm: Introduce kvm_arch_get_default_type hook (Akihiko Odaki)
 * include/hw/virtio/virtio-gpu: Fix virtio-gpu with blob on big endian hosts (Thomas Huth)
 * target/s390x: Check reserved bits of VFMIN/VFMAX's M5 (Ilya Leoshkevich)
 * target/s390x: Fix VSTL with a large length (Ilya Leoshkevich)
 * target/s390x: Use a 16-bit immediate in VREP (Ilya Leoshkevich)
 * target/s390x: Fix the "ignored match" case in VSTRS (Ilya Leoshkevich)

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
5a27b86dc6 [openSUSE][RPM] spec: enable the Pipewire audio backend (bsc#1215486)
Enable the Pipewire audio backend (available since 8.1), in the
appropriate subpackage.

References: bsc#1215486
Signed-off-by: Dario Faggioli
2024-03-14 18:26:09 +01:00
b7d4f949b3 [openSUSE][RPM] Use discount instead of perl-Text-Markdown
perl-Text-Markdown is not always available (e.g., in SLE/Leap).
Use discount instead, as the provider of the 'markdown' binary.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
fbc4756925 [openSUSE][RPM] Transform meson subproject in git submodules
OBS SCM bridge can handle git submodule, while it can't handle (yet?)
meson subprojects. The (ugly, I know!) solution, for now, is to turn
the latter into the former, with commands like the followings:

git submodule add -f https://gitlab.com/qemu-project/berkeley-testfloat-3 subprojects/berkeley-testfloat-3
git -C subprojects/berkeley-testfloat-3 reset --hard 40619cbb3bf32872df8c53cc457039229428a263

(the hash used comes from the subprojects/berkeley-testfloat-3.wrap file)

It's also necessary to manually apply the layering of the packagefiles,
and that is done in the specfile.

Longer term and better solutions could be:
- Make SCM support meson subprojects
- Create standalone packages for the subprojects (and instruct
  QEMU to pick stuff from there)

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
527a5d3ed7 [openSUSE][RPM] Update to version 8.1.0
Full list of changes are available at:

  https://wiki.qemu.org/ChangeLog/8.1

Highlights:
 * VFIO: improved live migration support, no longer an experimental feature
 * GTK GUI now supports multi-touch events
 * ARM, PowerPC, and RISC-V can now use AES acceleration on host processor
 * PCIe: new QMP commands to inject CXL General Media events, DRAM
   events and Memory Module events
 * ARM: KVM VMs on a host which supports MTE (the Memory Tagging Extension)
   can now use MTE in the guest
 * ARM: emulation support for bpim2u (Banana Pi BPI-M2 Ultra) board and
   neoverse-v1 (Cortex Neoverse-V1) CPU
 * ARM: new architectural feature support for: FEAT_PAN3 (SCTLR_ELx.EPAN),
   FEAT_LSE2 (Large System Extensions v2), and experimental support for
   FEAT_RME (Realm Management Extensions)
 * Hexagon: new instruction support for v68/v73 scalar, and v68/v69 HVX
 * Hexagon: gdbstub support for HVX
 * MIPS: emulation support for Ingenic XBurstR1/XBurstR2 CPUs, and MXU
   instructions
 * PowerPC: TCG SMT support, allowing pseries and powernv to run with up
   to 8 threads per core
 * PowerPC: emulation support for Power9 DD2.2 CPU model, and perf
   sampling support for POWER CPUs
 * RISC-V: ISA extension support for BF16/Zfa, and disassembly support
   for Zcm*/Z*inx/XVentanaCondOps/Xthead
 * RISC-V: CPU emulation support for Veyron V1
 * RISC-V: numerous KVM/emulation fixes and enhancements
 * s390: instruction emulation fixes for LDER, LCBB, LOCFHR, MXDB, MXDBR,
   EPSW, MDEB, MDEBR, MVCRL, LRA, CKSM, CLM, ICM, MC, STIDP, EXECUTE, and
   CLGEBR(A)
 * SPARC: updated target/sparc to use tcg_gen_lookup_and_goto_ptr() for
   improved performance
 * Tricore: emulation support for TC37x CPU that supports ISA v1.6.2
   instructions
 * Tricore: instruction emulation of POPCNT.W, LHA, CRC32L.W, CRC32.B,
   SHUFFLE, SYSCALL, and DISABLE
 * x86: CPU model support for GraniteRapids
 * and lots more...

This also (automatically) fixes:
 - bsc#1212850 (CVE-2023-3354)
 - bsc#1213001 (CVE-2023-3255)
 - bsc#1213925 (CVE-2023-3180)
 - bsc#1213414 (CVE-2023-3301)
 - bsc#1207205 (CVE-2023-0330)
 - bsc#1212968 (CVE-2023-2861)
 - bsc#1179993, bsc#1181740, bsc#1211697

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
e4a5f14170 [openSUSE][RPM] Use --preserve-argv0 in qemu-linux-user (boo#1197298, bsc#1212768)
By default try to preserve argv[0].

Original report is boo#1197298, which also became relevant recently again in bsc#1212768.

Signed-off-by: Fabian Vogt <fabian@ritter-vogt.de>
References: boo#1197298
References: bsc#1212768
Signed-off-by: Fabian Vogt <fabian@ritter-vogt.de>
2024-03-14 18:26:09 +01:00
39ef4b454c [openSUSE][RPM] Split qemu-tools package (#31)
Create separate packages for qemu-img and qemu-pr-helper.

Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de>
Co-authored-by: Vasiliy Ulyanov <vulyanov@suse.de>
2024-03-14 18:26:09 +01:00
3f35213205 [openSUSE][RPM] Fix deps for virtiofsd and improve spec files
Address the comments from Factory Submission
https://build.opensuse.org/request/show/1088674?notification_id=40890530:
- remove the various '%defattr()'
- make sure that we depend on virtiofsd only on arch-es
  where it can actually be built

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
85343d8ccd [openSUSE][RPM] spec: require virtiofsd, now that it is a sep package (#27)
Since version 8.0.0, virtiofsd is not part of QEMU sources any longer.
We therefore have also moved it to a separate package. To retain
compatibility and consistency of behavior, require such a package as an
hard dependency.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
150b5e071c [openSUSE][RPM] Try to avoid recommending too many packages (bsc#1205680)
For example, let's try to avoid recommending GUI UI stuff, unless GTK is
already installed. This way we avoid things like bringing in an entire
graphic stack on servers.

References: bsc#1205680
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
06617e52e2 [openSUSE][RPM] Move documentation to a subpackage and fix qemu-headless (bsc#1209629)
- The qemu-headless subpackage was defined but never build, because it
  had no files. Fix that by putting there just a simple README.

- Move the docs in a dedicated subpackage

Resolves: bsc#1209629
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Gerd Hoffmann
890a31ceaa [openSUSE] roms: add back edk2-basetools target
The efi nic boot rom builds depend on this, they need the
EfiRom utility from edk2 BaseTools.

Fixes: 22e11539e1 ("edk2: replace build scripts")
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
References: https://lore.kernel.org/qemu-devel/20230411101709.445259-1-kraxel@redhat.com/
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
40eb837e9b [openSUSE][OBS] Limit the workflow runs to the factory branch (#25)
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
067e6183b0 [openSUSE] pc: q35: Allow 1024 cpus for old machine types (bsc#1202282, jsc#PED-2592)
In SUSE/openSUSE, we bumped up the number of maximum vcpus since
machine type q35-7.1. Make sure that this continue to be true, for
backward compatibility.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
References: https://lore.kernel.org/qemu-devel/166876173513.24238.8968021290016401421.stgit@tumbleweed.Wayrath/
References: bsc#1202282, jsc#PED-2592
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
59d94e7ffc [openSUSE] meson: remove $pkgversion from CONFIG_STAMP input to broaden compatibility
As part of the effort to close the gap with Leap I think we are fine
removing the $pkgversion component to creating a unique CONFIG_STAMP.
This stamp is only used in creating a unique symbol used in ensuring the
dynamically loaded modules correspond correctly to the loading qemu.
The default inputs to producing this unique symbol are somewhat reasonable
as a generic mechanism, but specific packaging and maintenance practices
might require the default to be modified for best use. This is an example
of that.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
386d4950ea [openSUSE] meson: install ivshmem-client and ivshmem-server
Turn on the meson install flag for these executables

Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
63428b19f0 [openSUSE] Make installed scripts explicitly python3 (bsc#1077564)
We want to explicitly reference python3 in the scripts we install.

References: bsc#1077564
Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
a8776466de [openSUSE] Disable some tests that have problems in OBS
We are disabling the following tests:

qemu-system-ppc64 / display-vga-test

They are failing due to some memory corruption errors. We believe that
this might be due to the combination of the compiler version and of LTO,
and will take up the investigation within the upstream community.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
23964dfca0 [openSUSE] tests/qemu-iotests: Triple timeout of i/o tests due to obs environment
Executing tests in obs is very fickle, since you aren't guaranteed
reliable cpu time. Triple the timeout for each test to help ensure
we don't fail a test because the stars align against us.

Signed-off-by: Bruce Rogers <brogers@suse.com>
[DF: Small tweaks necessary for rebasing on top of 6.2.0]
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
3724ff19db [openSUSE] tests: change error message in test 162
Since we have a quite restricted execution environment, as far as
networking is concerned, we need to change the error message we expect
in test 162. There is actually no routing set up so the error we get is
"Network is unreachable". Change the expected output accordingly.

Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
9f24a3c6cc [openSUSE] Revert "tests/qtest: enable more vhost-user tests by default"
Revert commit "tests/qtest: enable more vhost-user tests by default"
(8dcb404bff), as it causes prooblem when building with GCC 12 and LTO
enabled.

This should be considered temporary, until the actual reason why the
code of the tests that are added in that commit breaks.

It has been reported upstream, and will be (hopefully) solved there:
https://lore.kernel.org/qemu-devel/1d3bbff9e92e7c8a24db9e140dcf3f428c2df103.camel@suse.com/

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Hannes Reinecke
f30c826058 [openSUSE] scsi-generic: check for additional SG_IO status on completion (bsc#1178049)
SG_IO may return additional status in the 'status', 'driver_status',
and 'host_status' fields. When either of these fields are set the
command has not been executed normally, so we should not continue
processing this command but rather return an error.
scsi_read_complete() already checks for these errors,
scsi_write_complete() does not.

References: bsc#1178049
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Lin Ma <lma@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Mauro Matteo Cascella
9b8a507ab1 [openSUSE] hw/scsi/megasas: check for NULL frame in megasas_command_cancelled() (bsc#1180432, CVE-2020-35503)
Ensure that 'cmd->frame' is not NULL before accessing the 'header' field.
This check prevents a potential NULL pointer dereference issue.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1910346
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reported-by: Cheolwoo Myung <cwmyung@snu.ac.kr>
References: bsc#1180432, CVE-2020-35503
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
766dcf5a0b [openSUSE] scsi-generic: replace logical block count of response of READ CAPACITY (SLE-20965)
While using SCSI passthrough, Following scenario makes qemu doesn't
realized the capacity change of remote scsi target:
1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'rescan-scsi-bus.sh -s ...' in vm.

In above scenario I used to experienced errors while accessing the
additional disk space in vm. I think the reasonable operations should
be:
1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'block_resize' via qmp to notify qemu.
4. issue 'rescan-scsi-bus.sh -s ...' in vm.

The errors disappear once I notify qemu by block_resize via qmp.

So this patch replaces the number of logical blocks of READ CAPACITY
response from scsi target by qemu's bs->total_sectors. If the user in
vm wants to access the additional disk space, The administrator of
host must notify qemu once resizeing the scsi target.

Bonus is that domblkinfo of libvirt can reflect the consistent capacity
information between host and vm in case of missing block_resize in qemu.
E.g:
...
    <disk type='block' device='lun'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/sdc' index='1'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
...

Before:
1. online resize the scsi target.
2. host:~  # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4  host:~  # virsh domblkinfo --domain $DOMAIN --human --device sda
Capacity:       4.000 GiB
Allocation:     0.000 B
Physical:       8.000 GiB

5. guest:~ # lsblk /dev/sda
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0   8G  0 disk
└─sda1   8:1    0   2G  0 part

After:
1. online resize the scsi target.
2. host:~  # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4  host:~  # virsh domblkinfo --domain $DOMAIN --human --device sda
Capacity:       4.000 GiB
Allocation:     0.000 B
Physical:       8.000 GiB

5. guest:~ # lsblk /dev/sda
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0   4G  0 disk
└─sda1   8:1    0   2G  0 part

References: [SUSE-JIRA] (SLE-20965)
Signed-off-by: Lin Ma <lma@suse.com>
2024-03-14 18:26:09 +01:00
Olaf Hering
81ccb369aa [openSUSE] xen: ignore live parameter from xen-save-devices-state (bsc#1079730, bsc#1101982, bsc#106399)
The final step of xl migrate|save for an HVM domU is saving the state of
qemu. This also involves releasing all block devices. While releasing
backends ought to be a separate step, such functionality is not
implemented.

Unfortunately, releasing the block devices depends on the optional
'live' option. This breaks offline migration with 'virsh migrate domU
dom0' because the sending side does not release the disks, as a result
the receiving side can not properly claim write access to the disks.

As a minimal fix, remove the dependency on the 'live' option. Upstream
may fix this in a different way, like removing the newly added 'live'
parameter entirely.

Fixes: 5d6c599fe1 ("migration, xen: Fix block image lock issue on live migration")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
References: bsc#1079730, bsc#1101982, bsc#1063993
Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
f9f3c4deda [openSUSE] xen: add block resize support for xen disks
Provide monitor naming of xen disks, and plumb guest driver
notification through xenstore of resizing instigated via the
monitor.

[BR: minor edits to pass qemu's checkpatch script]
[BR: significant rework needed due to upstream xen disk qdevification]
[BR: At this point, monitor_add_blk call is all we need to add!]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
3f367b50a9 [openSUSE] xen_disk: Add suse specific flush disable handling and map to QEMU equiv (bsc#879425)
Add code to read the suse specific suse-diskcache-disable-flush flag out
of xenstore, and set the equivalent flag within QEMU.

Patch taken from Xen's patch queue, Olaf Hering being the original author.
[bsc#879425]

[BR: minor edits to pass qemu's checkpatch script]
[BR: With qdevification of xen-block, code has changed significantly]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Olaf Hering <olaf@aepfle.de>
2024-03-14 18:26:09 +01:00
Andreas Färber
69dc441378 [openSUSE] Raise soft address space limit to hard limit
For SLES we want users to be able to use large memory configurations
with KVM without fiddling with ulimit -Sv.

Signed-off-by: Andreas Färber <afaerber@suse.de>
[BR: add include for sys/resource.h]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
396fb12516 [openSUSE] qemu-bridge-helper: reduce security profile (boo#988279)
Change from using glib alloc and free routines to those
from libc. Also perform safety measure of dropping privs
to user if configured no-caps.

References: boo#988279
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2024-03-14 18:26:09 +01:00
Alexander Graf
c3414ebae7 [openSUSE] Make char muxer more robust wrt small FIFOs
Virtio-Console can only process one character at a time. Using it on S390
gave me strange "lags" where I got the character I pressed before when
pressing one. So I typed in "abc" and only received "a", then pressed "d"
but the guest received "b" and so on.

While the stdio driver calls a poll function that just processes on its
queue in case virtio-console can't take multiple characters at once, the
muxer does not have such callbacks, so it can't empty its queue.

To work around that limitation, I introduced a new timer that only gets
active when the guest can not receive any more characters. In that case
it polls again after a while to check if the guest is now receiving input.

This patch fixes input when using -nographic on s390 for me.

[AF: Rebased for v2.7.0-rc2]
[BR: minor edits to pass qemu's checkpatch script]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
Alexander Graf
ae5b115e1a [openSUSE] PPC: KVM: Disable mmu notifier check
When using hugetlbfs (which is required for HV mode KVM on 970), we
check for MMU notifiers that on 970 can not be implemented properly.

So disable the check for mmu notifiers on PowerPC guests, making
KVM guests work there, even if possibly racy in some odd circumstances.

Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
Alexander Graf
4f1e250c0d [openSUSE] linux-user: lseek: explicitly cast non-set offsets to signed
When doing lseek, SEEK_SET indicates that the offset is an unsigned variable.
Other seek types have parameters that can be negative.

When converting from 32bit to 64bit parameters, we need to take this into
account and enable SEEK_END and SEEK_CUR to be negative, while SEEK_SET stays
absolute positioned which we need to maintain as unsigned.

Signed-off-by: Alexander Graf <agraf@suse.de>
2024-03-14 18:26:09 +01:00
Alexander Graf
635dd43cf0 [openSUSE] linux-user: use target_ulong
Linux syscalls pass pointers or data length or other information of that sort
to the kernel. This is all stuff you don't want to have sign extended.
Otherwise a host 64bit variable parameter with a size parameter will extend
it to a negative number, breaking lseek for example.

Pass syscall arguments as ulong always.

Signed-off-by: Alexander Graf <agraf@suse.de>
[JRZ: changes from linux-user/qemu.h wass moved to linux-user/user-internals.h]
Signed-off-by: Jose R Ziviani <jziviani@suse.de>
[DF: Forward port, i.e., use ulong for do_prctl too]
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Andreas Färber
a4b62322d1 [openSUSE] qemu-binfmt-conf: Modify default path
Change QEMU_PATH from /usr/local/bin to /usr/bin prefix.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2024-03-14 18:26:09 +01:00
Bruce Rogers
fe5a4218cb [openSUSE] hw/smbios: handle both file formats regardless of machine type (bsc#994082, bsc#1084316, boo#1131894)
It's easy enough to handle either per-spec or legacy smbios structures
in the smbios file input without regard to the machine type used, by
simply applying the basic smbios formatting rules. then depending on
what is detected. terminal numm bytes are added or removed for machine
type specific processing.

References: bsc#994082, bsc#1084316, boo#1131894
Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
ca1c30aaca [openSUSE] roms/Makefile: add --cross-file to qboot meson setup for aarch64
We add a --cross-file reference so that we can do cross compilation
of qboot from an aarch64 build.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
Bruce Rogers
f04930b9c7 [openSUSE] roms/Makefile: pass a packaging timestamp to subpackages with date info (bsc#1011213)
Certain rom subpackages build from qemu git-submodules call the date
program to include date information in the packaged binaries. This
causes repeated builds of the package to be different, wkere the only
real difference is due to the fact that time build timestamp has
changed. To promote reproducible builds and avoid customers being
prompted to update packages needlessly, we'll use the timestamp of the
VERSION file as the packaging timestamp for all packages that build in a
timestamp for whatever reason.

References: bsc#1011213
Signed-off-by: Bruce Rogers <brogers@suse.com>
2024-03-14 18:26:09 +01:00
79113f134b [openSUSE][RPM] Spec file adjustments for 8.0.0 (and later)
The sgabios submodule is no longer there, so let's get rid of any
reference to it from our spec files.

Remove no longer supported './configure' options.

We're also not set yet for using the set_version service, so we need to
update the following manually:
- the Version: tags in the spec files
- the rpm/seabios_version and rpm/skiboot_version files (see qemu.spec
  for instructions on how to do that)
- the %{sbver} variable in rpm/common.inc

A better solution for handling this aspect is being worked on.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:09 +01:00
d5da72c920 [openSUSE][OBS] Add OBS workflow
Create a rebuild (for pushes) and a pull request workflow.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:08 +01:00
107266f5bb [openSUSE][RPM] Split qemu and qemu-linux-user spec files
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:08 +01:00
37314a98fc [openSUSE][RPM] Provide seabios and skiboot version files
In an upstream tarball there are some special files, generated by a
script that is run when the archive is prepared. Let's make our
repository look a little more like that, so we can build it properly.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:08 +01:00
daa30db454 [openSUSE][RPM] Add downstream packaging files
Stash the "packaging files" in the QEMU repository, in the rpm/
directory. During package build, they will be pulled out from there
and used as appropriate.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:08 +01:00
209 changed files with 12075 additions and 1172 deletions

25
.gitmodules vendored
View File

@@ -1,12 +1,12 @@
[submodule "roms/seabios"]
path = roms/seabios
url = https://gitlab.com/qemu-project/seabios.git/
url = https://github.com/openSUSE/qemu-seabios.git
[submodule "roms/SLOF"]
path = roms/SLOF
url = https://gitlab.com/qemu-project/SLOF.git
url = https://github.com/openSUSE/qemu-SLOF.git
[submodule "roms/ipxe"]
path = roms/ipxe
url = https://gitlab.com/qemu-project/ipxe.git
url = https://github.com/openSUSE/qemu-ipxe.git
[submodule "roms/openbios"]
path = roms/openbios
url = https://gitlab.com/qemu-project/openbios.git
@@ -18,7 +18,7 @@
url = https://gitlab.com/qemu-project/u-boot.git
[submodule "roms/skiboot"]
path = roms/skiboot
url = https://gitlab.com/qemu-project/skiboot.git
url = https://github.com/openSUSE/qemu-skiboot.git
[submodule "roms/QemuMacDrivers"]
path = roms/QemuMacDrivers
url = https://gitlab.com/qemu-project/QemuMacDrivers.git
@@ -36,10 +36,25 @@
url = https://gitlab.com/qemu-project/opensbi.git
[submodule "roms/qboot"]
path = roms/qboot
url = https://gitlab.com/qemu-project/qboot.git
url = https://github.com/openSUSE/qemu-qboot.git
[submodule "roms/vbootrom"]
path = roms/vbootrom
url = https://gitlab.com/qemu-project/vbootrom.git
[submodule "tests/lcitool/libvirt-ci"]
path = tests/lcitool/libvirt-ci
url = https://gitlab.com/libvirt/libvirt-ci.git
[submodule "subprojects/berkeley-softfloat-3"]
path = subprojects/berkeley-softfloat-3
url = https://gitlab.com/qemu-project/berkeley-softfloat-3
[submodule "subprojects/berkeley-testfloat-3"]
path = subprojects/berkeley-testfloat-3
url = https://gitlab.com/qemu-project/berkeley-testfloat-3
[submodule "subprojects/dtc"]
path = subprojects/dtc
url = https://gitlab.com/qemu-project/dtc.git
[submodule "subprojects/libvfio-user"]
path = subprojects/libvfio-user
url = https://gitlab.com/qemu-project/libvfio-user.git
[submodule "subprojects/keycodemapdb"]
path = subprojects/keycodemapdb
url = https://gitlab.com/qemu-project/keycodemapdb.git

22
.obs/workflows.yml Normal file
View File

@@ -0,0 +1,22 @@
pr_workflow:
steps:
- branch_package:
source_project: Virtualization:Staging
source_package: qemu
target_project: Virtualization:Staging:PRs
filters:
event: pull_request
branches:
only:
- factory
rebuild_workflow:
steps:
# Will automatically rebuild the package
- trigger_services:
project: Virtualization:Staging
package: qemu
filters:
event: push
branches:
only:
- factory

View File

@@ -69,16 +69,6 @@
#define KVM_GUESTDBG_BLOCKIRQ 0
#endif
//#define DEBUG_KVM
#ifdef DEBUG_KVM
#define DPRINTF(fmt, ...) \
do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
#else
#define DPRINTF(fmt, ...) \
do { } while (0)
#endif
struct KVMParkedVcpu {
unsigned long vcpu_id;
int kvm_fd;
@@ -101,6 +91,8 @@ bool kvm_msi_use_devid;
bool kvm_has_guest_debug;
static int kvm_sstep_flags;
static bool kvm_immediate_exit;
static bool kvm_guest_memfd_supported;
static uint64_t kvm_supported_memory_attributes;
static hwaddr kvm_max_slot_size = ~0;
static const KVMCapabilityInfo kvm_required_capabilites[] = {
@@ -292,34 +284,69 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, bool new)
{
KVMState *s = kvm_state;
struct kvm_userspace_memory_region mem;
struct kvm_userspace_memory_region2 mem;
static int cap_user_memory2 = -1;
int ret;
if (cap_user_memory2 == -1) {
cap_user_memory2 = kvm_check_extension(s, KVM_CAP_USER_MEMORY2);
}
if (!cap_user_memory2 && slot->guest_memfd >= 0) {
error_report("%s, KVM doesn't support KVM_CAP_USER_MEMORY2,"
" which is required by guest memfd!", __func__);
exit(1);
}
mem.slot = slot->slot | (kml->as_id << 16);
mem.guest_phys_addr = slot->start_addr;
mem.userspace_addr = (unsigned long)slot->ram;
mem.flags = slot->flags;
mem.guest_memfd = slot->guest_memfd;
mem.guest_memfd_offset = slot->guest_memfd_offset;
if (slot->memory_size && !new && (mem.flags ^ slot->old_flags) & KVM_MEM_READONLY) {
/* Set the slot size to 0 before setting the slot to the desired
* value. This is needed based on KVM commit 75d61fbc. */
mem.memory_size = 0;
ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
if (cap_user_memory2) {
ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
} else {
ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
}
if (ret < 0) {
goto err;
}
}
mem.memory_size = slot->memory_size;
ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
if (cap_user_memory2) {
ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
} else {
ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
}
slot->old_flags = mem.flags;
err:
trace_kvm_set_user_memory(mem.slot, mem.flags, mem.guest_phys_addr,
mem.memory_size, mem.userspace_addr, ret);
trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
mem.guest_phys_addr, mem.memory_size,
mem.userspace_addr, mem.guest_memfd,
mem.guest_memfd_offset, ret);
if (ret < 0) {
error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
" start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
__func__, mem.slot, slot->start_addr,
(uint64_t)mem.memory_size, strerror(errno));
if (cap_user_memory2) {
error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d,"
" start=0x%" PRIx64 ", size=0x%" PRIx64 ","
" flags=0x%" PRIx32 ", guest_memfd=%" PRId32 ","
" guest_memfd_offset=0x%" PRIx64 ": %s",
__func__, mem.slot, slot->start_addr,
(uint64_t)mem.memory_size, mem.flags,
mem.guest_memfd, (uint64_t)mem.guest_memfd_offset,
strerror(errno));
} else {
error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
" start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
__func__, mem.slot, slot->start_addr,
(uint64_t)mem.memory_size, strerror(errno));
}
}
return ret;
}
@@ -331,7 +358,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
struct KVMParkedVcpu *vcpu = NULL;
int ret = 0;
DPRINTF("kvm_destroy_vcpu\n");
trace_kvm_destroy_vcpu();
ret = kvm_arch_destroy_vcpu(cpu);
if (ret < 0) {
@@ -341,7 +368,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
if (mmap_size < 0) {
ret = mmap_size;
DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
trace_kvm_failed_get_vcpu_mmap_size();
goto err;
}
@@ -391,6 +418,11 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
}
int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
{
return 0;
}
int kvm_init_vcpu(CPUState *cpu, Error **errp)
{
KVMState *s = kvm_state;
@@ -399,15 +431,27 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
/*
* tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
* kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
* dereference.
*/
cpu->kvm_state = s;
ret = kvm_arch_pre_create_vcpu(cpu, errp);
if (ret < 0) {
cpu->kvm_state = NULL;
goto err;
}
ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
if (ret < 0) {
error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
kvm_arch_vcpu_id(cpu));
cpu->kvm_state = NULL;
goto err;
}
cpu->kvm_fd = ret;
cpu->kvm_state = s;
cpu->vcpu_dirty = true;
cpu->dirty_pages = 0;
cpu->throttle_us_per_full = 0;
@@ -443,7 +487,6 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
if (cpu->kvm_dirty_gfns == MAP_FAILED) {
ret = -errno;
DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
goto err;
}
}
@@ -475,6 +518,9 @@ static int kvm_mem_flags(MemoryRegion *mr)
if (readonly && kvm_readonly_mem_allowed) {
flags |= KVM_MEM_READONLY;
}
if (memory_region_has_guest_memfd(mr)) {
flags |= KVM_MEM_GUEST_MEMFD;
}
return flags;
}
@@ -1130,6 +1176,11 @@ int kvm_vm_check_extension(KVMState *s, unsigned int extension)
return ret;
}
/*
* We track the poisoned pages to be able to:
* - replace them on VM reset
* - block a migration for a VM with a poisoned page
*/
typedef struct HWPoisonPage {
ram_addr_t ram_addr;
QLIST_ENTRY(HWPoisonPage) list;
@@ -1163,6 +1214,11 @@ void kvm_hwpoison_page_add(ram_addr_t ram_addr)
QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
}
bool kvm_hwpoisoned_mem(void)
{
return !QLIST_EMPTY(&hwpoison_page_list);
}
static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size)
{
#if HOST_BIG_ENDIAN != TARGET_BIG_ENDIAN
@@ -1266,6 +1322,46 @@ void kvm_set_max_memslot_size(hwaddr max_slot_size)
kvm_max_slot_size = max_slot_size;
}
static int kvm_set_memory_attributes(hwaddr start, hwaddr size, uint64_t attr)
{
struct kvm_memory_attributes attrs;
int r;
if (kvm_supported_memory_attributes == 0) {
error_report("No memory attribute supported by KVM\n");
return -EINVAL;
}
if ((attr & kvm_supported_memory_attributes) != attr) {
error_report("memory attribute 0x%lx not supported by KVM,"
" supported bits are 0x%lx\n",
attr, kvm_supported_memory_attributes);
return -EINVAL;
}
attrs.attributes = attr;
attrs.address = start;
attrs.size = size;
attrs.flags = 0;
r = kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
if (r) {
error_report("failed to set memory (0x%lx+%#zx) with attr 0x%lx error '%s'",
start, size, attr, strerror(errno));
}
return r;
}
int kvm_set_memory_attributes_private(hwaddr start, hwaddr size)
{
return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
}
int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size)
{
return kvm_set_memory_attributes(start, size, 0);
}
/* Called with KVMMemoryListener.slots_lock held */
static void kvm_set_phys_mem(KVMMemoryListener *kml,
MemoryRegionSection *section, bool add)
@@ -1362,6 +1458,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
mem->ram_start_offset = ram_start_offset;
mem->ram = ram;
mem->flags = kvm_mem_flags(mr);
mem->guest_memfd = mr->ram_block->guest_memfd;
mem->guest_memfd_offset = (uint8_t*)ram - mr->ram_block->host;
kvm_slot_init_dirty_bitmap(mem);
err = kvm_set_user_memory_region(kml, mem, true);
if (err) {
@@ -1369,6 +1468,16 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
strerror(-err));
abort();
}
if (memory_region_has_guest_memfd(mr)) {
err = kvm_set_memory_attributes_private(start_addr, slot_size);
if (err) {
error_report("%s: failed to set memory attribute private: %s\n",
__func__, strerror(-err));
exit(1);
}
}
start_addr += slot_size;
ram_start_offset += slot_size;
ram += slot_size;
@@ -2396,6 +2505,11 @@ static int kvm_init(MachineState *ms)
}
s->as = g_new0(struct KVMAs, s->nr_as);
kvm_guest_memfd_supported = kvm_check_extension(s, KVM_CAP_GUEST_MEMFD);
ret = kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES);
kvm_supported_memory_attributes = ret > 0 ? ret : 0;
if (object_property_find(OBJECT(current_machine), "kvm-type")) {
g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine),
"kvm-type",
@@ -2816,12 +2930,101 @@ static void kvm_eat_signals(CPUState *cpu)
} while (sigismember(&chkset, SIG_IPI));
}
int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
{
MemoryRegionSection section;
ram_addr_t offset;
MemoryRegion *mr;
RAMBlock *rb;
void *addr;
int ret = -1;
trace_kvm_convert_memory(start, size, to_private ? "shared_to_private" : "private_to_shared");
if (!QEMU_PTR_IS_ALIGNED(start, qemu_host_page_size) ||
!QEMU_PTR_IS_ALIGNED(size, qemu_host_page_size)) {
return -1;
}
if (!size) {
return -1;
}
section = memory_region_find(get_system_memory(), start, size);
mr = section.mr;
if (!mr) {
/*
* Ignore converting non-assigned region to shared.
*
* TDX requires vMMIO region to be shared to inject #VE to guest.
* OVMF issues conservatively MapGPA(shared) on 32bit PCI MMIO region,
* and vIO-APIC 0xFEC00000 4K page.
* OVMF assigns 32bit PCI MMIO region to
* [top of low memory: typically 2GB=0xC000000, 0xFC00000)
*/
if (!to_private) {
return 0;
}
return -1;
}
if (memory_region_has_guest_memfd(mr)) {
if (to_private) {
ret = kvm_set_memory_attributes_private(start, size);
} else {
ret = kvm_set_memory_attributes_shared(start, size);
}
if (ret) {
memory_region_unref(section.mr);
return ret;
}
addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
rb = qemu_ram_block_from_host(addr, false, &offset);
if (to_private) {
if (rb->page_size != qemu_host_page_size) {
/*
* shared memory is back'ed by hugetlb, which is supposed to be
* pre-allocated and doesn't need to be discarded
*/
return 0;
} else {
ret = ram_block_discard_range(rb, offset, size);
}
} else {
ret = ram_block_discard_guest_memfd_range(rb, offset, size);
}
} else {
/*
* Because vMMIO region must be shared, guest TD may convert vMMIO
* region to shared explicitly. Don't complain such case. See
* memory_region_type() for checking if the region is MMIO region.
*/
if (!to_private &&
!memory_region_is_ram(mr) &&
!memory_region_is_ram_device(mr) &&
!memory_region_is_rom(mr) &&
!memory_region_is_romd(mr)) {
ret = 0;
} else {
error_report("Convert non guest_memfd backed memory region "
"(0x%"HWADDR_PRIx" ,+ 0x%"HWADDR_PRIx") to %s",
start, size, to_private ? "private" : "shared");
}
}
memory_region_unref(section.mr);
return ret;
}
int kvm_cpu_exec(CPUState *cpu)
{
struct kvm_run *run = cpu->kvm_run;
int ret, run_ret;
DPRINTF("kvm_cpu_exec()\n");
trace_kvm_cpu_exec();
if (kvm_arch_process_async_events(cpu)) {
qatomic_set(&cpu->exit_request, 0);
@@ -2848,7 +3051,7 @@ int kvm_cpu_exec(CPUState *cpu)
kvm_arch_pre_run(cpu, run);
if (qatomic_read(&cpu->exit_request)) {
DPRINTF("interrupt exit requested\n");
trace_kvm_interrupt_exit_request();
/*
* KVM requires us to reenter the kernel after IO exits to complete
* instruction emulation. This self-signal will ensure that we
@@ -2878,29 +3081,30 @@ int kvm_cpu_exec(CPUState *cpu)
if (run_ret < 0) {
if (run_ret == -EINTR || run_ret == -EAGAIN) {
DPRINTF("io window exit\n");
trace_kvm_io_window_exit();
kvm_eat_signals(cpu);
ret = EXCP_INTERRUPT;
break;
}
fprintf(stderr, "error: kvm run failed %s\n",
strerror(-run_ret));
if (!(run_ret == -EFAULT && run->exit_reason == KVM_EXIT_MEMORY_FAULT)) {
fprintf(stderr, "error: kvm run failed %s\n",
strerror(-run_ret));
#ifdef TARGET_PPC
if (run_ret == -EBUSY) {
fprintf(stderr,
"This is probably because your SMT is enabled.\n"
"VCPU can only run on primary threads with all "
"secondary threads offline.\n");
}
if (run_ret == -EBUSY) {
fprintf(stderr,
"This is probably because your SMT is enabled.\n"
"VCPU can only run on primary threads with all "
"secondary threads offline.\n");
}
#endif
ret = -1;
break;
ret = -1;
break;
}
}
trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
switch (run->exit_reason) {
case KVM_EXIT_IO:
DPRINTF("handle_io\n");
/* Called outside BQL */
kvm_handle_io(run->io.port, attrs,
(uint8_t *)run + run->io.data_offset,
@@ -2910,7 +3114,6 @@ int kvm_cpu_exec(CPUState *cpu)
ret = 0;
break;
case KVM_EXIT_MMIO:
DPRINTF("handle_mmio\n");
/* Called outside BQL */
address_space_rw(&address_space_memory,
run->mmio.phys_addr, attrs,
@@ -2920,11 +3123,9 @@ int kvm_cpu_exec(CPUState *cpu)
ret = 0;
break;
case KVM_EXIT_IRQ_WINDOW_OPEN:
DPRINTF("irq_window_open\n");
ret = EXCP_INTERRUPT;
break;
case KVM_EXIT_SHUTDOWN:
DPRINTF("shutdown\n");
qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
ret = EXCP_INTERRUPT;
break;
@@ -2959,6 +3160,7 @@ int kvm_cpu_exec(CPUState *cpu)
ret = 0;
break;
case KVM_EXIT_SYSTEM_EVENT:
trace_kvm_run_exit_system_event(cpu->cpu_index, run->system_event.type);
switch (run->system_event.type) {
case KVM_SYSTEM_EVENT_SHUTDOWN:
qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
@@ -2976,13 +3178,21 @@ int kvm_cpu_exec(CPUState *cpu)
ret = 0;
break;
default:
DPRINTF("kvm_arch_handle_exit\n");
ret = kvm_arch_handle_exit(cpu, run);
break;
}
break;
case KVM_EXIT_MEMORY_FAULT:
if (run->memory_fault.flags & ~KVM_MEMORY_EXIT_FLAG_PRIVATE) {
error_report("KVM_EXIT_MEMORY_FAULT: Unknown flag 0x%" PRIx64,
(uint64_t)run->memory_fault.flags);
ret = -1;
break;
}
ret = kvm_convert_memory(run->memory_fault.gpa, run->memory_fault.size,
run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE);
break;
default:
DPRINTF("kvm_arch_handle_exit\n");
ret = kvm_arch_handle_exit(cpu, run);
break;
}
@@ -4077,3 +4287,25 @@ void query_stats_schemas_cb(StatsSchemaList **result, Error **errp)
query_stats_schema_vcpu(first_cpu, &stats_args);
}
}
int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
{
int fd;
struct kvm_create_guest_memfd guest_memfd = {
.size = size,
.flags = flags,
};
if (!kvm_guest_memfd_supported) {
error_setg(errp, "KVM doesn't support guest memfd\n");
return -1;
}
fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &guest_memfd);
if (fd < 0) {
error_setg_errno(errp, errno, "Error creating kvm guest memfd");
return -1;
}
return fd;
}

View File

@@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
kvm_irqchip_release_virq(int virq) "virq %d"
kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, uint32_t fd, uint64_t fd_offset, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " guest_memfd=%d" " guest_memfd_offset=0x%" PRIx64 " ret=%d"
kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
kvm_resample_fd_notify(int gsi) "gsi %d"
kvm_dirty_ring_full(int id) "vcpu %d"
@@ -25,4 +25,10 @@ kvm_dirty_ring_reaper(const char *s) "%s"
kvm_dirty_ring_reap(uint64_t count, int64_t t) "reaped %"PRIu64" pages (took %"PRIi64" us)"
kvm_dirty_ring_reaper_kick(const char *reason) "%s"
kvm_dirty_ring_flush(int finished) "%d"
kvm_destroy_vcpu(void) ""
kvm_failed_get_vcpu_mmap_size(void) ""
kvm_cpu_exec(void) ""
kvm_interrupt_exit_request(void) ""
kvm_io_window_exit(void) ""
kvm_run_exit_system_event(int cpu_index, uint32_t event_type) "cpu_index %d, system_even_type %"PRIu32
kvm_convert_memory(uint64_t start, uint64_t size, const char *msg) "start 0x%" PRIx64 " size 0x%" PRIx64 " %s"

View File

@@ -124,3 +124,13 @@ uint32_t kvm_dirty_ring_size(void)
{
return 0;
}
bool kvm_hwpoisoned_mem(void)
{
return false;
}
int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
{
return -ENOSYS;
}

View File

@@ -17,31 +17,29 @@
#include "sysemu/hostmem.h"
#include "hw/i386/hostmem-epc.h"
static void
static bool
sgx_epc_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
g_autofree char *name = NULL;
uint32_t ram_flags;
char *name;
int fd;
if (!backend->size) {
error_setg(errp, "can't create backend with size 0");
return;
return false;
}
fd = qemu_open_old("/dev/sgx_vepc", O_RDWR);
if (fd < 0) {
error_setg_errno(errp, errno,
"failed to open /dev/sgx_vepc to alloc SGX EPC");
return;
return false;
}
name = object_get_canonical_path(OBJECT(backend));
ram_flags = (backend->share ? RAM_SHARED : 0) | RAM_PROTECTED;
memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
name, backend->size, ram_flags,
fd, 0, errp);
g_free(name);
return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
backend->size, ram_flags, fd, 0, errp);
}
static void sgx_epc_backend_instance_init(Object *obj)

View File

@@ -36,24 +36,25 @@ struct HostMemoryBackendFile {
OnOffAuto rom;
};
static void
static bool
file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
#ifndef CONFIG_POSIX
error_setg(errp, "backend '%s' not supported on this host",
object_get_typename(OBJECT(backend)));
return false;
#else
HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
g_autofree gchar *name = NULL;
uint32_t ram_flags;
gchar *name;
if (!backend->size) {
error_setg(errp, "can't create backend with size 0");
return;
return false;
}
if (!fb->mem_path) {
error_setg(errp, "mem-path property not set");
return;
return false;
}
switch (fb->rom) {
@@ -65,18 +66,18 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
if (!fb->readonly) {
error_setg(errp, "property 'rom' = 'on' is not supported with"
" 'readonly' = 'off'");
return;
return false;
}
break;
case ON_OFF_AUTO_OFF:
if (fb->readonly && backend->share) {
error_setg(errp, "property 'rom' = 'off' is incompatible with"
" 'readonly' = 'on' and 'share' = 'on'");
return;
return false;
}
break;
default:
assert(false);
g_assert_not_reached();
}
name = host_memory_backend_get_name(backend);
@@ -84,12 +85,12 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
ram_flags |= RAM_NAMED_FILE;
memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
backend->size, fb->align, ram_flags,
fb->mem_path, fb->offset, errp);
g_free(name);
return memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
backend->size, fb->align, ram_flags,
fb->mem_path, fb->offset, errp);
#endif
}

View File

@@ -31,17 +31,17 @@ struct HostMemoryBackendMemfd {
bool seal;
};
static void
static bool
memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
HostMemoryBackendMemfd *m = MEMORY_BACKEND_MEMFD(backend);
g_autofree char *name = NULL;
uint32_t ram_flags;
char *name;
int fd;
if (!backend->size) {
error_setg(errp, "can't create backend with size 0");
return;
return false;
}
fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
@@ -49,15 +49,15 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
errp);
if (fd == -1) {
return;
return false;
}
name = host_memory_backend_get_name(backend);
ram_flags = backend->share ? RAM_SHARED : 0;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
backend->size, ram_flags, fd, 0, errp);
g_free(name);
ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
backend->size, ram_flags, fd, 0, errp);
}
static bool

View File

@@ -16,23 +16,24 @@
#include "qemu/module.h"
#include "qom/object_interfaces.h"
static void
static bool
ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
g_autofree char *name = NULL;
uint32_t ram_flags;
char *name;
if (!backend->size) {
error_setg(errp, "can't create backend with size 0");
return;
return false;
}
name = host_memory_backend_get_name(backend);
ram_flags = backend->share ? RAM_SHARED : 0;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
backend->size, ram_flags, errp);
g_free(name);
ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
name, backend->size,
ram_flags, errp);
}
static void

View File

@@ -219,7 +219,6 @@ static bool host_memory_backend_get_prealloc(Object *obj, Error **errp)
static void host_memory_backend_set_prealloc(Object *obj, bool value,
Error **errp)
{
Error *local_err = NULL;
HostMemoryBackend *backend = MEMORY_BACKEND(obj);
if (!backend->reserve && value) {
@@ -237,10 +236,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
void *ptr = memory_region_get_ram_ptr(&backend->mr);
uint64_t sz = memory_region_size(&backend->mr);
qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
backend->prealloc_context, &local_err);
if (local_err) {
error_propagate(errp, local_err);
if (!qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
backend->prealloc_context, errp)) {
return;
}
backend->prealloc = true;
@@ -279,6 +276,7 @@ static void host_memory_backend_init(Object *obj)
/* TODO: convert access to globals to compat properties */
backend->merge = machine_mem_merge(machine);
backend->dump = machine_dump_guest_core(machine);
backend->guest_memfd = machine_require_guest_memfd(machine);
backend->reserve = true;
backend->prealloc_threads = machine->smp.cpus;
}
@@ -324,91 +322,86 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
{
HostMemoryBackend *backend = MEMORY_BACKEND(uc);
HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
Error *local_err = NULL;
void *ptr;
uint64_t sz;
if (bc->alloc) {
bc->alloc(backend, &local_err);
if (local_err) {
goto out;
}
if (!bc->alloc) {
return;
}
if (!bc->alloc(backend, errp)) {
return;
}
ptr = memory_region_get_ram_ptr(&backend->mr);
sz = memory_region_size(&backend->mr);
ptr = memory_region_get_ram_ptr(&backend->mr);
sz = memory_region_size(&backend->mr);
if (backend->merge) {
qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
}
if (!backend->dump) {
qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
}
if (backend->merge) {
qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
}
if (!backend->dump) {
qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
}
#ifdef CONFIG_NUMA
unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
/* lastbit == MAX_NODES means maxnode = 0 */
unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
/* ensure policy won't be ignored in case memory is preallocated
* before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
* this doesn't catch hugepage case. */
unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
int mode = backend->policy;
unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
/* lastbit == MAX_NODES means maxnode = 0 */
unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
/* ensure policy won't be ignored in case memory is preallocated
* before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
* this doesn't catch hugepage case. */
unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
int mode = backend->policy;
/* check for invalid host-nodes and policies and give more verbose
* error messages than mbind(). */
if (maxnode && backend->policy == MPOL_DEFAULT) {
error_setg(errp, "host-nodes must be empty for policy default,"
" or you should explicitly specify a policy other"
" than default");
return;
} else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
error_setg(errp, "host-nodes must be set for policy %s",
HostMemPolicy_str(backend->policy));
return;
}
/* check for invalid host-nodes and policies and give more verbose
* error messages than mbind(). */
if (maxnode && backend->policy == MPOL_DEFAULT) {
error_setg(errp, "host-nodes must be empty for policy default,"
" or you should explicitly specify a policy other"
" than default");
return;
} else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
error_setg(errp, "host-nodes must be set for policy %s",
HostMemPolicy_str(backend->policy));
return;
}
/* We can have up to MAX_NODES nodes, but we need to pass maxnode+1
* as argument to mbind() due to an old Linux bug (feature?) which
* cuts off the last specified node. This means backend->host_nodes
* must have MAX_NODES+1 bits available.
*/
assert(sizeof(backend->host_nodes) >=
BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
assert(maxnode <= MAX_NODES);
/* We can have up to MAX_NODES nodes, but we need to pass maxnode+1
* as argument to mbind() due to an old Linux bug (feature?) which
* cuts off the last specified node. This means backend->host_nodes
* must have MAX_NODES+1 bits available.
*/
assert(sizeof(backend->host_nodes) >=
BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
assert(maxnode <= MAX_NODES);
#ifdef HAVE_NUMA_HAS_PREFERRED_MANY
if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
/*
* Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
* silently picks the first node.
*/
mode = MPOL_PREFERRED_MANY;
}
if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
/*
* Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
* silently picks the first node.
*/
mode = MPOL_PREFERRED_MANY;
}
#endif
if (maxnode &&
mbind(ptr, sz, mode, backend->host_nodes, maxnode + 1, flags)) {
if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
error_setg_errno(errp, errno,
"cannot bind memory to host NUMA nodes");
return;
}
}
#endif
/* Preallocate memory after the NUMA policy has been instantiated.
* This is necessary to guarantee memory is allocated with
* specified NUMA policy in place.
*/
if (backend->prealloc) {
qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
backend->prealloc_threads,
backend->prealloc_context, &local_err);
if (local_err) {
goto out;
}
if (maxnode &&
mbind(ptr, sz, mode, backend->host_nodes, maxnode + 1, flags)) {
if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
error_setg_errno(errp, errno,
"cannot bind memory to host NUMA nodes");
return;
}
}
out:
error_propagate(errp, local_err);
#endif
/* Preallocate memory after the NUMA policy has been instantiated.
* This is necessary to guarantee memory is allocated with
* specified NUMA policy in place.
*/
if (backend->prealloc && !qemu_prealloc_mem(memory_region_get_fd(&backend->mr),
ptr, sz,
backend->prealloc_threads,
backend->prealloc_context, errp)) {
return;
}
}
static bool

View File

@@ -6344,7 +6344,7 @@ BlockDeviceInfoList *bdrv_named_nodes_list(bool flat,
BlockDriverState *bs;
GLOBAL_STATE_CODE();
GRAPH_RDLOCK_GUARD_MAINLOOP();
GRAPH_RDLOCK_GUARD();
list = NULL;
QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {

View File

@@ -226,6 +226,9 @@ typedef struct RawPosixAIOData {
struct {
unsigned long op;
} zone_mgmt;
struct {
struct stat *st;
} fstat;
};
} RawPosixAIOData;
@@ -2613,6 +2616,34 @@ static void raw_close(BlockDriverState *bs)
}
}
static int handle_aiocb_fstat(void *opaque)
{
RawPosixAIOData *aiocb = opaque;
if (fstat(aiocb->aio_fildes, aiocb->fstat.st) < 0) {
return -errno;
}
return 0;
}
static int coroutine_fn raw_co_fstat(BlockDriverState *bs, struct stat *st)
{
BDRVRawState *s = bs->opaque;
RawPosixAIOData acb;
acb = (RawPosixAIOData) {
.bs = bs,
.aio_fildes = s->fd,
.aio_type = QEMU_AIO_FSTAT,
.fstat = {
.st = st,
},
};
return raw_thread_pool_submit(handle_aiocb_fstat, &acb);
}
/**
* Truncates the given regular file @fd to @offset and, when growing, fills the
* new space according to @prealloc.
@@ -2857,11 +2888,14 @@ static int64_t coroutine_fn raw_co_getlength(BlockDriverState *bs)
static int64_t coroutine_fn raw_co_get_allocated_file_size(BlockDriverState *bs)
{
struct stat st;
BDRVRawState *s = bs->opaque;
int ret;
if (fstat(s->fd, &st) < 0) {
return -errno;
ret = raw_co_fstat(bs, &st);
if (ret) {
return ret;
}
return (int64_t)st.st_blocks * 512;
}

View File

@@ -149,6 +149,7 @@ block_gen_c = custom_target('block-gen.c',
'../include/block/dirty-bitmap.h',
'../include/block/block_int-io.h',
'../include/block/block-global-state.h',
'../include/block/qapi.h',
'../include/sysemu/block-backend-global-state.h',
'../include/sysemu/block-backend-io.h',
'coroutines.h'

View File

@@ -400,7 +400,7 @@ void hmp_nbd_server_start(Monitor *mon, const QDict *qdict)
bool writable = qdict_get_try_bool(qdict, "writable", false);
bool all = qdict_get_try_bool(qdict, "all", false);
Error *local_err = NULL;
BlockInfoList *block_list, *info;
BlockBackend *blk;
SocketAddress *addr;
NbdServerAddOptions export;
@@ -425,18 +425,24 @@ void hmp_nbd_server_start(Monitor *mon, const QDict *qdict)
return;
}
/* Then try adding all block devices. If one fails, close all and
/*
* Then try adding all block devices. If one fails, close all and
* exit.
*/
block_list = qmp_query_block(NULL);
for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
BlockDriverState *bs = blk_bs(blk);
for (info = block_list; info; info = info->next) {
if (!info->value->inserted) {
if (!*blk_name(blk) && !blk_get_attached_dev(blk)) {
continue;
}
bs = bdrv_skip_implicit_filters(bs);
if (!bs || !bs->drv) {
continue;
}
export = (NbdServerAddOptions) {
.device = info->value->device,
.device = g_strdup(blk_name(blk)),
.has_writable = true,
.writable = writable,
};
@@ -449,8 +455,6 @@ void hmp_nbd_server_start(Monitor *mon, const QDict *qdict)
}
}
qapi_free_BlockInfoList(block_list);
exit:
hmp_handle_error(mon, local_err);
}
@@ -744,7 +748,7 @@ static void print_block_info(Monitor *mon, BlockInfo *info,
}
}
void hmp_info_block(Monitor *mon, const QDict *qdict)
void coroutine_fn hmp_info_block(Monitor *mon, const QDict *qdict)
{
BlockInfoList *block_list, *info;
BlockDeviceInfoList *blockdev_list, *blockdev;

View File

@@ -41,10 +41,10 @@
#include "qemu/qemu-print.h"
#include "sysemu/block-backend.h"
BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
BlockDriverState *bs,
bool flat,
Error **errp)
BlockDeviceInfo *coroutine_fn bdrv_block_device_info(BlockBackend *blk,
BlockDriverState *bs,
bool flat,
Error **errp)
{
ImageInfo **p_image_info;
ImageInfo *backing_info;
@@ -234,8 +234,6 @@ bdrv_do_query_node_info(BlockDriverState *bs, BlockNodeInfo *info, Error **errp)
int ret;
Error *err = NULL;
aio_context_acquire(bdrv_get_aio_context(bs));
size = bdrv_getlength(bs);
if (size < 0) {
error_setg_errno(errp, -size, "Can't get image size '%s'",
@@ -248,7 +246,9 @@ bdrv_do_query_node_info(BlockDriverState *bs, BlockNodeInfo *info, Error **errp)
info->filename = g_strdup(bs->filename);
info->format = g_strdup(bdrv_get_format_name(bs));
info->virtual_size = size;
info->actual_size = bdrv_get_allocated_file_size(bs);
bdrv_graph_co_rdlock();
info->actual_size = bdrv_co_get_allocated_file_size(bs);
bdrv_graph_co_rdunlock();
info->has_actual_size = info->actual_size >= 0;
if (bs->encrypted) {
info->encrypted = true;
@@ -304,7 +304,7 @@ bdrv_do_query_node_info(BlockDriverState *bs, BlockNodeInfo *info, Error **errp)
}
out:
aio_context_release(bdrv_get_aio_context(bs));
return;
}
/**
@@ -374,7 +374,7 @@ fail:
}
/**
* bdrv_query_block_graph_info:
* bdrv_co_query_block_graph_info:
* @bs: root node to start from
* @p_info: location to store image information
* @errp: location to store error information
@@ -383,15 +383,17 @@ fail:
*
* @p_info will be set only on success. On error, store error in @errp.
*/
void bdrv_query_block_graph_info(BlockDriverState *bs,
BlockGraphInfo **p_info,
Error **errp)
void coroutine_fn bdrv_co_query_block_graph_info(BlockDriverState *bs,
BlockGraphInfo **p_info,
Error **errp)
{
BlockGraphInfo *info;
BlockChildInfoList **children_list_tail;
BdrvChild *c;
ERRP_GUARD();
assert_bdrv_graph_readable();
info = g_new0(BlockGraphInfo, 1);
bdrv_do_query_node_info(bs, qapi_BlockGraphInfo_base(info), errp);
if (*errp) {
@@ -407,7 +409,7 @@ void bdrv_query_block_graph_info(BlockDriverState *bs,
QAPI_LIST_APPEND(children_list_tail, c_info);
c_info->name = g_strdup(c->name);
bdrv_query_block_graph_info(c->bs, &c_info->info, errp);
bdrv_co_query_block_graph_info(c->bs, &c_info->info, errp);
if (*errp) {
goto fail;
}
@@ -665,13 +667,13 @@ bdrv_query_bds_stats(BlockDriverState *bs, bool blk_level)
return s;
}
BlockInfoList *qmp_query_block(Error **errp)
BlockInfoList *coroutine_fn qmp_query_block(Error **errp)
{
BlockInfoList *head = NULL, **p_next = &head;
BlockBackend *blk;
Error *local_err = NULL;
GRAPH_RDLOCK_GUARD_MAINLOOP();
GRAPH_RDLOCK_GUARD();
for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
BlockInfoList *info;

View File

@@ -389,7 +389,7 @@ int bdrv_snapshot_list(BlockDriverState *bs,
QEMUSnapshotInfo **psn_info)
{
GLOBAL_STATE_CODE();
GRAPH_RDLOCK_GUARD_MAINLOOP();
GRAPH_RDLOCK_GUARD();
BlockDriver *drv = bs->drv;
BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);

View File

@@ -2871,9 +2871,9 @@ void qmp_drive_backup(DriveBackup *backup, Error **errp)
blockdev_do_action(&action, errp);
}
BlockDeviceInfoList *qmp_query_named_block_nodes(bool has_flat,
bool flat,
Error **errp)
BlockDeviceInfoList *coroutine_fn qmp_query_named_block_nodes(bool has_flat,
bool flat,
Error **errp)
{
bool return_flat = has_flat && flat;

View File

@@ -21,6 +21,7 @@
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#define HW_POISON_H /* avoid poison since we patch against rules it "enforces" */
#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "qapi/error.h"

View File

@@ -22,6 +22,7 @@
* THE SOFTWARE.
*/
#define HW_POISON_H /* avoid poison since we patch against rules it "enforces" */
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu/module.h"
@@ -198,6 +199,17 @@ static void mux_chr_accept_input(Chardev *chr)
be->chr_read(be->opaque,
&d->buffer[m][d->cons[m]++ & MUX_BUFFER_MASK], 1);
}
#if defined(TARGET_S390X)
/*
* We're still not able to sync producer and consumer, so let's wait a bit
* and try again by then.
*/
if (d->prod[m] != d->cons[m]) {
qemu_mod_timer(d->accept_timer, qemu_get_clock_ns(vm_clock)
+ (int64_t)100000);
}
#endif
}
static int mux_chr_can_read(void *opaque)
@@ -332,6 +344,10 @@ static void qemu_chr_open_mux(Chardev *chr,
}
d->focus = -1;
#if defined(TARGET_S390X)
d->accept_timer = qemu_new_timer_ns(vm_clock,
(QEMUTimerCB *)mux_chr_accept_input, chr);
#endif
/* only default to opened state if we've realized the initial
* set of muxes
*/

View File

@@ -22,6 +22,7 @@
* THE SOFTWARE.
*/
#define HW_POISON_H /* avoid poison since we patch against rules it "enforces" */
#include "qemu/osdep.h"
#include "qemu/cutils.h"
#include "monitor/monitor.h"

View File

@@ -37,6 +37,9 @@ struct MuxChardev {
Chardev parent;
CharBackend *backends[MAX_MUX];
CharBackend chr;
#if defined(TARGET_S390X)
QEMUTimer *accept_timer;
#endif
int focus;
int mux_cnt;
int term_got_escape;

View File

@@ -18,6 +18,7 @@
#CONFIG_QXL=n
#CONFIG_SEV=n
#CONFIG_SGA=n
#CONFIG_TDX=n
#CONFIG_TEST_DEVICES=n
#CONFIG_TPM_CRB=n
#CONFIG_TPM_TIS_ISA=n

View File

@@ -1,4 +1,4 @@
executable('ivshmem-client', files('ivshmem-client.c', 'main.c'), genh,
dependencies: glib,
build_by_default: targetos == 'linux',
install: false)
install: true)

View File

@@ -1,4 +1,4 @@
executable('ivshmem-server', files('ivshmem-server.c', 'main.c'), genh,
dependencies: [qemuutil, rt],
build_by_default: targetos == 'linux',
install: false)
install: true)

View File

@@ -13,12 +13,12 @@ if sphinx_build.found()
sphinx_version = run_command(SPHINX_ARGS + ['--version'],
check: true).stdout().split()[1]
if sphinx_version.version_compare('>=1.7.0')
SPHINX_ARGS += ['-j', 'auto']
SPHINX_ARGS += ['-j', '1']
else
nproc = find_program('nproc')
if nproc.found()
jobs = run_command(nproc, check: true).stdout()
SPHINX_ARGS += ['-j', jobs]
SPHINX_ARGS += ['-j', '1']
endif
endif

View File

@@ -38,6 +38,7 @@ Supported mechanisms
Currently supported confidential guest mechanisms are:
* AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`)
* Intel Trust Domain Extension (TDX) (see :doc:`i386/tdx`)
* POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`)
* s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)

143
docs/system/i386/tdx.rst Normal file
View File

@@ -0,0 +1,143 @@
Intel Trusted Domain eXtension (TDX)
====================================
Intel Trusted Domain eXtensions (TDX) refers to an Intel technology that extends
Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME)
with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs
in a CPU mode that is designed to protect the confidentiality of its memory
contents and its CPU state from any other software, including the hosting
Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
Prerequisites
-------------
To run TD, the physical machine needs to have TDX module loaded and initialized
while KVM hypervisor has TDX support and has TDX enabled. If those requirements
are met, the ``KVM_CAP_VM_TYPES`` will report the support of ``KVM_X86_TDX_VM``.
Trust Domain Virtual Firmware (TDVF)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trust Domain Virtual Firmware (TDVF) is required to provide TD services to boot
TD Guest OS. TDVF needs to be copied to guest private memory and measured before
the TD boots.
KVM vcpu ioctl ``KVM_MEMORY_MAPPING`` can be used to populates the TDVF content
into its private memory.
Since TDX doesn't support readonly memslot, TDVF cannot be mapped as pflash
device and it actually works as RAM. "-bios" option is chosen to load TDVF.
OVMF is the opensource firmware that implements the TDVF support. Thus the
command line to specify and load TDVF is ``-bios OVMF.fd``
KVM private memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TD's memory (RAM) needs to be able to be transformed between private and shared.
Its BIOS (OVMF/TDVF) needs to be mapped as private as well. Thus QEMU needs to
allocate private guest memfd for them via KVM's IOCTL (KVM_CREATE_GUEST_MEMFD),
which requires KVM is newer enough that reports KVM_CAP_GUEST_MEMFD.
Feature Control
---------------
Unlike non-TDX VM, the CPU features (enumerated by CPU or MSR) of a TD is not
under full control of VMM. VMM can only configure part of features of a TD on
``KVM_TDX_INIT_VM`` command of VM scope ``MEMORY_ENCRYPT_OP`` ioctl.
The configurable features have three types:
- Attributes:
- PKS (bit 30) controls whether Supervisor Protection Keys is exposed to TD,
which determines related CPUID bit and CR4 bit;
- PERFMON (bit 63) controls whether PMU is exposed to TD.
- XSAVE related features (XFAM):
XFAM is a 64b mask, which has the same format as XCR0 or IA32_XSS MSR. It
determines the set of extended features available for use by the guest TD.
- CPUID features:
Only some bits of some CPUID leaves are directly configurable by VMM.
What features can be configured is reported via TDX capabilities.
TDX capabilities
~~~~~~~~~~~~~~~~
The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_CAPABILITIES``
to get the TDX capabilities from KVM. It returns a data structure of
``struct kvm_tdx_capabilites``, which tells the supported configuration of
attributes, XFAM and CPUIDs.
TD attestation
--------------
In TD guest, the attestation process is used to verify the TDX guest
trustworthiness to other entities before provisioning secrets to the guest.
TD attestation is initiated first by calling TDG.MR.REPORT inside TD to get the
REPORT. Then the REPORT data needs to be converted into a remotely verifiable
Quote by SGX Quoting Enclave (QE).
A host daemon, Quote Generation Service (QGS), provides the functionality of
SGX GE. It provides a socket address, to which a TD guest can connect via
"quote-generation-socket" property. On the request of <GETQUOTE> from TD guest,
QEMU sends the TDREPORT to QGS via "quote-generation-socket" socket, and gets
the returning Quoting and return it back to TD guest.
Though "quote-generation-socket" is optional for booting the TD guest, it's a
must for supporting TD guest atteatation.
Launching a TD (TDX VM)
-----------------------
To launch a TDX guest, below are new added and required:
.. parsed-literal::
|qemu_system_x86| \\
-object tdx-guest,id=tdx0 \\
-machine ...,kernel-irqchip=split,confidential-guest-support=tdx0 \\
-bios OVMF.fd \\
If TD attestation support is wanted:
.. parsed-literal::
|qemu_system_x86| \\
-object '{"qom-type":"tdx-guest","id":"tdx0","quote-generation-socket":{"type": "vsock", "cid":"1","port":"1234"}}' \\
-machine ...,kernel-irqchip=split,confidential-guest-support=tdx0 \\
-bios OVMF.fd \\
Debugging
---------
Bit 0 of TD attributes, is DEBUG bit, which decides if the TD runs in off-TD
debug mode. When in off-TD debug mode, TD's VCPU state and private memory are
accessible via given SEAMCALLs. This requires KVM to expose APIs to invoke those
SEAMCALLs and resonponding QEMU change.
It's targeted as future work.
restrictions
------------
- kernel-irqchip must be split;
- No readonly support for private memory;
- No SMM support: SMM support requires manipulating the guset register states
which is not allowed;
Live Migration
--------------
TODO
References
----------
- `TDX Homepage <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html>`__
- `SGX QE <https://github.com/intel/SGXDataCenterAttestationPrimitives/tree/master/QuoteGeneration>`__

View File

@@ -29,6 +29,7 @@ Architectural features
i386/kvm-pv
i386/sgx
i386/amd-memory-encryption
i386/tdx
OS requirements
~~~~~~~~~~~~~~~

View File

@@ -65,6 +65,7 @@ ERST
.help = "show info of one block device or all block devices "
"(-n: show named nodes; -v: show details)",
.cmd = hmp_info_block,
.coroutine = true,
},
SRST

View File

@@ -247,7 +247,6 @@ static void aspeed_ast2400_soc_realize(DeviceState *dev, Error **errp)
Aspeed2400SoCState *a = ASPEED2400_SOC(dev);
AspeedSoCState *s = ASPEED_SOC(dev);
AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s);
Error *err = NULL;
g_autofree char *sram_name = NULL;
/* Default boot region (SPI memory or ROMs) */
@@ -276,9 +275,8 @@ static void aspeed_ast2400_soc_realize(DeviceState *dev, Error **errp)
/* SRAM */
sram_name = g_strdup_printf("aspeed.sram.%d", CPU(&a->cpu[0])->cpu_index);
memory_region_init_ram(&s->sram, OBJECT(s), sram_name, sc->sram_size, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_ram(&s->sram, OBJECT(s), sram_name, sc->sram_size,
errp)) {
return;
}
memory_region_add_subregion(s->memory,

View File

@@ -282,7 +282,6 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp)
Aspeed2600SoCState *a = ASPEED2600_SOC(dev);
AspeedSoCState *s = ASPEED_SOC(dev);
AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s);
Error *err = NULL;
qemu_irq irq;
g_autofree char *sram_name = NULL;
@@ -355,9 +354,8 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp)
/* SRAM */
sram_name = g_strdup_printf("aspeed.sram.%d", CPU(&a->cpu[0])->cpu_index);
memory_region_init_ram(&s->sram, OBJECT(s), sram_name, sc->sram_size, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_ram(&s->sram, OBJECT(s), sram_name, sc->sram_size,
errp)) {
return;
}
memory_region_add_subregion(s->memory,

View File

@@ -81,7 +81,6 @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
{
FslIMX25State *s = FSL_IMX25(dev);
uint8_t i;
Error *err = NULL;
if (!qdev_realize(DEVICE(&s->cpu), NULL, errp)) {
return;
@@ -281,28 +280,22 @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
FSL_IMX25_WDT_IRQ));
/* initialize 2 x 16 KB ROM */
memory_region_init_rom(&s->rom[0], OBJECT(dev), "imx25.rom0",
FSL_IMX25_ROM0_SIZE, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_rom(&s->rom[0], OBJECT(dev), "imx25.rom0",
FSL_IMX25_ROM0_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX25_ROM0_ADDR,
&s->rom[0]);
memory_region_init_rom(&s->rom[1], OBJECT(dev), "imx25.rom1",
FSL_IMX25_ROM1_SIZE, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_rom(&s->rom[1], OBJECT(dev), "imx25.rom1",
FSL_IMX25_ROM1_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX25_ROM1_ADDR,
&s->rom[1]);
/* initialize internal RAM (128 KB) */
memory_region_init_ram(&s->iram, NULL, "imx25.iram", FSL_IMX25_IRAM_SIZE,
&err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_ram(&s->iram, NULL, "imx25.iram",
FSL_IMX25_IRAM_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX25_IRAM_ADDR,

View File

@@ -63,7 +63,6 @@ static void fsl_imx31_realize(DeviceState *dev, Error **errp)
{
FslIMX31State *s = FSL_IMX31(dev);
uint16_t i;
Error *err = NULL;
if (!qdev_realize(DEVICE(&s->cpu), NULL, errp)) {
return;
@@ -188,30 +187,24 @@ static void fsl_imx31_realize(DeviceState *dev, Error **errp)
sysbus_mmio_map(SYS_BUS_DEVICE(&s->wdt), 0, FSL_IMX31_WDT_ADDR);
/* On a real system, the first 16k is a `secure boot rom' */
memory_region_init_rom(&s->secure_rom, OBJECT(dev), "imx31.secure_rom",
FSL_IMX31_SECURE_ROM_SIZE, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_rom(&s->secure_rom, OBJECT(dev), "imx31.secure_rom",
FSL_IMX31_SECURE_ROM_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX31_SECURE_ROM_ADDR,
&s->secure_rom);
/* There is also a 16k ROM */
memory_region_init_rom(&s->rom, OBJECT(dev), "imx31.rom",
FSL_IMX31_ROM_SIZE, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_rom(&s->rom, OBJECT(dev), "imx31.rom",
FSL_IMX31_ROM_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX31_ROM_ADDR,
&s->rom);
/* initialize internal RAM (16 KB) */
memory_region_init_ram(&s->iram, NULL, "imx31.iram", FSL_IMX31_IRAM_SIZE,
&err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_ram(&s->iram, NULL, "imx31.iram",
FSL_IMX31_IRAM_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX31_IRAM_ADDR,

View File

@@ -109,7 +109,6 @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
MachineState *ms = MACHINE(qdev_get_machine());
FslIMX6State *s = FSL_IMX6(dev);
uint16_t i;
Error *err = NULL;
unsigned int smp_cpus = ms->smp.cpus;
if (smp_cpus > FSL_IMX6_NUM_CPUS) {
@@ -423,30 +422,24 @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
}
/* ROM memory */
memory_region_init_rom(&s->rom, OBJECT(dev), "imx6.rom",
FSL_IMX6_ROM_SIZE, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_rom(&s->rom, OBJECT(dev), "imx6.rom",
FSL_IMX6_ROM_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX6_ROM_ADDR,
&s->rom);
/* CAAM memory */
memory_region_init_rom(&s->caam, OBJECT(dev), "imx6.caam",
FSL_IMX6_CAAM_MEM_SIZE, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_rom(&s->caam, OBJECT(dev), "imx6.caam",
FSL_IMX6_CAAM_MEM_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX6_CAAM_MEM_ADDR,
&s->caam);
/* OCRAM memory */
memory_region_init_ram(&s->ocram, NULL, "imx6.ocram", FSL_IMX6_OCRAM_SIZE,
&err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_ram(&s->ocram, NULL, "imx6.ocram",
FSL_IMX6_OCRAM_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), FSL_IMX6_OCRAM_ADDR,

View File

@@ -291,12 +291,9 @@ static void integratorcm_realize(DeviceState *d, Error **errp)
{
IntegratorCMState *s = INTEGRATOR_CM(d);
SysBusDevice *dev = SYS_BUS_DEVICE(d);
Error *local_err = NULL;
memory_region_init_ram(&s->flash, OBJECT(d), "integrator.flash", 0x100000,
&local_err);
if (local_err) {
error_propagate(errp, local_err);
if (!memory_region_init_ram(&s->flash, OBJECT(d), "integrator.flash",
0x100000, errp)) {
return;
}

View File

@@ -58,7 +58,6 @@ static void nrf51_soc_realize(DeviceState *dev_soc, Error **errp)
{
NRF51State *s = NRF51_SOC(dev_soc);
MemoryRegion *mr;
Error *err = NULL;
uint8_t i = 0;
hwaddr base_addr = 0;
@@ -92,10 +91,8 @@ static void nrf51_soc_realize(DeviceState *dev_soc, Error **errp)
memory_region_add_subregion_overlap(&s->container, 0, s->board_memory, -1);
memory_region_init_ram(&s->sram, OBJECT(s), "nrf51.sram", s->sram_size,
&err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_ram(&s->sram, OBJECT(s), "nrf51.sram", s->sram_size,
errp)) {
return;
}
memory_region_add_subregion(&s->container, NRF51_SRAM_BASE, &s->sram);

View File

@@ -418,6 +418,9 @@ static void xen_block_realize(XenDevice *xendev, Error **errp)
xen_block_set_size(blockdev);
if (!monitor_add_blk(conf->blk, blockdev->drive->id, errp)) {
return;
}
blockdev->dataplane =
xen_block_dataplane_create(xendev, blk, conf->logical_block_size,
blockdev->props.iothread);
@@ -874,6 +877,8 @@ static XenBlockDrive *xen_block_drive_create(const char *id,
const char *mode = qdict_get_try_str(opts, "mode");
const char *direct_io_safe = qdict_get_try_str(opts, "direct-io-safe");
const char *discard_enable = qdict_get_try_str(opts, "discard-enable");
const char *suse_diskcache_disable_flush = qdict_get_try_str(opts,
"suse-diskcache-disable-flush");
char *driver = NULL;
char *filename = NULL;
XenBlockDrive *drive = NULL;
@@ -954,6 +959,16 @@ static XenBlockDrive *xen_block_drive_create(const char *id,
}
}
if (suse_diskcache_disable_flush) {
unsigned long value;
if (!qemu_strtoul(suse_diskcache_disable_flush, NULL, 2, &value) && !!value) {
QDict *cache_qdict = qdict_new();
qdict_put_bool(cache_qdict, "no-flush", true);
qdict_put_obj(file_layer, "cache", QOBJECT(cache_qdict));
}
}
/*
* It is necessary to turn file locking off as an emulated device
* may have already opened the same image file.

View File

@@ -1189,6 +1189,11 @@ bool machine_mem_merge(MachineState *machine)
return machine->mem_merge;
}
bool machine_require_guest_memfd(MachineState *machine)
{
return machine->require_guest_memfd;
}
static char *cpu_slot_to_string(const CPUArchId *cpu)
{
GString *s = g_string_new(NULL);

View File

@@ -10,6 +10,11 @@ config SGX
bool
depends on KVM
config TDX
bool
select X86_FW_OVMF
depends on KVM
config PC
bool
imply APPLESMC
@@ -26,6 +31,7 @@ config PC
imply QXL
imply SEV
imply SGX
imply TDX
imply TEST_DEVICES
imply TPM_CRB
imply TPM_TIS_ISA

View File

@@ -975,7 +975,8 @@ static void build_dbg_aml(Aml *table)
aml_append(table, scope);
}
static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg,
bool level_trigger_unsupported)
{
Aml *dev;
Aml *crs;
@@ -987,7 +988,10 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
crs = aml_resource_template();
aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
aml_append(crs, aml_interrupt(AML_CONSUMER,
level_trigger_unsupported ?
AML_EDGE : AML_LEVEL,
AML_ACTIVE_HIGH,
AML_SHARED, irqs, ARRAY_SIZE(irqs)));
aml_append(dev, aml_name_decl("_PRS", crs));
@@ -1011,7 +1015,8 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
return dev;
}
static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
static Aml *build_gsi_link_dev(const char *name, uint8_t uid,
uint8_t gsi, bool level_trigger_unsupported)
{
Aml *dev;
Aml *crs;
@@ -1024,7 +1029,10 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
crs = aml_resource_template();
irqs = gsi;
aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
aml_append(crs, aml_interrupt(AML_CONSUMER,
level_trigger_unsupported ?
AML_EDGE : AML_LEVEL,
AML_ACTIVE_HIGH,
AML_SHARED, &irqs, 1));
aml_append(dev, aml_name_decl("_PRS", crs));
@@ -1043,7 +1051,7 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
}
/* _CRS method - get current settings */
static Aml *build_iqcr_method(bool is_piix4)
static Aml *build_iqcr_method(bool is_piix4, bool level_trigger_unsupported)
{
Aml *if_ctx;
uint32_t irqs;
@@ -1051,7 +1059,9 @@ static Aml *build_iqcr_method(bool is_piix4)
Aml *crs = aml_resource_template();
irqs = 0;
aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
aml_append(crs, aml_interrupt(AML_CONSUMER,
level_trigger_unsupported ?
AML_EDGE : AML_LEVEL,
AML_ACTIVE_HIGH, AML_SHARED, &irqs, 1));
aml_append(method, aml_name_decl("PRR0", crs));
@@ -1085,7 +1095,7 @@ static Aml *build_irq_status_method(void)
return method;
}
static void build_piix4_pci0_int(Aml *table)
static void build_piix4_pci0_int(Aml *table, bool level_trigger_unsupported)
{
Aml *dev;
Aml *crs;
@@ -1098,12 +1108,16 @@ static void build_piix4_pci0_int(Aml *table)
aml_append(sb_scope, pci0_scope);
aml_append(sb_scope, build_irq_status_method());
aml_append(sb_scope, build_iqcr_method(true));
aml_append(sb_scope, build_iqcr_method(true, level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0")));
aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1")));
aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2")));
aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3")));
aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3"),
level_trigger_unsupported));
dev = aml_device("LNKS");
{
@@ -1112,7 +1126,9 @@ static void build_piix4_pci0_int(Aml *table)
crs = aml_resource_template();
irqs = 9;
aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
aml_append(crs, aml_interrupt(AML_CONSUMER,
level_trigger_unsupported ?
AML_EDGE : AML_LEVEL,
AML_ACTIVE_HIGH, AML_SHARED,
&irqs, 1));
aml_append(dev, aml_name_decl("_PRS", crs));
@@ -1198,7 +1214,7 @@ static Aml *build_q35_routing_table(const char *str)
return pkg;
}
static void build_q35_pci0_int(Aml *table)
static void build_q35_pci0_int(Aml *table, bool level_trigger_unsupported)
{
Aml *method;
Aml *sb_scope = aml_scope("_SB");
@@ -1237,25 +1253,41 @@ static void build_q35_pci0_int(Aml *table)
aml_append(sb_scope, pci0_scope);
aml_append(sb_scope, build_irq_status_method());
aml_append(sb_scope, build_iqcr_method(false));
aml_append(sb_scope, build_iqcr_method(false, level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA")));
aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB")));
aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC")));
aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD")));
aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE")));
aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF")));
aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG")));
aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH")));
aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG"),
level_trigger_unsupported));
aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH"),
level_trigger_unsupported));
aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10));
aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11));
aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12));
aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13));
aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14));
aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15));
aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16));
aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17));
aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10,
level_trigger_unsupported));
aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11,
level_trigger_unsupported));
aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12,
level_trigger_unsupported));
aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13,
level_trigger_unsupported));
aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14,
level_trigger_unsupported));
aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15,
level_trigger_unsupported));
aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16,
level_trigger_unsupported));
aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17,
level_trigger_unsupported));
aml_append(table, sb_scope);
}
@@ -1436,6 +1468,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
PCMachineState *pcms = PC_MACHINE(machine);
PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
X86MachineState *x86ms = X86_MACHINE(machine);
bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
AcpiMcfgInfo mcfg;
bool mcfg_valid = !!acpi_get_mcfg(&mcfg);
uint32_t nr_mem = machine->ram_slots;
@@ -1468,7 +1501,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
if (pm->pcihp_bridge_en || pm->pcihp_root_en) {
build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
}
build_piix4_pci0_int(dsdt);
build_piix4_pci0_int(dsdt, level_trigger_unsupported);
} else if (q35) {
sb_scope = aml_scope("_SB");
dev = aml_device("PCI0");
@@ -1512,7 +1545,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
if (pm->pcihp_bridge_en) {
build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
}
build_q35_pci0_int(dsdt);
build_q35_pci0_int(dsdt, level_trigger_unsupported);
}
if (misc->has_hpet) {

View File

@@ -100,9 +100,11 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
int i;
bool x2apic_mode = false;
MachineClass *mc = MACHINE_GET_CLASS(x86ms);
X86MachineClass *x86mc = X86_MACHINE_GET_CLASS(x86ms);
const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(x86ms));
AcpiTable table = { .sig = "APIC", .rev = 3, .oem_id = oem_id,
.oem_table_id = oem_table_id };
bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
acpi_table_begin(&table, table_data);
/* Local APIC Address */
@@ -122,18 +124,42 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
IO_APIC_SECONDARY_ADDRESS, IO_APIC_SECONDARY_IRQBASE);
}
if (x86ms->apic_xrupt_override) {
build_xrupt_override(table_data, 0, 2,
0 /* Flags: Conforms to the specifications of the bus */);
}
for (i = 1; i < 16; i++) {
if (!(x86ms->pci_irq_mask & (1 << i))) {
/* No need for a INT source override structure. */
continue;
if (level_trigger_unsupported) {
/* Force edge trigger */
if (x86mc->apic_xrupt_override) {
build_xrupt_override(table_data, 0, 2,
/* Flags: active high, edge triggered */
1 | (1 << 2));
}
for (i = x86mc->apic_xrupt_override ? 1 : 0; i < 16; i++) {
build_xrupt_override(table_data, i, i,
/* Flags: active high, edge triggered */
1 | (1 << 2));
}
if (x86ms->ioapic2) {
for (i = 0; i < 16; i++) {
build_xrupt_override(table_data, IO_APIC_SECONDARY_IRQBASE + i,
IO_APIC_SECONDARY_IRQBASE + i,
/* Flags: active high, edge triggered */
1 | (1 << 2));
}
}
} else {
if (x86mc->apic_xrupt_override) {
build_xrupt_override(table_data, 0, 2,
0 /* Flags: Conforms to the specifications of the bus */);
}
for (i = 1; i < 16; i++) {
if (!(x86ms->pci_irq_mask & (1 << i))) {
/* No need for a INT source override structure. */
continue;
}
build_xrupt_override(table_data, i, i,
0xd /* Flags: Active high, Level Triggered */);
}
build_xrupt_override(table_data, i, i,
0xd /* Flags: Active high, Level Triggered */);
}
if (x2apic_mode) {

View File

@@ -48,15 +48,25 @@ const char *fw_cfg_arch_key_name(uint16_t key)
return NULL;
}
void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg)
void fw_cfg_build_smbios(PCMachineState *pcms, FWCfgState *fw_cfg)
{
#ifdef CONFIG_SMBIOS
uint8_t *smbios_tables, *smbios_anchor;
size_t smbios_tables_len, smbios_anchor_len;
struct smbios_phys_mem_area *mem_array;
unsigned i, array_count;
MachineState *ms = MACHINE(pcms);
PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
MachineClass *mc = MACHINE_GET_CLASS(pcms);
X86CPU *cpu = X86_CPU(ms->possible_cpus->cpus[0].cpu);
if (pcmc->smbios_defaults) {
/* These values are guest ABI, do not change */
smbios_set_defaults("QEMU", mc->desc, mc->name,
pcmc->smbios_legacy_mode, pcmc->smbios_uuid_encoded,
pcms->smbios_entry_point_type);
}
/* tell smbios about cpuid version and features */
smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]);

View File

@@ -10,6 +10,7 @@
#define HW_I386_FW_CFG_H
#include "hw/boards.h"
#include "hw/i386/pc.h"
#include "hw/nvram/fw_cfg.h"
#define FW_CFG_IO_BASE 0x510
@@ -22,7 +23,7 @@
FWCfgState *fw_cfg_arch_create(MachineState *ms,
uint16_t boot_cpus,
uint16_t apic_id_limit);
void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
void fw_cfg_build_smbios(PCMachineState *ms, FWCfgState *fw_cfg);
void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
void fw_cfg_add_acpi_dsdt(Aml *scope, FWCfgState *fw_cfg);

View File

@@ -27,6 +27,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
'port92.c'))
i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
if_false: files('pc_sysfw_ovmf-stubs.c'))
i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c', 'tdvf-hob.c'))
subdir('kvm')
subdir('xen')

View File

@@ -175,7 +175,7 @@ static void microvm_devices_init(MicrovmMachineState *mms)
&error_abort);
isa_bus_register_input_irqs(isa_bus, x86ms->gsi);
ioapic_init_gsi(gsi_state, "machine");
ioapic_init_gsi(gsi_state, OBJECT(mms));
if (ioapics > 1) {
x86ms->ioapic2 = ioapic_init_secondary(gsi_state);
}

View File

@@ -43,6 +43,7 @@
#include "sysemu/xen.h"
#include "sysemu/reset.h"
#include "kvm/kvm_i386.h"
#include "kvm/tdx.h"
#include "hw/xen/xen.h"
#include "qapi/qmp/qlist.h"
#include "qemu/error-report.h"
@@ -692,22 +693,13 @@ void pc_machine_done(Notifier *notifier, void *data)
acpi_setup();
if (x86ms->fw_cfg) {
fw_cfg_build_smbios(MACHINE(pcms), x86ms->fw_cfg);
fw_cfg_build_smbios(pcms, x86ms->fw_cfg);
fw_cfg_build_feature_control(MACHINE(pcms), x86ms->fw_cfg);
/* update FW_CFG_NB_CPUS to account for -device added CPUs */
fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
}
}
void pc_guest_info_init(PCMachineState *pcms)
{
X86MachineState *x86ms = X86_MACHINE(pcms);
x86ms->apic_xrupt_override = true;
pcms->machine_done.notify = pc_machine_done;
qemu_add_machine_init_done_notifier(&pcms->machine_done);
}
/* setup pci memory address space mapping into system address space */
void pc_pci_as_mapping_init(MemoryRegion *system_memory,
MemoryRegion *pci_address_space)
@@ -1036,16 +1028,18 @@ void pc_memory_init(PCMachineState *pcms,
/* Initialize PC system firmware */
pc_system_firmware_init(pcms, rom_memory);
option_rom_mr = g_malloc(sizeof(*option_rom_mr));
memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
&error_fatal);
if (pcmc->pci_enabled) {
memory_region_set_readonly(option_rom_mr, true);
if (!is_tdx_vm()) {
option_rom_mr = g_malloc(sizeof(*option_rom_mr));
memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
&error_fatal);
if (pcmc->pci_enabled) {
memory_region_set_readonly(option_rom_mr, true);
}
memory_region_add_subregion_overlap(rom_memory,
PC_ROM_MIN_VGA,
option_rom_mr,
1);
}
memory_region_add_subregion_overlap(rom_memory,
PC_ROM_MIN_VGA,
option_rom_mr,
1);
fw_cfg = fw_cfg_arch_create(machine,
x86ms->boot_cpus, x86ms->apic_id_limit);
@@ -1753,6 +1747,9 @@ static void pc_machine_initfn(Object *obj)
object_property_add_alias(OBJECT(pcms), "pcspk-audiodev",
OBJECT(pcms->pcspk), "audiodev");
cxl_machine_init(obj, &pcms->cxl_devices_state);
pcms->machine_done.notify = pc_machine_done;
qemu_add_machine_init_done_notifier(&pcms->machine_done);
}
int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
@@ -1806,6 +1803,7 @@ static bool pc_hotplug_allowed(MachineState *ms, DeviceState *dev, Error **errp)
static void pc_machine_class_init(ObjectClass *oc, void *data)
{
MachineClass *mc = MACHINE_CLASS(oc);
X86MachineClass *x86mc = X86_MACHINE_CLASS(oc);
PCMachineClass *pcmc = PC_MACHINE_CLASS(oc);
HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
@@ -1825,6 +1823,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
pcmc->pvh_enabled = true;
pcmc->kvmclock_create_always = true;
pcmc->resizable_acpi_blob = true;
x86mc->apic_xrupt_override = true;
assert(!mc->get_hotplug_handler);
mc->get_hotplug_handler = pc_get_hotplug_handler;
mc->hotplug_allowed = pc_hotplug_allowed;

View File

@@ -36,7 +36,6 @@
#include "hw/rtc/mc146818rtc.h"
#include "hw/southbridge/piix.h"
#include "hw/display/ramfb.h"
#include "hw/firmware/smbios.h"
#include "hw/pci/pci.h"
#include "hw/pci/pci_ids.h"
#include "hw/usb.h"
@@ -112,6 +111,7 @@ static void pc_init1(MachineState *machine,
X86MachineState *x86ms = X86_MACHINE(machine);
MemoryRegion *system_memory = get_system_memory();
MemoryRegion *system_io = get_system_io();
Object *phb = NULL;
PCIBus *pci_bus = NULL;
ISABus *isa_bus;
Object *piix4_pm = NULL;
@@ -195,8 +195,6 @@ static void pc_init1(MachineState *machine,
}
if (pcmc->pci_enabled) {
Object *phb;
pci_memory = g_new(MemoryRegion, 1);
memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
rom_memory = pci_memory;
@@ -230,17 +228,6 @@ static void pc_init1(MachineState *machine,
&error_abort);
}
pc_guest_info_init(pcms);
if (pcmc->smbios_defaults) {
MachineClass *mc = MACHINE_GET_CLASS(machine);
/* These values are guest ABI, do not change */
smbios_set_defaults("QEMU", mc->desc,
mc->name, pcmc->smbios_legacy_mode,
pcmc->smbios_uuid_encoded,
pcms->smbios_entry_point_type);
}
/* allocate ram and load rom/bios */
if (!xen_enabled()) {
pc_memory_init(pcms, system_memory, rom_memory, hole64_size);
@@ -323,8 +310,8 @@ static void pc_init1(MachineState *machine,
pc_i8259_create(isa_bus, gsi_state->i8259_irq);
}
if (pcmc->pci_enabled) {
ioapic_init_gsi(gsi_state, "i440fx");
if (phb) {
ioapic_init_gsi(gsi_state, phb);
}
if (tcg_enabled()) {
@@ -344,11 +331,8 @@ static void pc_init1(MachineState *machine,
pc_nic_init(pcmc, isa_bus, pci_bus, pcms->xenbus);
if (pcmc->pci_enabled) {
pc_cmos_init(pcms, idebus[0], idebus[1], rtc_state);
}
#ifdef CONFIG_IDE_ISA
else {
if (!pcmc->pci_enabled) {
DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
int i;
@@ -366,10 +350,11 @@ static void pc_init1(MachineState *machine,
busname[4] = '0' + i;
idebus[i] = qdev_get_child_bus(DEVICE(dev), busname);
}
pc_cmos_init(pcms, idebus[0], idebus[1], rtc_state);
}
#endif
pc_cmos_init(pcms, idebus[0], idebus[1], rtc_state);
if (piix4_pm) {
smi_irq = qemu_allocate_irq(pc_acpi_smi_interrupt, first_cpu, 0);

View File

@@ -45,7 +45,6 @@
#include "hw/i386/amd_iommu.h"
#include "hw/i386/intel_iommu.h"
#include "hw/display/ramfb.h"
#include "hw/firmware/smbios.h"
#include "hw/ide/pci.h"
#include "hw/ide/ahci.h"
#include "hw/intc/ioapic.h"
@@ -130,8 +129,7 @@ static void pc_q35_init(MachineState *machine)
ISADevice *rtc_state;
MemoryRegion *system_memory = get_system_memory();
MemoryRegion *system_io = get_system_io();
MemoryRegion *pci_memory;
MemoryRegion *rom_memory;
MemoryRegion *pci_memory = g_new(MemoryRegion, 1);
GSIState *gsi_state;
ISABus *isa_bus;
int i;
@@ -143,6 +141,8 @@ static void pc_q35_init(MachineState *machine)
bool keep_pci_slot_hpc;
uint64_t pci_hole64_size = 0;
assert(pcmc->pci_enabled);
/* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
* and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
* also known as MMCFG).
@@ -189,37 +189,16 @@ static void pc_q35_init(MachineState *machine)
kvmclock_create(pcmc->kvmclock_create_always);
}
/* pci enabled */
if (pcmc->pci_enabled) {
pci_memory = g_new(MemoryRegion, 1);
memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
rom_memory = pci_memory;
} else {
pci_memory = NULL;
rom_memory = system_memory;
}
pc_guest_info_init(pcms);
if (pcmc->smbios_defaults) {
/* These values are guest ABI, do not change */
smbios_set_defaults("QEMU", mc->desc,
mc->name, pcmc->smbios_legacy_mode,
pcmc->smbios_uuid_encoded,
pcms->smbios_entry_point_type);
}
/* create pci host bus */
phb = OBJECT(qdev_new(TYPE_Q35_HOST_DEVICE));
if (pcmc->pci_enabled) {
pci_hole64_size = object_property_get_uint(phb,
PCI_HOST_PROP_PCI_HOLE64_SIZE,
&error_abort);
}
pci_hole64_size = object_property_get_uint(phb,
PCI_HOST_PROP_PCI_HOLE64_SIZE,
&error_abort);
/* allocate ram and load rom/bios */
pc_memory_init(pcms, system_memory, rom_memory, pci_hole64_size);
memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
pc_memory_init(pcms, system_memory, pci_memory, pci_hole64_size);
object_property_add_child(OBJECT(machine), "q35", phb);
object_property_set_link(phb, PCI_HOST_PROP_RAM_MEM,
@@ -236,6 +215,8 @@ static void pc_q35_init(MachineState *machine)
x86ms->above_4g_mem_size, NULL);
object_property_set_bool(phb, PCI_HOST_BYPASS_IOMMU,
pcms->default_bus_bypass_iommu, NULL);
object_property_set_bool(phb, PCI_HOST_PROP_SMM_RANGES,
x86_machine_is_smm_enabled(x86ms), NULL);
sysbus_realize_and_unref(SYS_BUS_DEVICE(phb), &error_fatal);
/* pci */
@@ -243,14 +224,14 @@ static void pc_q35_init(MachineState *machine)
pcms->bus = host_bus;
/* irq lines */
gsi_state = pc_gsi_create(&x86ms->gsi, pcmc->pci_enabled);
gsi_state = pc_gsi_create(&x86ms->gsi, true);
/* create ISA bus */
lpc = pci_new_multifunction(PCI_DEVFN(ICH9_LPC_DEV, ICH9_LPC_FUNC),
TYPE_ICH9_LPC_DEVICE);
qdev_prop_set_bit(DEVICE(lpc), "smm-enabled",
x86_machine_is_smm_enabled(x86ms));
lpc_dev = DEVICE(lpc);
qdev_prop_set_bit(lpc_dev, "smm-enabled",
x86_machine_is_smm_enabled(x86ms));
for (i = 0; i < IOAPIC_NUM_PINS; i++) {
qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, x86ms->gsi[i]);
}
@@ -286,9 +267,7 @@ static void pc_q35_init(MachineState *machine)
pc_i8259_create(isa_bus, gsi_state->i8259_irq);
}
if (pcmc->pci_enabled) {
ioapic_init_gsi(gsi_state, "q35");
}
ioapic_init_gsi(gsi_state, OBJECT(phb));
if (tcg_enabled()) {
x86_register_ferr_irq(x86ms->gsi[13]);
@@ -412,10 +391,6 @@ static void pc_q35_8_0_machine_options(MachineClass *m)
pc_q35_8_1_machine_options(m);
compat_props_add(m->compat_props, hw_compat_8_0, hw_compat_8_0_len);
compat_props_add(m->compat_props, pc_compat_8_0, pc_compat_8_0_len);
/* For pc-q35-8.0 and older, use SMBIOS 2.8 by default */
pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_32;
m->max_cpus = 288;
}
DEFINE_Q35_MACHINE(v8_0, "pc-q35-8.0", NULL,
@@ -448,6 +423,10 @@ static void pc_q35_7_0_machine_options(MachineClass *m)
pcmc->enforce_amd_1tb_hole = false;
compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
/* For pc-q35-7.0 and older, use SMBIOS 2.8 by default */
pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_32;
m->max_cpus = 288;
}
DEFINE_Q35_MACHINE(v7_0, "pc-q35-7.0", NULL,

View File

@@ -37,6 +37,7 @@
#include "hw/block/flash.h"
#include "sysemu/kvm.h"
#include "sev.h"
#include "kvm/tdx.h"
#define FLASH_SECTOR_SIZE 4096
@@ -107,17 +108,15 @@ void pc_system_flash_cleanup_unused(PCMachineState *pcms)
{
char *prop_name;
int i;
Object *dev_obj;
assert(PC_MACHINE_GET_CLASS(pcms)->pci_enabled);
for (i = 0; i < ARRAY_SIZE(pcms->flash); i++) {
dev_obj = OBJECT(pcms->flash[i]);
if (!object_property_get_bool(dev_obj, "realized", &error_abort)) {
if (!qdev_is_realized(DEVICE(pcms->flash[i]))) {
prop_name = g_strdup_printf("pflash%d", i);
object_property_del(OBJECT(pcms), prop_name);
g_free(prop_name);
object_unparent(dev_obj);
object_unparent(OBJECT(pcms->flash[i]));
pcms->flash[i] = NULL;
}
}
@@ -265,5 +264,11 @@ void x86_firmware_configure(void *ptr, int size)
}
sev_encrypt_flash(ptr, size, &error_fatal);
} else if (is_tdx_vm()) {
ret = tdx_parse_tdvf(ptr, size);
if (ret) {
error_report("failed to parse TDVF for TDX VM");
exit(1);
}
}
}

147
hw/i386/tdvf-hob.c Normal file
View File

@@ -0,0 +1,147 @@
/*
* SPDX-License-Identifier: GPL-2.0-or-later
* Copyright (c) 2020 Intel Corporation
* Author: Isaku Yamahata <isaku.yamahata at gmail.com>
* <isaku.yamahata at intel.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* You should have received a copy of the GNU General Public License along
* with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "qemu/osdep.h"
#include "qemu/log.h"
#include "qemu/error-report.h"
#include "e820_memory_layout.h"
#include "hw/i386/pc.h"
#include "hw/i386/x86.h"
#include "hw/pci/pcie_host.h"
#include "sysemu/kvm.h"
#include "standard-headers/uefi/uefi.h"
#include "tdvf-hob.h"
typedef struct TdvfHob {
hwaddr hob_addr;
void *ptr;
int size;
/* working area */
void *current;
void *end;
} TdvfHob;
static uint64_t tdvf_current_guest_addr(const TdvfHob *hob)
{
return hob->hob_addr + (hob->current - hob->ptr);
}
static void tdvf_align(TdvfHob *hob, size_t align)
{
hob->current = QEMU_ALIGN_PTR_UP(hob->current, align);
}
static void *tdvf_get_area(TdvfHob *hob, uint64_t size)
{
void *ret;
if (hob->current + size > hob->end) {
error_report("TD_HOB overrun, size = 0x%" PRIx64, size);
exit(1);
}
ret = hob->current;
hob->current += size;
tdvf_align(hob, 8);
return ret;
}
static void tdvf_hob_add_memory_resources(TdxGuest *tdx, TdvfHob *hob)
{
EFI_HOB_RESOURCE_DESCRIPTOR *region;
EFI_RESOURCE_ATTRIBUTE_TYPE attr;
EFI_RESOURCE_TYPE resource_type;
TdxRamEntry *e;
int i;
for (i = 0; i < tdx->nr_ram_entries; i++) {
e = &tdx->ram_entries[i];
if (e->type == TDX_RAM_UNACCEPTED) {
resource_type = EFI_RESOURCE_MEMORY_UNACCEPTED;
attr = EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED;
} else if (e->type == TDX_RAM_ADDED){
resource_type = EFI_RESOURCE_SYSTEM_MEMORY;
attr = EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE;
} else {
error_report("unknown TDX_RAM_ENTRY type %d", e->type);
exit(1);
}
region = tdvf_get_area(hob, sizeof(*region));
*region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
.Header = {
.HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
.HobLength = cpu_to_le16(sizeof(*region)),
.Reserved = cpu_to_le32(0),
},
.Owner = EFI_HOB_OWNER_ZERO,
.ResourceType = cpu_to_le32(resource_type),
.ResourceAttribute = cpu_to_le32(attr),
.PhysicalStart = cpu_to_le64(e->address),
.ResourceLength = cpu_to_le64(e->length),
};
}
}
void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob)
{
TdvfHob hob = {
.hob_addr = td_hob->address,
.size = td_hob->size,
.ptr = td_hob->mem_ptr,
.current = td_hob->mem_ptr,
.end = td_hob->mem_ptr + td_hob->size,
};
EFI_HOB_GENERIC_HEADER *last_hob;
EFI_HOB_HANDOFF_INFO_TABLE *hit;
/* Note, Efi{Free}Memory{Bottom,Top} are ignored, leave 'em zeroed. */
hit = tdvf_get_area(&hob, sizeof(*hit));
*hit = (EFI_HOB_HANDOFF_INFO_TABLE) {
.Header = {
.HobType = EFI_HOB_TYPE_HANDOFF,
.HobLength = cpu_to_le16(sizeof(*hit)),
.Reserved = cpu_to_le32(0),
},
.Version = cpu_to_le32(EFI_HOB_HANDOFF_TABLE_VERSION),
.BootMode = cpu_to_le32(0),
.EfiMemoryTop = cpu_to_le64(0),
.EfiMemoryBottom = cpu_to_le64(0),
.EfiFreeMemoryTop = cpu_to_le64(0),
.EfiFreeMemoryBottom = cpu_to_le64(0),
.EfiEndOfHobList = cpu_to_le64(0), /* initialized later */
};
tdvf_hob_add_memory_resources(tdx, &hob);
last_hob = tdvf_get_area(&hob, sizeof(*last_hob));
*last_hob = (EFI_HOB_GENERIC_HEADER) {
.HobType = EFI_HOB_TYPE_END_OF_HOB_LIST,
.HobLength = cpu_to_le16(sizeof(*last_hob)),
.Reserved = cpu_to_le32(0),
};
hit->EfiEndOfHobList = tdvf_current_guest_addr(&hob);
}

24
hw/i386/tdvf-hob.h Normal file
View File

@@ -0,0 +1,24 @@
#ifndef HW_I386_TD_HOB_H
#define HW_I386_TD_HOB_H
#include "hw/i386/tdvf.h"
#include "target/i386/kvm/tdx.h"
void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob);
#define EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE \
(EFI_RESOURCE_ATTRIBUTE_PRESENT | \
EFI_RESOURCE_ATTRIBUTE_INITIALIZED | \
EFI_RESOURCE_ATTRIBUTE_TESTED)
#define EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED \
(EFI_RESOURCE_ATTRIBUTE_PRESENT | \
EFI_RESOURCE_ATTRIBUTE_INITIALIZED | \
EFI_RESOURCE_ATTRIBUTE_TESTED)
#define EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO \
(EFI_RESOURCE_ATTRIBUTE_PRESENT | \
EFI_RESOURCE_ATTRIBUTE_INITIALIZED | \
EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE)
#endif

200
hw/i386/tdvf.c Normal file
View File

@@ -0,0 +1,200 @@
/*
* SPDX-License-Identifier: GPL-2.0-or-later
* Copyright (c) 2020 Intel Corporation
* Author: Isaku Yamahata <isaku.yamahata at gmail.com>
* <isaku.yamahata at intel.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* You should have received a copy of the GNU General Public License along
* with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "hw/i386/pc.h"
#include "hw/i386/tdvf.h"
#include "sysemu/kvm.h"
#define TDX_METADATA_OFFSET_GUID "e47a6535-984a-4798-865e-4685a7bf8ec2"
#define TDX_METADATA_VERSION 1
#define TDVF_SIGNATURE 0x46564454 /* TDVF as little endian */
typedef struct {
uint32_t DataOffset;
uint32_t RawDataSize;
uint64_t MemoryAddress;
uint64_t MemoryDataSize;
uint32_t Type;
uint32_t Attributes;
} TdvfSectionEntry;
typedef struct {
uint32_t Signature;
uint32_t Length;
uint32_t Version;
uint32_t NumberOfSectionEntries;
TdvfSectionEntry SectionEntries[];
} TdvfMetadata;
struct tdx_metadata_offset {
uint32_t offset;
};
static TdvfMetadata *tdvf_get_metadata(void *flash_ptr, int size)
{
TdvfMetadata *metadata;
uint32_t offset = 0;
uint8_t *data;
if ((uint32_t) size != size) {
return NULL;
}
if (pc_system_ovmf_table_find(TDX_METADATA_OFFSET_GUID, &data, NULL)) {
offset = size - le32_to_cpu(((struct tdx_metadata_offset *)data)->offset);
if (offset + sizeof(*metadata) > size) {
return NULL;
}
} else {
error_report("Cannot find TDX_METADATA_OFFSET_GUID");
return NULL;
}
metadata = flash_ptr + offset;
/* Finally, verify the signature to determine if this is a TDVF image. */
metadata->Signature = le32_to_cpu(metadata->Signature);
if (metadata->Signature != TDVF_SIGNATURE) {
error_report("Invalid TDVF signature in metadata!");
return NULL;
}
/* Sanity check that the TDVF doesn't overlap its own metadata. */
metadata->Length = le32_to_cpu(metadata->Length);
if (offset + metadata->Length > size) {
return NULL;
}
/* Only version 1 is supported/defined. */
metadata->Version = le32_to_cpu(metadata->Version);
if (metadata->Version != TDX_METADATA_VERSION) {
return NULL;
}
return metadata;
}
static int tdvf_parse_and_check_section_entry(const TdvfSectionEntry *src,
TdxFirmwareEntry *entry)
{
entry->data_offset = le32_to_cpu(src->DataOffset);
entry->data_len = le32_to_cpu(src->RawDataSize);
entry->address = le64_to_cpu(src->MemoryAddress);
entry->size = le64_to_cpu(src->MemoryDataSize);
entry->type = le32_to_cpu(src->Type);
entry->attributes = le32_to_cpu(src->Attributes);
/* sanity check */
if (entry->size < entry->data_len) {
error_report("Broken metadata RawDataSize 0x%x MemoryDataSize 0x%lx",
entry->data_len, entry->size);
return -1;
}
if (!QEMU_IS_ALIGNED(entry->address, TARGET_PAGE_SIZE)) {
error_report("MemoryAddress 0x%lx not page aligned", entry->address);
return -1;
}
if (!QEMU_IS_ALIGNED(entry->size, TARGET_PAGE_SIZE)) {
error_report("MemoryDataSize 0x%lx not page aligned", entry->size);
return -1;
}
switch (entry->type) {
case TDVF_SECTION_TYPE_BFV:
case TDVF_SECTION_TYPE_CFV:
/* The sections that must be copied from firmware image to TD memory */
if (entry->data_len == 0) {
error_report("%d section with RawDataSize == 0", entry->type);
return -1;
}
break;
case TDVF_SECTION_TYPE_TD_HOB:
case TDVF_SECTION_TYPE_TEMP_MEM:
/* The sections that no need to be copied from firmware image */
if (entry->data_len != 0) {
error_report("%d section with RawDataSize 0x%x != 0",
entry->type, entry->data_len);
return -1;
}
break;
default:
error_report("TDVF contains unsupported section type %d", entry->type);
return -1;
}
return 0;
}
int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
{
TdvfSectionEntry *sections;
TdvfMetadata *metadata;
ssize_t entries_size;
uint32_t len, i;
metadata = tdvf_get_metadata(flash_ptr, size);
if (!metadata) {
return -EINVAL;
}
//load and parse metadata entries
fw->nr_entries = le32_to_cpu(metadata->NumberOfSectionEntries);
if (fw->nr_entries < 2) {
error_report("Invalid number of fw entries (%u) in TDVF", fw->nr_entries);
return -EINVAL;
}
len = le32_to_cpu(metadata->Length);
entries_size = fw->nr_entries * sizeof(TdvfSectionEntry);
if (len != sizeof(*metadata) + entries_size) {
error_report("TDVF metadata len (0x%x) mismatch, expected (0x%x)",
len, (uint32_t)(sizeof(*metadata) + entries_size));
return -EINVAL;
}
fw->entries = g_new(TdxFirmwareEntry, fw->nr_entries);
sections = g_new(TdvfSectionEntry, fw->nr_entries);
if (!memcpy(sections, (void *)metadata + sizeof(*metadata), entries_size)) {
error_report("Failed to read TDVF section entries");
goto err;
}
for (i = 0; i < fw->nr_entries; i++) {
if (tdvf_parse_and_check_section_entry(&sections[i], &fw->entries[i])) {
goto err;
}
}
g_free(sections);
fw->mem_ptr = flash_ptr;
return 0;
err:
g_free(sections);
fw->entries = 0;
g_free(fw->entries);
return -EINVAL;
}

View File

@@ -47,6 +47,7 @@
#include "hw/intc/i8259.h"
#include "hw/rtc/mc146818rtc.h"
#include "target/i386/sev.h"
#include "kvm/tdx.h"
#include "hw/acpi/cpu_hotplug.h"
#include "hw/irq.h"
@@ -636,20 +637,19 @@ void gsi_handler(void *opaque, int n, int level)
}
}
void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
void ioapic_init_gsi(GSIState *gsi_state, Object *parent)
{
DeviceState *dev;
SysBusDevice *d;
unsigned int i;
assert(parent_name);
assert(parent);
if (kvm_ioapic_in_kernel()) {
dev = qdev_new(TYPE_KVM_IOAPIC);
} else {
dev = qdev_new(TYPE_IOAPIC);
}
object_property_add_child(object_resolve_path(parent_name, NULL),
"ioapic", OBJECT(dev));
object_property_add_child(parent, "ioapic", OBJECT(dev));
d = SYS_BUS_DEVICE(dev);
sysbus_realize_and_unref(d, &error_fatal);
sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
@@ -1154,9 +1154,17 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware,
(bios_size % 65536) != 0) {
goto bios_error;
}
bios = g_malloc(sizeof(*bios));
memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
if (sev_enabled()) {
if (is_tdx_vm()) {
memory_region_init_ram_guest_memfd(bios, NULL, "pc.bios", bios_size,
&error_fatal);
tdx_set_tdvf_region(bios);
} else {
memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
}
if (sev_enabled() || is_tdx_vm()) {
/*
* The concept of a "reset" simply doesn't exist for
* confidential computing guests, we have to destroy and
@@ -1178,17 +1186,20 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware,
}
g_free(filename);
/* map the last 128KB of the BIOS in ISA space */
isa_bios_size = MIN(bios_size, 128 * KiB);
isa_bios = g_malloc(sizeof(*isa_bios));
memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
bios_size - isa_bios_size, isa_bios_size);
memory_region_add_subregion_overlap(rom_memory,
0x100000 - isa_bios_size,
isa_bios,
1);
if (!isapc_ram_fw) {
memory_region_set_readonly(isa_bios, true);
/* For TDX, alias different GPAs to same private memory is not supported */
if (!is_tdx_vm()) {
/* map the last 128KB of the BIOS in ISA space */
isa_bios_size = MIN(bios_size, 128 * KiB);
isa_bios = g_malloc(sizeof(*isa_bios));
memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
bios_size - isa_bios_size, isa_bios_size);
memory_region_add_subregion_overlap(rom_memory,
0x100000 - isa_bios_size,
isa_bios,
1);
if (!isapc_ram_fw) {
memory_region_set_readonly(isa_bios, true);
}
}
/* map all the bios at the top of memory */
@@ -1386,6 +1397,17 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
qapi_free_SgxEPCList(list);
}
static int x86_kvm_type(MachineState *ms, const char *vm_type)
{
X86MachineState *x86ms = X86_MACHINE(ms);
int kvm_type;
kvm_type = kvm_enabled() ? kvm_get_vm_type(ms, vm_type) : 0;
x86ms->vm_type = kvm_type;
return kvm_type;
}
static void x86_machine_initfn(Object *obj)
{
X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1399,6 +1421,7 @@ static void x86_machine_initfn(Object *obj)
x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
x86ms->bus_lock_ratelimit = 0;
x86ms->above_4g_mem_start = 4 * GiB;
x86ms->eoi_intercept_unsupported = false;
}
static void x86_machine_class_init(ObjectClass *oc, void *data)
@@ -1410,6 +1433,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
mc->kvm_type = x86_kvm_type;
x86mc->save_tsc_khz = true;
x86mc->fwcfg_dma_enabled = true;
nc->nmi_monitor_handler = x86_nmi;

View File

@@ -476,7 +476,6 @@ static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
{
Error *local_err = NULL;
struct stat buf;
size_t size;
@@ -496,10 +495,9 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
size = buf.st_size;
/* mmap the region and map into the BAR2 */
memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s), "ivshmem.bar2",
size, RAM_SHARED, fd, 0, &local_err);
if (local_err) {
error_propagate(errp, local_err);
if (!memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s),
"ivshmem.bar2", size, RAM_SHARED,
fd, 0, errp)) {
return;
}

View File

@@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
}
static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
uint32_t val, int len)
static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
{
NvmeCtrl *n = NVME(dev);
NvmeSecCtrlEntry *sctrl;
uint16_t sriov_cap = dev->exp.sriov_cap;
uint32_t off = address - sriov_cap;
int i, num_vfs;
int i;
if (!sriov_cap) {
return;
}
if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
if (!(val & PCI_SRIOV_CTRL_VFE)) {
num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
for (i = 0; i < num_vfs; i++) {
sctrl = &n->sec_ctrl_list.sec[i];
nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
}
}
for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
sctrl = &n->sec_ctrl_list.sec[i];
nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
}
}
static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
uint32_t val, int len)
{
nvme_sriov_pre_write_ctrl(dev, address, val, len);
uint16_t old_num_vfs = pcie_sriov_num_vfs(dev);
pci_default_write_config(dev, address, val, len);
pcie_cap_flr_write_config(dev, address, val, len);
nvme_sriov_post_write_config(dev, old_num_vfs);
}
static const VMStateDescription nvme_vmstate = {

View File

@@ -336,12 +336,9 @@ static void nrf51_nvm_init(Object *obj)
static void nrf51_nvm_realize(DeviceState *dev, Error **errp)
{
NRF51NVMState *s = NRF51_NVM(dev);
Error *err = NULL;
memory_region_init_rom_device(&s->flash, OBJECT(dev), &flash_ops, s,
"nrf51_soc.flash", s->flash_size, &err);
if (err) {
error_propagate(errp, err);
if (!memory_region_init_rom_device(&s->flash, OBJECT(dev), &flash_ops, s,
"nrf51_soc.flash", s->flash_size, errp)) {
return;
}

View File

@@ -179,6 +179,8 @@ static Property q35_host_props[] = {
mch.below_4g_mem_size, 0),
DEFINE_PROP_SIZE(PCI_HOST_ABOVE_4G_MEM_SIZE, Q35PCIHost,
mch.above_4g_mem_size, 0),
DEFINE_PROP_BOOL(PCI_HOST_PROP_SMM_RANGES, Q35PCIHost,
mch.has_smm_ranges, true),
DEFINE_PROP_BOOL("x-pci-hole64-fix", Q35PCIHost, pci_hole64_fix, true),
DEFINE_PROP_END_OF_LIST(),
};
@@ -214,6 +216,7 @@ static void q35_host_initfn(Object *obj)
/* mch's object_initialize resets the default value, set it again */
qdev_prop_set_uint64(DEVICE(s), PCI_HOST_PROP_PCI_HOLE64_SIZE,
Q35_PCI_HOST_HOLE64_SIZE_DEFAULT);
object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_START, "uint32",
q35_host_get_pci_hole_start,
NULL, NULL, NULL);
@@ -476,6 +479,10 @@ static void mch_write_config(PCIDevice *d,
mch_update_pciexbar(mch);
}
if (!mch->has_smm_ranges) {
return;
}
if (ranges_overlap(address, len, MCH_HOST_BRIDGE_SMRAM,
MCH_HOST_BRIDGE_SMRAM_SIZE)) {
mch_update_smram(mch);
@@ -494,10 +501,13 @@ static void mch_write_config(PCIDevice *d,
static void mch_update(MCHPCIState *mch)
{
mch_update_pciexbar(mch);
mch_update_pam(mch);
mch_update_smram(mch);
mch_update_ext_tseg_mbytes(mch);
mch_update_smbase_smram(mch);
if (mch->has_smm_ranges) {
mch_update_smram(mch);
mch_update_ext_tseg_mbytes(mch);
mch_update_smbase_smram(mch);
}
/*
* pci hole goes from end-of-low-ram to io-apic.
@@ -538,19 +548,21 @@ static void mch_reset(DeviceState *qdev)
pci_set_quad(d->config + MCH_HOST_BRIDGE_PCIEXBAR,
MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT);
d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
if (mch->has_smm_ranges) {
d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
if (mch->ext_tseg_mbytes > 0) {
pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
if (mch->ext_tseg_mbytes > 0) {
pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
}
d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
}
d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
mch_update(mch);
}
@@ -568,6 +580,20 @@ static void mch_realize(PCIDevice *d, Error **errp)
/* setup pci memory mapping */
pc_pci_as_mapping_init(mch->system_memory, mch->pci_address_space);
/* PAM */
init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
mch->system_memory, mch->pci_address_space,
PAM_BIOS_BASE, PAM_BIOS_SIZE);
for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
mch->system_memory, mch->pci_address_space,
PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
}
if (!mch->has_smm_ranges) {
return;
}
/* if *disabled* show SMRAM to all CPUs */
memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
@@ -634,15 +660,6 @@ static void mch_realize(PCIDevice *d, Error **errp)
object_property_add_const_link(qdev_get_machine(), "smram",
OBJECT(&mch->smram));
init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
mch->system_memory, mch->pci_address_space,
PAM_BIOS_BASE, PAM_BIOS_SIZE);
for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
mch->system_memory, mch->pci_address_space,
PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
}
}
uint64_t mch_mcfg_base(void)

View File

@@ -345,8 +345,10 @@ static void raven_realize(PCIDevice *d, Error **errp)
d->config[PCI_LATENCY_TIMER] = 0x10;
d->config[PCI_CAPABILITY_LIST] = 0x00;
memory_region_init_rom_nomigrate(&s->bios, OBJECT(s), "bios", BIOS_SIZE,
&error_fatal);
if (!memory_region_init_rom_nomigrate(&s->bios, OBJECT(s), "bios",
BIOS_SIZE, errp)) {
return;
}
memory_region_add_subregion(get_system_memory(), (uint32_t)(-BIOS_SIZE),
&s->bios);
if (s->bios_name) {

View File

@@ -15,7 +15,6 @@
#include "sysemu/kvm.h"
#include "migration/blocker.h"
#include "exec/confidential-guest-support.h"
#include "hw/ppc/pef.h"
#define TYPE_PEF_GUEST "pef-guest"
OBJECT_DECLARE_SIMPLE_TYPE(PefGuest, PEF_GUEST)
@@ -93,7 +92,7 @@ static int kvmppc_svm_off(Error **errp)
#endif
}
int pef_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
static int pef_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
{
if (!object_dynamic_cast(OBJECT(cgs), TYPE_PEF_GUEST)) {
return 0;
@@ -107,7 +106,7 @@ int pef_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return kvmppc_svm_init(cgs, errp);
}
int pef_kvm_reset(ConfidentialGuestSupport *cgs, Error **errp)
static int pef_kvm_reset(ConfidentialGuestSupport *cgs, Error **errp)
{
if (!object_dynamic_cast(OBJECT(cgs), TYPE_PEF_GUEST)) {
return 0;
@@ -131,6 +130,10 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(PefGuest,
static void pef_guest_class_init(ObjectClass *oc, void *data)
{
ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
klass->kvm_init = pef_kvm_init;
klass->kvm_reset = pef_kvm_reset;
}
static void pef_guest_init(Object *obj)

View File

@@ -143,7 +143,6 @@ static void rs6000mc_realize(DeviceState *dev, Error **errp)
RS6000MCState *s = RS6000MC(dev);
int socket = 0;
unsigned int ram_size = s->ram_size / MiB;
Error *local_err = NULL;
while (socket < 6) {
if (ram_size >= 64) {
@@ -165,10 +164,8 @@ static void rs6000mc_realize(DeviceState *dev, Error **errp)
if (s->simm_size[socket]) {
char name[] = "simm.?";
name[5] = socket + '0';
memory_region_init_ram(&s->simm[socket], OBJECT(dev), name,
s->simm_size[socket] * MiB, &local_err);
if (local_err) {
error_propagate(errp, local_err);
if (!memory_region_init_ram(&s->simm[socket], OBJECT(dev), name,
s->simm_size[socket] * MiB, errp)) {
return;
}
memory_region_add_subregion_overlap(get_system_memory(), 0,

View File

@@ -74,6 +74,7 @@
#include "hw/virtio/vhost-scsi-common.h"
#include "exec/ram_addr.h"
#include "exec/confidential-guest-support.h"
#include "hw/usb.h"
#include "qemu/config-file.h"
#include "qemu/error-report.h"
@@ -86,7 +87,6 @@
#include "hw/ppc/spapr_tpm_proxy.h"
#include "hw/ppc/spapr_nvdimm.h"
#include "hw/ppc/spapr_numa.h"
#include "hw/ppc/pef.h"
#include "monitor/monitor.h"
@@ -1687,7 +1687,9 @@ static void spapr_machine_reset(MachineState *machine, ShutdownCause reason)
qemu_guest_getrandom_nofail(spapr->fdt_rng_seed, 32);
}
pef_kvm_reset(machine->cgs, &error_fatal);
if (machine->cgs) {
confidential_guest_kvm_reset(machine->cgs, &error_fatal);
}
spapr_caps_apply(spapr);
first_ppc_cpu = POWERPC_CPU(first_cpu);
@@ -2810,7 +2812,9 @@ static void spapr_machine_init(MachineState *machine)
/*
* if Secure VM (PEF) support is configured, then initialize it
*/
pef_kvm_init(machine->cgs, &error_fatal);
if (machine->cgs) {
confidential_guest_kvm_init(machine->cgs, &error_fatal);
}
msi_nonbroken = true;
@@ -4647,13 +4651,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
mc->block_default_type = IF_SCSI;
/*
* Setting max_cpus to INT32_MAX. Both KVM and TCG max_cpus values
* should be limited by the host capability instead of hardcoded.
* max_cpus for KVM guests will be checked in kvm_init(), and TCG
* guests are welcome to have as many CPUs as the host are capable
* of emulate.
* While KVM determines max cpus in kvm_init() using kvm_max_vcpus(),
* In TCG the limit is restricted by the range of CPU IPIs available.
*/
mc->max_cpus = INT32_MAX;
mc->max_cpus = SPAPR_IRQ_NR_IPIS;
mc->no_parallel = 1;
mc->default_boot_order = "";

View File

@@ -23,6 +23,8 @@
#include "trace.h"
QEMU_BUILD_BUG_ON(SPAPR_IRQ_NR_IPIS > SPAPR_XIRQ_BASE);
static const TypeInfo spapr_intc_info = {
.name = TYPE_SPAPR_INTC,
.parent = TYPE_INTERFACE,
@@ -329,7 +331,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
int i;
dev = qdev_new(TYPE_SPAPR_XIVE);
qdev_prop_set_uint32(dev, "nr-irqs", smc->nr_xirqs + SPAPR_XIRQ_BASE);
qdev_prop_set_uint32(dev, "nr-irqs", smc->nr_xirqs + SPAPR_IRQ_NR_IPIS);
/*
* 8 XIVE END structures per CPU. One for each available
* priority
@@ -356,7 +358,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
}
spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
smc->nr_xirqs + SPAPR_XIRQ_BASE);
smc->nr_xirqs + SPAPR_IRQ_NR_IPIS);
/*
* Mostly we don't actually need this until reset, except that not

View File

@@ -14,6 +14,7 @@
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "exec/ram_addr.h"
#include "exec/confidential-guest-support.h"
#include "hw/s390x/s390-virtio-hcall.h"
#include "hw/s390x/sclp.h"
#include "hw/s390x/s390_flic.h"
@@ -267,7 +268,9 @@ static void ccw_init(MachineState *machine)
s390_init_cpus(machine);
/* Need CPU model to be determined before we can set up PV */
s390_pv_init(machine->cgs, &error_fatal);
if (machine->cgs) {
confidential_guest_kvm_init(machine->cgs, &error_fatal);
}
s390_flic_init();

View File

@@ -1928,7 +1928,7 @@ static void megasas_command_cancelled(SCSIRequest *req)
{
MegasasCmd *cmd = req->hba_private;
if (!cmd) {
if (!cmd || !cmd->frame) {
return;
}
cmd->frame->header.cmd_status = MFI_STAT_SCSI_IO_FAILED;

View File

@@ -327,11 +327,17 @@ static void scsi_read_complete(void * opaque, int ret)
if (r->req.cmd.buf[0] == READ_CAPACITY_10 &&
(ldl_be_p(&r->buf[0]) != 0xffffffffU || s->max_lba == 0)) {
s->blocksize = ldl_be_p(&r->buf[4]);
s->max_lba = ldl_be_p(&r->buf[0]) & 0xffffffffULL;
BlockBackend *blk = s->conf.blk;
BlockDriverState *bs = blk_bs(blk);
s->max_lba = bs->total_sectors - 1;
stl_be_p(&r->buf[0], s->max_lba);
} else if (r->req.cmd.buf[0] == SERVICE_ACTION_IN_16 &&
(r->req.cmd.buf[1] & 31) == SAI_READ_CAPACITY_16) {
s->blocksize = ldl_be_p(&r->buf[8]);
s->max_lba = ldq_be_p(&r->buf[0]);
BlockBackend *blk = s->conf.blk;
BlockDriverState *bs = blk_bs(blk);
s->max_lba = bs->total_sectors - 1;
stq_be_p(&r->buf[0], s->max_lba);
}
/*
@@ -396,7 +402,10 @@ static void scsi_write_complete(void * opaque, int ret)
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
if (ret || r->req.io_canceled) {
if (ret || r->req.io_canceled ||
r->io_header.status != SCSI_HOST_OK ||
(r->io_header.driver_status & SG_ERR_DRIVER_TIMEOUT) ||
r->io_header.status != GOOD) {
scsi_command_complete_noio(r, ret);
goto done;
}

View File

@@ -1263,6 +1263,7 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
struct smbios_structure_header *header;
int size;
struct smbios_table *table; /* legacy mode only */
uint8_t *dbl_nulls, *orig_end;
if (!qemu_opts_validate(opts, qemu_smbios_file_opts, errp)) {
return;
@@ -1275,11 +1276,21 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
}
/*
* NOTE: standard double '\0' terminator expected, per smbios spec.
* (except in legacy mode, where the second '\0' is implicit and
* will be inserted by the BIOS).
* NOTE: standard double '\0' terminator expected, per smbios spec,
* unless the data is formatted for legacy mode, which is used by
* pc-i440fx-2.0 and earlier machine types. Legacy mode structures
* without strings have no '\0' terminators, and those with strings
* also don't have an additional '\0' terminator at the end of the
* final string '\0' terminator. The BIOS will add the '\0' terminators
* to comply with the smbios spec.
* For greater compatibility, regardless of the machine type used,
* either format is accepted.
*/
smbios_tables = g_realloc(smbios_tables, smbios_tables_len + size);
smbios_tables = g_realloc(smbios_tables, smbios_tables_len + size + 2);
orig_end = smbios_tables + smbios_tables_len + size;
/* add extra null bytes to end in case of legacy file data */
*orig_end = '\0';
*(orig_end + 1) = '\0';
header = (struct smbios_structure_header *)(smbios_tables +
smbios_tables_len);
@@ -1297,6 +1308,20 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
}
set_bit(header->type, have_binfile_bitmap);
}
for (dbl_nulls = smbios_tables + smbios_tables_len + header->length;
dbl_nulls + 2 <= orig_end; dbl_nulls++) {
if (*dbl_nulls == '\0' && *(dbl_nulls + 1) == '\0') {
break;
}
}
if (dbl_nulls + 2 < orig_end) {
error_setg(errp, "SMBIOS file data malformed");
return;
}
/* increase size by how many extra nulls were actually needed */
size += dbl_nulls + 2 - orig_end;
smbios_tables = g_realloc(smbios_tables, smbios_tables_len + size);
set_bit(header->type, have_binfile_bitmap);
if (header->type == 4) {
smbios_type4_count++;
@@ -1316,6 +1341,17 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
* delete the one we don't need from smbios_set_defaults(),
* once we know which machine version has been requested.
*/
if (dbl_nulls + 2 == orig_end) {
/* chop off nulls to get legacy format */
if (header->length + 2 == size) {
size -= 2;
} else {
size -= 1;
}
} else {
/* undo conversion from legacy format to per-spec format */
size -= dbl_nulls + 2 - orig_end;
}
if (!smbios_entries) {
smbios_entries_len = sizeof(uint16_t);
smbios_entries = g_malloc0(smbios_entries_len);

View File

@@ -577,12 +577,9 @@ static void idreg_realize(DeviceState *ds, Error **errp)
{
IDRegState *s = MACIO_ID_REGISTER(ds);
SysBusDevice *dev = SYS_BUS_DEVICE(ds);
Error *local_err = NULL;
memory_region_init_ram_nomigrate(&s->mem, OBJECT(ds), "sun4m.idreg",
sizeof(idreg_data), &local_err);
if (local_err) {
error_propagate(errp, local_err);
if (!memory_region_init_ram_nomigrate(&s->mem, OBJECT(ds), "sun4m.idreg",
sizeof(idreg_data), errp)) {
return;
}
@@ -631,12 +628,9 @@ static void afx_realize(DeviceState *ds, Error **errp)
{
AFXState *s = TCX_AFX(ds);
SysBusDevice *dev = SYS_BUS_DEVICE(ds);
Error *local_err = NULL;
memory_region_init_ram_nomigrate(&s->mem, OBJECT(ds), "sun4m.afx", 4,
&local_err);
if (local_err) {
error_propagate(errp, local_err);
if (!memory_region_init_ram_nomigrate(&s->mem, OBJECT(ds), "sun4m.afx",
4, errp)) {
return;
}
@@ -715,12 +709,9 @@ static void prom_realize(DeviceState *ds, Error **errp)
{
PROMState *s = OPENPROM(ds);
SysBusDevice *dev = SYS_BUS_DEVICE(ds);
Error *local_err = NULL;
memory_region_init_ram_nomigrate(&s->prom, OBJECT(ds), "sun4m.prom",
PROM_SIZE_MAX, &local_err);
if (local_err) {
error_propagate(errp, local_err);
if (!memory_region_init_ram_nomigrate(&s->prom, OBJECT(ds), "sun4m.prom",
PROM_SIZE_MAX, errp)) {
return;
}

View File

@@ -454,12 +454,9 @@ static void prom_realize(DeviceState *ds, Error **errp)
{
PROMState *s = OPENPROM(ds);
SysBusDevice *dev = SYS_BUS_DEVICE(ds);
Error *local_err = NULL;
memory_region_init_ram_nomigrate(&s->prom, OBJECT(ds), "sun4u.prom",
PROM_SIZE_MAX, &local_err);
if (local_err) {
error_propagate(errp, local_err);
if (!memory_region_init_ram_nomigrate(&s->prom, OBJECT(ds), "sun4u.prom",
PROM_SIZE_MAX, errp)) {
return;
}

View File

@@ -605,8 +605,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
int fd = memory_region_get_fd(&vmem->memdev->mr);
Error *local_err = NULL;
qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
if (local_err) {
if (!qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err)) {
static bool warned;
/*
@@ -1249,8 +1248,7 @@ static int virtio_mem_prealloc_range_cb(VirtIOMEM *vmem, void *arg,
int fd = memory_region_get_fd(&vmem->memdev->mr);
Error *local_err = NULL;
qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
if (local_err) {
if (!qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err)) {
error_report_err(local_err);
return -ENOMEM;
}

View File

@@ -48,7 +48,7 @@ void hmp_eject(Monitor *mon, const QDict *qdict);
void hmp_qemu_io(Monitor *mon, const QDict *qdict);
void hmp_info_block(Monitor *mon, const QDict *qdict);
void coroutine_fn hmp_info_block(Monitor *mon, const QDict *qdict);
void hmp_info_blockstats(Monitor *mon, const QDict *qdict);
void hmp_info_block_jobs(Monitor *mon, const QDict *qdict);
void hmp_info_snapshots(Monitor *mon, const QDict *qdict);

View File

@@ -25,14 +25,14 @@
#ifndef BLOCK_QAPI_H
#define BLOCK_QAPI_H
#include "block/block-common.h"
#include "block/graph-lock.h"
#include "block/snapshot.h"
#include "qapi/qapi-types-block-core.h"
BlockDeviceInfo * GRAPH_RDLOCK
bdrv_block_device_info(BlockBackend *blk, BlockDriverState *bs,
bool flat, Error **errp);
BlockDeviceInfo *coroutine_fn GRAPH_RDLOCK
bdrv_block_device_info(BlockBackend *blk, BlockDriverState *bs, bool flat,
Error **errp);
int GRAPH_RDLOCK
bdrv_query_snapshot_info_list(BlockDriverState *bs,
SnapshotInfoList **p_list,
@@ -40,7 +40,10 @@ bdrv_query_snapshot_info_list(BlockDriverState *bs,
void GRAPH_RDLOCK
bdrv_query_image_info(BlockDriverState *bs, ImageInfo **p_info, bool flat,
bool skip_implicit_filters, Error **errp);
void GRAPH_RDLOCK
void coroutine_fn GRAPH_RDLOCK
bdrv_co_query_block_graph_info(BlockDriverState *bs, BlockGraphInfo **p_info,
Error **errp);
void co_wrapper_bdrv_rdlock
bdrv_query_block_graph_info(BlockDriverState *bs, BlockGraphInfo **p_info,
Error **errp);

View File

@@ -31,6 +31,7 @@
#define QEMU_AIO_ZONE_REPORT 0x0100
#define QEMU_AIO_ZONE_MGMT 0x0200
#define QEMU_AIO_ZONE_APPEND 0x0400
#define QEMU_AIO_FSTAT 0x0800
#define QEMU_AIO_TYPE_MASK \
(QEMU_AIO_READ | \
QEMU_AIO_WRITE | \
@@ -42,7 +43,8 @@
QEMU_AIO_TRUNCATE | \
QEMU_AIO_ZONE_REPORT | \
QEMU_AIO_ZONE_MGMT | \
QEMU_AIO_ZONE_APPEND)
QEMU_AIO_ZONE_APPEND | \
QEMU_AIO_FSTAT)
/* AIO flags */
#define QEMU_AIO_MISALIGNED 0x1000

View File

@@ -27,6 +27,7 @@
#include "qemu/coroutine.h"
#include "qemu/throttle.h"
#include "block/block_int.h"
#include "qom/object.h"
/* The ThrottleGroupMember structure indicates membership in a ThrottleGroup

View File

@@ -23,7 +23,10 @@
#include "qom/object.h"
#define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, CONFIDENTIAL_GUEST_SUPPORT)
OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
ConfidentialGuestSupportClass,
CONFIDENTIAL_GUEST_SUPPORT)
struct ConfidentialGuestSupport {
Object parent;
@@ -55,8 +58,37 @@ struct ConfidentialGuestSupport {
typedef struct ConfidentialGuestSupportClass {
ObjectClass parent;
int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
int (*kvm_reset)(ConfidentialGuestSupport *cgs, Error **errp);
} ConfidentialGuestSupportClass;
static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
Error **errp)
{
ConfidentialGuestSupportClass *klass;
klass = CONFIDENTIAL_GUEST_SUPPORT_GET_CLASS(cgs);
if (klass->kvm_init) {
return klass->kvm_init(cgs, errp);
}
return 0;
}
static inline int confidential_guest_kvm_reset(ConfidentialGuestSupport *cgs,
Error **errp)
{
ConfidentialGuestSupportClass *klass;
klass = CONFIDENTIAL_GUEST_SUPPORT_GET_CLASS(cgs);
if (klass->kvm_reset) {
return klass->kvm_reset(cgs, errp);
}
return 0;
}
#endif /* !CONFIG_USER_ONLY */
#endif /* QEMU_CONFIDENTIAL_GUEST_SUPPORT_H */

View File

@@ -175,6 +175,8 @@ typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length);
int ram_block_discard_guest_memfd_range(RAMBlock *rb, uint64_t start,
size_t length);
#endif

View File

@@ -243,6 +243,9 @@ typedef struct IOMMUTLBEvent {
/* RAM FD is opened read-only */
#define RAM_READONLY_FD (1 << 11)
/* RAM can be private that has kvm guest memfd backend */
#define RAM_GUEST_MEMFD (1 << 12)
static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
@@ -1288,8 +1291,10 @@ void memory_region_init_io(MemoryRegion *mr,
*
* Note that this function does not do anything to cause the data in the
* RAM memory region to be migrated; that is the responsibility of the caller.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_ram_nomigrate(MemoryRegion *mr,
bool memory_region_init_ram_nomigrate(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
@@ -1305,13 +1310,16 @@ void memory_region_init_ram_nomigrate(MemoryRegion *mr,
* @name: Region name, becomes part of RAMBlock name used in migration stream
* must be unique within any device
* @size: size of the region.
* @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_NORESERVE.
* @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_NORESERVE,
* RAM_GUEST_MEMFD.
* @errp: pointer to Error*, to store an error if it happens.
*
* Note that this function does not do anything to cause the data in the
* RAM memory region to be migrated; that is the responsibility of the caller.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
bool memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
@@ -1338,8 +1346,10 @@ void memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
*
* Note that this function does not do anything to cause the data in the
* RAM memory region to be migrated; that is the responsibility of the caller.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_resizeable_ram(MemoryRegion *mr,
bool memory_region_init_resizeable_ram(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
@@ -1363,15 +1373,17 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
* (getpagesize()) will be used.
* @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
* RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
* RAM_READONLY_FD
* RAM_READONLY_FD, RAM_GUEST_MEMFD
* @path: the path in which to allocate the RAM.
* @offset: offset within the file referenced by path
* @errp: pointer to Error*, to store an error if it happens.
*
* Note that this function does not do anything to cause the data in the
* RAM memory region to be migrated; that is the responsibility of the caller.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_ram_from_file(MemoryRegion *mr,
bool memory_region_init_ram_from_file(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
@@ -1391,15 +1403,17 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
* @size: size of the region.
* @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
* RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
* RAM_READONLY_FD
* RAM_READONLY_FD, RAM_GUEST_MEMFD
* @fd: the fd to mmap.
* @offset: offset within the file referenced by fd
* @errp: pointer to Error*, to store an error if it happens.
*
* Note that this function does not do anything to cause the data in the
* RAM memory region to be migrated; that is the responsibility of the caller.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_ram_from_fd(MemoryRegion *mr,
bool memory_region_init_ram_from_fd(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
@@ -1494,8 +1508,10 @@ void memory_region_init_alias(MemoryRegion *mr,
* must be unique within any device
* @size: size of the region.
* @errp: pointer to Error*, to store an error if it happens.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_rom_nomigrate(MemoryRegion *mr,
bool memory_region_init_rom_nomigrate(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
@@ -1517,8 +1533,10 @@ void memory_region_init_rom_nomigrate(MemoryRegion *mr,
* must be unique within any device
* @size: size of the region.
* @errp: pointer to Error*, to store an error if it happens.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
bool memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
Object *owner,
const MemoryRegionOps *ops,
void *opaque,
@@ -1576,13 +1594,21 @@ void memory_region_init_iommu(void *_iommu_mr,
* give the RAM block a unique name for migration purposes.
* We should lift this restriction and allow arbitrary Objects.
* If you pass a non-NULL non-device @owner then we will assert.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_ram(MemoryRegion *mr,
bool memory_region_init_ram(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
Error **errp);
bool memory_region_init_ram_guest_memfd(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
Error **errp);
/**
* memory_region_init_rom: Initialize a ROM memory region.
*
@@ -1603,8 +1629,10 @@ void memory_region_init_ram(MemoryRegion *mr,
* must be unique within any device
* @size: size of the region.
* @errp: pointer to Error*, to store an error if it happens.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_rom(MemoryRegion *mr,
bool memory_region_init_rom(MemoryRegion *mr,
Object *owner,
const char *name,
uint64_t size,
@@ -1634,8 +1662,10 @@ void memory_region_init_rom(MemoryRegion *mr,
* must be unique within any device
* @size: size of the region.
* @errp: pointer to Error*, to store an error if it happens.
*
* Return: true on success, else false setting @errp with error.
*/
void memory_region_init_rom_device(MemoryRegion *mr,
bool memory_region_init_rom_device(MemoryRegion *mr,
Object *owner,
const MemoryRegionOps *ops,
void *opaque,
@@ -1702,6 +1732,16 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
*/
bool memory_region_is_protected(MemoryRegion *mr);
/**
* memory_region_has_guest_memfd: check whether a memory region has guest_memfd
* associated
*
* Returns %true if a memory region's ram_block has valid guest_memfd assigned.
*
* @mr: the memory region being queried
*/
bool memory_region_has_guest_memfd(MemoryRegion *mr);
/**
* memory_region_get_iommu: check whether a memory region is an iommu
*

View File

@@ -109,7 +109,7 @@ long qemu_maxrampagesize(void);
* @mr: the memory region where the ram block is
* @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
* RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
* RAM_READONLY_FD
* RAM_READONLY_FD, RAM_GUEST_MEMFD
* @mem_path or @fd: specify the backing file or device
* @offset: Offset into target file
* @errp: pointer to Error*, to store an error if it happens

View File

@@ -41,6 +41,7 @@ struct RAMBlock {
QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
int fd;
uint64_t fd_offset;
int guest_memfd;
size_t page_size;
/* dirty bitmap used during migration */
unsigned long *bmap;

View File

@@ -30,6 +30,7 @@ bool machine_usb(MachineState *machine);
int machine_phandle_start(MachineState *machine);
bool machine_dump_guest_core(MachineState *machine);
bool machine_mem_merge(MachineState *machine);
bool machine_require_guest_memfd(MachineState *machine);
HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
void machine_set_cpu_numa_node(MachineState *machine,
const CpuInstanceProperties *props,
@@ -364,6 +365,7 @@ struct MachineState {
char *dt_compatible;
bool dump_guest_core;
bool mem_merge;
bool require_guest_memfd;
bool usb;
bool usb_disabled;
char *firmware;

View File

@@ -12,7 +12,6 @@
#include "hw/hotplug.h"
#include "qom/object.h"
#include "hw/i386/sgx-epc.h"
#include "hw/firmware/smbios.h"
#include "hw/cxl/cxl.h"
#define HPET_INTCAP "hpet-intcap"
@@ -152,8 +151,6 @@ extern int fd_bootchk;
void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
void pc_guest_info_init(PCMachineState *pcms);
#define PCI_HOST_PROP_RAM_MEM "ram-mem"
#define PCI_HOST_PROP_PCI_MEM "pci-mem"
#define PCI_HOST_PROP_SYSTEM_MEM "system-mem"
@@ -165,6 +162,7 @@ void pc_guest_info_init(PCMachineState *pcms);
#define PCI_HOST_PROP_PCI_HOLE64_SIZE "pci-hole64-size"
#define PCI_HOST_BELOW_4G_MEM_SIZE "below-4g-mem-size"
#define PCI_HOST_ABOVE_4G_MEM_SIZE "above-4g-mem-size"
#define PCI_HOST_PROP_SMM_RANGES "smm-ranges"
void pc_pci_as_mapping_init(MemoryRegion *system_memory,

58
include/hw/i386/tdvf.h Normal file
View File

@@ -0,0 +1,58 @@
/*
* SPDX-License-Identifier: GPL-2.0-or-later
* Copyright (c) 2020 Intel Corporation
* Author: Isaku Yamahata <isaku.yamahata at gmail.com>
* <isaku.yamahata at intel.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* You should have received a copy of the GNU General Public License along
* with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef HW_I386_TDVF_H
#define HW_I386_TDVF_H
#include "qemu/osdep.h"
#define TDVF_SECTION_TYPE_BFV 0
#define TDVF_SECTION_TYPE_CFV 1
#define TDVF_SECTION_TYPE_TD_HOB 2
#define TDVF_SECTION_TYPE_TEMP_MEM 3
#define TDVF_SECTION_ATTRIBUTES_MR_EXTEND (1U << 0)
#define TDVF_SECTION_ATTRIBUTES_PAGE_AUG (1U << 1)
typedef struct TdxFirmwareEntry {
uint32_t data_offset;
uint32_t data_len;
uint64_t address;
uint64_t size;
uint32_t type;
uint32_t attributes;
void *mem_ptr;
} TdxFirmwareEntry;
typedef struct TdxFirmware {
void *mem_ptr;
uint32_t nr_entries;
TdxFirmwareEntry *entries;
} TdxFirmware;
#define for_each_tdx_fw_entry(fw, e) \
for (e = (fw)->entries; e != (fw)->entries + (fw)->nr_entries; e++)
int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size);
#endif /* HW_I386_TDVF_H */

View File

@@ -34,6 +34,8 @@ struct X86MachineClass {
bool save_tsc_khz;
/* use DMA capable linuxboot option rom */
bool fwcfg_dma_enabled;
/* CPU and apic information: */
bool apic_xrupt_override;
};
struct X86MachineState {
@@ -41,6 +43,7 @@ struct X86MachineState {
MachineState parent;
/*< public >*/
unsigned int vm_type;
/* Pointers to devices and objects: */
ISADevice *rtc;
@@ -57,7 +60,7 @@ struct X86MachineState {
uint64_t above_4g_mem_start;
/* CPU and apic information: */
bool apic_xrupt_override;
bool eoi_intercept_unsupported;
unsigned pci_irq_mask;
unsigned apic_id_limit;
uint16_t boot_cpus;
@@ -138,7 +141,7 @@ typedef struct GSIState {
qemu_irq x86_allocate_cpu_irq(void);
void gsi_handler(void *opaque, int n, int level);
void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
void ioapic_init_gsi(GSIState *gsi_state, Object *parent);
DeviceState *ioapic_init_secondary(GSIState *gsi_state);
/* pc_sysfw.c */

View File

@@ -50,6 +50,7 @@ struct MCHPCIState {
MemoryRegion tseg_blackhole, tseg_window;
MemoryRegion smbase_blackhole, smbase_window;
bool has_smram_at_smbase;
bool has_smm_ranges;
Range pci_hole;
uint64_t below_4g_mem_size;
uint64_t above_4g_mem_size;

View File

@@ -1,17 +0,0 @@
/*
* PEF (Protected Execution Facility) for POWER support
*
* Copyright Red Hat.
*
* This work is licensed under the terms of the GNU GPL, version 2 or later.
* See the COPYING file in the top-level directory.
*
*/
#ifndef HW_PPC_PEF_H
#define HW_PPC_PEF_H
int pef_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
int pef_kvm_reset(ConfidentialGuestSupport *cgs, Error **errp);
#endif /* HW_PPC_PEF_H */

View File

@@ -14,9 +14,21 @@
#include "qom/object.h"
/*
* IRQ range offsets per device type
* The XIVE IRQ backend uses the same layout as the XICS backend but
* covers the full range of the IRQ number space. The IRQ numbers for
* the CPU IPIs are allocated at the bottom of this space, below 4K,
* to preserve compatibility with XICS which does not use that range.
*/
/*
* CPU IPI range (XIVE only)
*/
#define SPAPR_IRQ_IPI 0x0
#define SPAPR_IRQ_NR_IPIS 0x1000
/*
* IRQ range offsets per device type
*/
#define SPAPR_XIRQ_BASE XICS_IRQ_BASE /* 0x1000 */
#define SPAPR_IRQ_EPOW (SPAPR_XIRQ_BASE + 0x0000)

View File

@@ -678,8 +678,10 @@ typedef struct ThreadContext ThreadContext;
* memory area starting at @area with the size of @sz. After a successful call,
* each page in the area was faulted in writable at least once, for example,
* after allocating file blocks for mapped files.
*
* Return: true on success, else false setting @errp with error.
*/
void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
bool qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
ThreadContext *tc, Error **errp);
/**

View File

@@ -53,7 +53,7 @@ extern "C" {
* Format modifiers may change any property of the buffer, including the number
* of planes and/or the required allocation size. Format modifiers are
* vendor-namespaced, and as such the relationship between a fourcc code and a
* modifier is specific to the modifer being used. For example, some modifiers
* modifier is specific to the modifier being used. For example, some modifiers
* may preserve meaning - such as number of planes - from the fourcc code,
* whereas others may not.
*
@@ -78,7 +78,7 @@ extern "C" {
* format.
* - Higher-level programs interfacing with KMS/GBM/EGL/Vulkan/etc: these users
* see modifiers as opaque tokens they can check for equality and intersect.
* These users musn't need to know to reason about the modifier value
* These users mustn't need to know to reason about the modifier value
* (i.e. they are not expected to extract information out of the modifier).
*
* Vendors should document their modifier usage in as much detail as
@@ -322,6 +322,8 @@ extern "C" {
* index 1 = Cr:Cb plane, [39:0] Cr1:Cb1:Cr0:Cb0 little endian
*/
#define DRM_FORMAT_NV15 fourcc_code('N', 'V', '1', '5') /* 2x2 subsampled Cr:Cb plane */
#define DRM_FORMAT_NV20 fourcc_code('N', 'V', '2', '0') /* 2x1 subsampled Cr:Cb plane */
#define DRM_FORMAT_NV30 fourcc_code('N', 'V', '3', '0') /* non-subsampled Cr:Cb plane */
/*
* 2 plane YCbCr MSB aligned
@@ -537,7 +539,7 @@ extern "C" {
* This is a tiled layout using 4Kb tiles in row-major layout.
* Within the tile pixels are laid out in 16 256 byte units / sub-tiles which
* are arranged in four groups (two wide, two high) with column-major layout.
* Each group therefore consits out of four 256 byte units, which are also laid
* Each group therefore consists out of four 256 byte units, which are also laid
* out as 2x2 column-major.
* 256 byte units are made out of four 64 byte blocks of pixels, producing
* either a square block or a 2:1 unit.
@@ -1100,7 +1102,7 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
*/
/*
* The top 4 bits (out of the 56 bits alloted for specifying vendor specific
* The top 4 bits (out of the 56 bits allotted for specifying vendor specific
* modifiers) denote the category for modifiers. Currently we have three
* categories of modifiers ie AFBC, MISC and AFRC. We can have a maximum of
* sixteen different categories.
@@ -1416,7 +1418,7 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
* Amlogic FBC Memory Saving mode
*
* Indicates the storage is packed when pixel size is multiple of word
* boudaries, i.e. 8bit should be stored in this mode to save allocation
* boundaries, i.e. 8bit should be stored in this mode to save allocation
* memory.
*
* This mode reduces body layout to 3072 bytes per 64x32 superblock with

View File

@@ -1266,6 +1266,8 @@ struct ethtool_rxfh_indir {
* hardware hash key.
* @hfunc: Defines the current RSS hash function used by HW (or to be set to).
* Valid values are one of the %ETH_RSS_HASH_*.
* @input_xfrm: Defines how the input data is transformed. Valid values are one
* of %RXH_XFRM_*.
* @rsvd8: Reserved for future use; see the note on reserved space.
* @rsvd32: Reserved for future use; see the note on reserved space.
* @rss_config: RX ring/queue index for each hash value i.e., indirection table
@@ -1285,7 +1287,8 @@ struct ethtool_rxfh {
uint32_t indir_size;
uint32_t key_size;
uint8_t hfunc;
uint8_t rsvd8[3];
uint8_t input_xfrm;
uint8_t rsvd8[2];
uint32_t rsvd32;
uint32_t rss_config[];
};
@@ -1992,6 +1995,15 @@ static inline int ethtool_validate_duplex(uint8_t duplex)
#define WOL_MODE_COUNT 8
/* RSS hash function data
* XOR the corresponding source and destination fields of each specified
* protocol. Both copies of the XOR'ed fields are fed into the RSS and RXHASH
* calculation. Note that this XORing reduces the input set entropy and could
* be exploited to reduce the RSS queue spread.
*/
#define RXH_XFRM_SYM_XOR (1 << 0)
#define RXH_XFRM_NO_CHANGE 0xff
/* L2-L4 network traffic flow types */
#define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */
#define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */
@@ -2128,18 +2140,6 @@ enum ethtool_reset_flags {
* refused. For drivers: ignore this field (use kernel's
* __ETHTOOL_LINK_MODE_MASK_NBITS instead), any change to it will
* be overwritten by kernel.
* @supported: Bitmap with each bit meaning given by
* %ethtool_link_mode_bit_indices for the link modes, physical
* connectors and other link features for which the interface
* supports autonegotiation or auto-detection. Read-only.
* @advertising: Bitmap with each bit meaning given by
* %ethtool_link_mode_bit_indices for the link modes, physical
* connectors and other link features that are advertised through
* autonegotiation or enabled for auto-detection.
* @lp_advertising: Bitmap with each bit meaning given by
* %ethtool_link_mode_bit_indices for the link modes, and other
* link features that the link partner advertised through
* autonegotiation; 0 if unknown or not applicable. Read-only.
* @transceiver: Used to distinguish different possible PHY types,
* reported consistently by PHYLIB. Read-only.
* @master_slave_cfg: Master/slave port mode.
@@ -2181,6 +2181,21 @@ enum ethtool_reset_flags {
* %set_link_ksettings() should validate all fields other than @cmd
* and @link_mode_masks_nwords that are not described as read-only or
* deprecated, and must ignore all fields described as read-only.
*
* @link_mode_masks is divided into three bitfields, each of length
* @link_mode_masks_nwords:
* - supported: Bitmap with each bit meaning given by
* %ethtool_link_mode_bit_indices for the link modes, physical
* connectors and other link features for which the interface
* supports autonegotiation or auto-detection. Read-only.
* - advertising: Bitmap with each bit meaning given by
* %ethtool_link_mode_bit_indices for the link modes, physical
* connectors and other link features that are advertised through
* autonegotiation or enabled for auto-detection.
* - lp_advertising: Bitmap with each bit meaning given by
* %ethtool_link_mode_bit_indices for the link modes, and other
* link features that the link partner advertised through
* autonegotiation; 0 if unknown or not applicable. Read-only.
*/
struct ethtool_link_settings {
uint32_t cmd;

View File

@@ -209,7 +209,7 @@
* - add FUSE_HAS_EXPIRE_ONLY
*
* 7.39
* - add FUSE_DIRECT_IO_RELAX
* - add FUSE_DIRECT_IO_ALLOW_MMAP
* - add FUSE_STATX and related structures
*/
@@ -405,8 +405,7 @@ struct fuse_file_lock {
* FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
* symlink and mknod (single group that matches parent)
* FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry invalidation
* FUSE_DIRECT_IO_RELAX: relax restrictions in FOPEN_DIRECT_IO mode, for now
* allow shared mmap
* FUSE_DIRECT_IO_ALLOW_MMAP: allow shared mmap in FOPEN_DIRECT_IO mode.
*/
#define FUSE_ASYNC_READ (1 << 0)
#define FUSE_POSIX_LOCKS (1 << 1)
@@ -445,7 +444,10 @@ struct fuse_file_lock {
#define FUSE_HAS_INODE_DAX (1ULL << 33)
#define FUSE_CREATE_SUPP_GROUP (1ULL << 34)
#define FUSE_HAS_EXPIRE_ONLY (1ULL << 35)
#define FUSE_DIRECT_IO_RELAX (1ULL << 36)
#define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36)
/* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
#define FUSE_DIRECT_IO_RELAX FUSE_DIRECT_IO_ALLOW_MMAP
/**
* CUSE INIT request/reply flags

View File

@@ -80,6 +80,7 @@
#define PCI_HEADER_TYPE_NORMAL 0
#define PCI_HEADER_TYPE_BRIDGE 1
#define PCI_HEADER_TYPE_CARDBUS 2
#define PCI_HEADER_TYPE_MFD 0x80 /* Multi-Function Device (possible) */
#define PCI_BIST 0x0f /* 8 bits */
#define PCI_BIST_CODE_MASK 0x0f /* Return result */
@@ -637,6 +638,7 @@
#define PCI_EXP_RTCAP 0x1e /* Root Capabilities */
#define PCI_EXP_RTCAP_CRSVIS 0x0001 /* CRS Software Visibility capability */
#define PCI_EXP_RTSTA 0x20 /* Root Status */
#define PCI_EXP_RTSTA_PME_RQ_ID 0x0000ffff /* PME Requester ID */
#define PCI_EXP_RTSTA_PME 0x00010000 /* PME status */
#define PCI_EXP_RTSTA_PENDING 0x00020000 /* PME pending */
/*
@@ -930,12 +932,13 @@
/* Process Address Space ID */
#define PCI_PASID_CAP 0x04 /* PASID feature register */
#define PCI_PASID_CAP_EXEC 0x02 /* Exec permissions Supported */
#define PCI_PASID_CAP_PRIV 0x04 /* Privilege Mode Supported */
#define PCI_PASID_CAP_EXEC 0x0002 /* Exec permissions Supported */
#define PCI_PASID_CAP_PRIV 0x0004 /* Privilege Mode Supported */
#define PCI_PASID_CAP_WIDTH 0x1f00
#define PCI_PASID_CTRL 0x06 /* PASID control register */
#define PCI_PASID_CTRL_ENABLE 0x01 /* Enable bit */
#define PCI_PASID_CTRL_EXEC 0x02 /* Exec permissions Enable */
#define PCI_PASID_CTRL_PRIV 0x04 /* Privilege Mode Enable */
#define PCI_PASID_CTRL_ENABLE 0x0001 /* Enable bit */
#define PCI_PASID_CTRL_EXEC 0x0002 /* Exec permissions Enable */
#define PCI_PASID_CTRL_PRIV 0x0004 /* Privilege Mode Enable */
#define PCI_EXT_CAP_PASID_SIZEOF 8
/* Single Root I/O Virtualization */
@@ -975,6 +978,8 @@
#define PCI_LTR_VALUE_MASK 0x000003ff
#define PCI_LTR_SCALE_MASK 0x00001c00
#define PCI_LTR_SCALE_SHIFT 10
#define PCI_LTR_NOSNOOP_VALUE 0x03ff0000 /* Max No-Snoop Latency Value */
#define PCI_LTR_NOSNOOP_SCALE 0x1c000000 /* Scale for Max Value */
#define PCI_EXT_CAP_LTR_SIZEOF 8
/* Access Control Service */
@@ -1042,9 +1047,16 @@
#define PCI_EXP_DPC_STATUS 0x08 /* DPC Status */
#define PCI_EXP_DPC_STATUS_TRIGGER 0x0001 /* Trigger Status */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN 0x0006 /* Trigger Reason */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR 0x0000 /* Uncorrectable error */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE 0x0002 /* Rcvd ERR_NONFATAL */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE 0x0004 /* Rcvd ERR_FATAL */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT 0x0006 /* Reason in Trig Reason Extension field */
#define PCI_EXP_DPC_STATUS_INTERRUPT 0x0008 /* Interrupt Status */
#define PCI_EXP_DPC_RP_BUSY 0x0010 /* Root Port Busy */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT 0x0060 /* Trig Reason Extension */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_RP_PIO 0x0000 /* RP PIO error */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_SW_TRIGGER 0x0020 /* DPC SW Trigger bit */
#define PCI_EXP_DPC_RP_PIO_FEP 0x1f00 /* RP PIO First Err Ptr */
#define PCI_EXP_DPC_SOURCE_ID 0x0A /* DPC Source Identifier */
@@ -1088,6 +1100,8 @@
#define PCI_L1SS_CTL1_LTR_L12_TH_VALUE 0x03ff0000 /* LTR_L1.2_THRESHOLD_Value */
#define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */
#define PCI_L1SS_CTL2 0x0c /* Control 2 Register */
#define PCI_L1SS_CTL2_T_PWR_ON_SCALE 0x00000003 /* T_POWER_ON Scale */
#define PCI_L1SS_CTL2_T_PWR_ON_VALUE 0x000000f8 /* T_POWER_ON Value */
/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
#define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */

View File

@@ -185,5 +185,12 @@ struct vhost_vdpa_iova_range {
* DRIVER_OK
*/
#define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK 0x6
/* Device may expose the virtqueue's descriptor area, driver area and
* device area to a different group for ASID binding than where its
* buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
*/
#define VHOST_BACKEND_F_DESC_ASID 0x7
/* IOTLB don't flush memory mapping across device reset */
#define VHOST_BACKEND_F_IOTLB_PERSIST 0x8
#endif

View File

@@ -52,7 +52,7 @@
* rest are per-device feature bits.
*/
#define VIRTIO_TRANSPORT_F_START 28
#define VIRTIO_TRANSPORT_F_END 41
#define VIRTIO_TRANSPORT_F_END 42
#ifndef VIRTIO_CONFIG_NO_LEGACY
/* Do we get callbacks when the ring is completely used, even if we've
@@ -103,8 +103,19 @@
*/
#define VIRTIO_F_NOTIFICATION_DATA 38
/* This feature indicates that the driver uses the data provided by the device
* as a virtqueue identifier in available buffer notifications.
*/
#define VIRTIO_F_NOTIF_CONFIG_DATA 39
/*
* This feature indicates that the driver can reset a queue individually.
*/
#define VIRTIO_F_RING_RESET 40
/*
* This feature indicates that the device support administration virtqueues.
*/
#define VIRTIO_F_ADMIN_VQ 41
#endif /* _LINUX_VIRTIO_CONFIG_H */

View File

@@ -166,6 +166,20 @@ struct virtio_pci_common_cfg {
uint32_t queue_used_hi; /* read-write */
};
/*
* Warning: do not use sizeof on this: use offsetofend for
* specific fields you need.
*/
struct virtio_pci_modern_common_cfg {
struct virtio_pci_common_cfg cfg;
uint16_t queue_notify_data; /* read-write */
uint16_t queue_reset; /* read-write */
uint16_t admin_queue_index; /* read-only */
uint16_t admin_queue_num; /* read-only */
};
/* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
struct virtio_pci_cfg_cap {
struct virtio_pci_cap cap;
@@ -204,7 +218,72 @@ struct virtio_pci_cfg_cap {
#define VIRTIO_PCI_COMMON_Q_USEDHI 52
#define VIRTIO_PCI_COMMON_Q_NDATA 56
#define VIRTIO_PCI_COMMON_Q_RESET 58
#define VIRTIO_PCI_COMMON_ADM_Q_IDX 60
#define VIRTIO_PCI_COMMON_ADM_Q_NUM 62
#endif /* VIRTIO_PCI_NO_MODERN */
/* Admin command status. */
#define VIRTIO_ADMIN_STATUS_OK 0
/* Admin command opcode. */
#define VIRTIO_ADMIN_CMD_LIST_QUERY 0x0
#define VIRTIO_ADMIN_CMD_LIST_USE 0x1
/* Admin command group type. */
#define VIRTIO_ADMIN_GROUP_TYPE_SRIOV 0x1
/* Transitional device admin command. */
#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE 0x2
#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ 0x3
#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE 0x4
#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ 0x5
#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO 0x6
struct QEMU_PACKED virtio_admin_cmd_hdr {
uint16_t opcode;
/*
* 1 - SR-IOV
* 2-65535 - reserved
*/
uint16_t group_type;
/* Unused, reserved for future extensions. */
uint8_t reserved1[12];
uint64_t group_member_id;
};
struct QEMU_PACKED virtio_admin_cmd_status {
uint16_t status;
uint16_t status_qualifier;
/* Unused, reserved for future extensions. */
uint8_t reserved2[4];
};
struct QEMU_PACKED virtio_admin_cmd_legacy_wr_data {
uint8_t offset; /* Starting offset of the register(s) to write. */
uint8_t reserved[7];
uint8_t registers[];
};
struct QEMU_PACKED virtio_admin_cmd_legacy_rd_data {
uint8_t offset; /* Starting offset of the register(s) to read. */
};
#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END 0
#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_DEV 0x1
#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM 0x2
#define VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO 4
struct QEMU_PACKED virtio_admin_cmd_notify_info_data {
uint8_t flags; /* 0 = end of list, 1 = owner device, 2 = member device */
uint8_t bar; /* BAR of the member or the owner device */
uint8_t padding[6];
uint64_t offset; /* Offset within bar. */
};
struct virtio_admin_cmd_notify_info_result {
struct virtio_admin_cmd_notify_info_data entries[VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO];
};
#endif

View File

@@ -14,6 +14,13 @@
#include "standard-headers/linux/virtio_ids.h"
#include "standard-headers/linux/virtio_config.h"
/* Feature bits */
/* guest physical address range will be indicated as shared memory region 0 */
#define VIRTIO_PMEM_F_SHMEM_REGION 0
/* shmid of the shared memory region corresponding to the pmem */
#define VIRTIO_PMEM_SHMEM_REGION_ID 0
struct virtio_pmem_config {
uint64_t start;
uint64_t size;

View File

@@ -0,0 +1,198 @@
/*
* Copyright (C) 2020 Intel Corporation
*
* Author: Isaku Yamahata <isaku.yamahata at gmail.com>
* <isaku.yamahata at intel.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* You should have received a copy of the GNU General Public License along
* with this program; if not, see <http://www.gnu.org/licenses/>.
*
*/
#ifndef HW_I386_UEFI_H
#define HW_I386_UEFI_H
/***************************************************************************/
/*
* basic EFI definitions
* supplemented with UEFI Specification Version 2.8 (Errata A)
* released February 2020
*/
/* UEFI integer is little endian */
typedef struct {
uint32_t Data1;
uint16_t Data2;
uint16_t Data3;
uint8_t Data4[8];
} EFI_GUID;
typedef enum {
EfiReservedMemoryType,
EfiLoaderCode,
EfiLoaderData,
EfiBootServicesCode,
EfiBootServicesData,
EfiRuntimeServicesCode,
EfiRuntimeServicesData,
EfiConventionalMemory,
EfiUnusableMemory,
EfiACPIReclaimMemory,
EfiACPIMemoryNVS,
EfiMemoryMappedIO,
EfiMemoryMappedIOPortSpace,
EfiPalCode,
EfiPersistentMemory,
EfiUnacceptedMemoryType,
EfiMaxMemoryType
} EFI_MEMORY_TYPE;
#define EFI_HOB_HANDOFF_TABLE_VERSION 0x0009
#define EFI_HOB_TYPE_HANDOFF 0x0001
#define EFI_HOB_TYPE_MEMORY_ALLOCATION 0x0002
#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR 0x0003
#define EFI_HOB_TYPE_GUID_EXTENSION 0x0004
#define EFI_HOB_TYPE_FV 0x0005
#define EFI_HOB_TYPE_CPU 0x0006
#define EFI_HOB_TYPE_MEMORY_POOL 0x0007
#define EFI_HOB_TYPE_FV2 0x0009
#define EFI_HOB_TYPE_LOAD_PEIM_UNUSED 0x000A
#define EFI_HOB_TYPE_UEFI_CAPSULE 0x000B
#define EFI_HOB_TYPE_FV3 0x000C
#define EFI_HOB_TYPE_UNUSED 0xFFFE
#define EFI_HOB_TYPE_END_OF_HOB_LIST 0xFFFF
typedef struct {
uint16_t HobType;
uint16_t HobLength;
uint32_t Reserved;
} EFI_HOB_GENERIC_HEADER;
typedef uint64_t EFI_PHYSICAL_ADDRESS;
typedef uint32_t EFI_BOOT_MODE;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
uint32_t Version;
EFI_BOOT_MODE BootMode;
EFI_PHYSICAL_ADDRESS EfiMemoryTop;
EFI_PHYSICAL_ADDRESS EfiMemoryBottom;
EFI_PHYSICAL_ADDRESS EfiFreeMemoryTop;
EFI_PHYSICAL_ADDRESS EfiFreeMemoryBottom;
EFI_PHYSICAL_ADDRESS EfiEndOfHobList;
} EFI_HOB_HANDOFF_INFO_TABLE;
#define EFI_RESOURCE_SYSTEM_MEMORY 0x00000000
#define EFI_RESOURCE_MEMORY_MAPPED_IO 0x00000001
#define EFI_RESOURCE_IO 0x00000002
#define EFI_RESOURCE_FIRMWARE_DEVICE 0x00000003
#define EFI_RESOURCE_MEMORY_MAPPED_IO_PORT 0x00000004
#define EFI_RESOURCE_MEMORY_RESERVED 0x00000005
#define EFI_RESOURCE_IO_RESERVED 0x00000006
#define EFI_RESOURCE_MEMORY_UNACCEPTED 0x00000007
#define EFI_RESOURCE_MAX_MEMORY_TYPE 0x00000008
#define EFI_RESOURCE_ATTRIBUTE_PRESENT 0x00000001
#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED 0x00000002
#define EFI_RESOURCE_ATTRIBUTE_TESTED 0x00000004
#define EFI_RESOURCE_ATTRIBUTE_SINGLE_BIT_ECC 0x00000008
#define EFI_RESOURCE_ATTRIBUTE_MULTIPLE_BIT_ECC 0x00000010
#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_1 0x00000020
#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_2 0x00000040
#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTED 0x00000080
#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTED 0x00000100
#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTED 0x00000200
#define EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE 0x00000400
#define EFI_RESOURCE_ATTRIBUTE_WRITE_COMBINEABLE 0x00000800
#define EFI_RESOURCE_ATTRIBUTE_WRITE_THROUGH_CACHEABLE 0x00001000
#define EFI_RESOURCE_ATTRIBUTE_WRITE_BACK_CACHEABLE 0x00002000
#define EFI_RESOURCE_ATTRIBUTE_16_BIT_IO 0x00004000
#define EFI_RESOURCE_ATTRIBUTE_32_BIT_IO 0x00008000
#define EFI_RESOURCE_ATTRIBUTE_64_BIT_IO 0x00010000
#define EFI_RESOURCE_ATTRIBUTE_UNCACHED_EXPORTED 0x00020000
#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTED 0x00040000
#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTABLE 0x00080000
#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTABLE 0x00100000
#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTABLE 0x00200000
#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTABLE 0x00400000
#define EFI_RESOURCE_ATTRIBUTE_PERSISTENT 0x00800000
#define EFI_RESOURCE_ATTRIBUTE_PERSISTABLE 0x01000000
#define EFI_RESOURCE_ATTRIBUTE_MORE_RELIABLE 0x02000000
typedef uint32_t EFI_RESOURCE_TYPE;
typedef uint32_t EFI_RESOURCE_ATTRIBUTE_TYPE;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
EFI_GUID Owner;
EFI_RESOURCE_TYPE ResourceType;
EFI_RESOURCE_ATTRIBUTE_TYPE ResourceAttribute;
EFI_PHYSICAL_ADDRESS PhysicalStart;
uint64_t ResourceLength;
} EFI_HOB_RESOURCE_DESCRIPTOR;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
EFI_GUID Name;
/* guid specific data follows */
} EFI_HOB_GUID_TYPE;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
EFI_PHYSICAL_ADDRESS BaseAddress;
uint64_t Length;
} EFI_HOB_FIRMWARE_VOLUME;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
EFI_PHYSICAL_ADDRESS BaseAddress;
uint64_t Length;
EFI_GUID FvName;
EFI_GUID FileName;
} EFI_HOB_FIRMWARE_VOLUME2;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
EFI_PHYSICAL_ADDRESS BaseAddress;
uint64_t Length;
uint32_t AuthenticationStatus;
bool ExtractedFv;
EFI_GUID FvName;
EFI_GUID FileName;
} EFI_HOB_FIRMWARE_VOLUME3;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
uint8_t SizeOfMemorySpace;
uint8_t SizeOfIoSpace;
uint8_t Reserved[6];
} EFI_HOB_CPU;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
} EFI_HOB_MEMORY_POOL;
typedef struct {
EFI_HOB_GENERIC_HEADER Header;
EFI_PHYSICAL_ADDRESS BaseAddress;
uint64_t Length;
} EFI_HOB_UEFI_CAPSULE;
#define EFI_HOB_OWNER_ZERO \
((EFI_GUID){ 0x00000000, 0x0000, 0x0000, \
{ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } })
#endif

View File

@@ -47,7 +47,15 @@ OBJECT_DECLARE_TYPE(HostMemoryBackend, HostMemoryBackendClass,
struct HostMemoryBackendClass {
ObjectClass parent_class;
void (*alloc)(HostMemoryBackend *backend, Error **errp);
/**
* alloc: Allocate memory from backend.
*
* @backend: the #HostMemoryBackend.
* @errp: pointer to Error*, to store an error if it happens.
*
* Return: true on success, else false setting @errp with error.
*/
bool (*alloc)(HostMemoryBackend *backend, Error **errp);
};
/**
@@ -66,6 +74,7 @@ struct HostMemoryBackend {
uint64_t size;
bool merge, dump, use_canonical_path;
bool prealloc, is_mapped, share, reserve;
bool guest_memfd;
uint32_t prealloc_threads;
ThreadContext *prealloc_context;
DECLARE_BITMAP(host_nodes, MAX_NODES + 1);

View File

@@ -341,6 +341,7 @@ int kvm_arch_get_default_type(MachineState *ms);
int kvm_arch_init(MachineState *ms, KVMState *s);
int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
int kvm_arch_init_vcpu(CPUState *cpu);
int kvm_arch_destroy_vcpu(CPUState *cpu);
@@ -538,4 +539,17 @@ bool kvm_arch_cpu_check_are_resettable(void);
bool kvm_dirty_ring_enabled(void);
uint32_t kvm_dirty_ring_size(void);
/**
* kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
* reported for the VM.
*/
bool kvm_hwpoisoned_mem(void);
int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
#endif

View File

@@ -30,6 +30,8 @@ typedef struct KVMSlot
int as_id;
/* Cache of the offset in ram address space */
ram_addr_t ram_start_offset;
int guest_memfd;
hwaddr guest_memfd_offset;
} KVMSlot;
typedef struct KVMMemoryUpdate {

View File

@@ -491,6 +491,38 @@ struct kvm_smccc_filter {
#define KVM_HYPERCALL_EXIT_SMC (1U << 0)
#define KVM_HYPERCALL_EXIT_16BIT (1U << 1)
/*
* Get feature ID registers userspace writable mask.
*
* From DDI0487J.a, D19.2.66 ("ID_AA64MMFR2_EL1, AArch64 Memory Model
* Feature Register 2"):
*
* "The Feature ID space is defined as the System register space in
* AArch64 with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7},
* op2=={0-7}."
*
* This covers all currently known R/O registers that indicate
* anything useful feature wise, including the ID registers.
*
* If we ever need to introduce a new range, it will be described as
* such in the range field.
*/
#define KVM_ARM_FEATURE_ID_RANGE_IDX(op0, op1, crn, crm, op2) \
({ \
__u64 __op1 = (op1) & 3; \
__op1 -= (__op1 == 3); \
(__op1 << 6 | ((crm) & 7) << 3 | (op2)); \
})
#define KVM_ARM_FEATURE_ID_RANGE 0
#define KVM_ARM_FEATURE_ID_RANGE_SIZE (3 * 8 * 8)
struct reg_mask_range {
__u64 addr; /* Pointer to mask array */
__u32 range; /* Requested range */
__u32 reserved[13];
};
#endif
#endif /* __ARM_KVM_H__ */

View File

@@ -71,7 +71,7 @@ __SYSCALL(__NR_fremovexattr, sys_fremovexattr)
#define __NR_getcwd 17
__SYSCALL(__NR_getcwd, sys_getcwd)
#define __NR_lookup_dcookie 18
__SC_COMP(__NR_lookup_dcookie, sys_lookup_dcookie, compat_sys_lookup_dcookie)
__SYSCALL(__NR_lookup_dcookie, sys_ni_syscall)
#define __NR_eventfd2 19
__SYSCALL(__NR_eventfd2, sys_eventfd2)
#define __NR_epoll_create1 20
@@ -816,15 +816,34 @@ __SYSCALL(__NR_process_mrelease, sys_process_mrelease)
__SYSCALL(__NR_futex_waitv, sys_futex_waitv)
#define __NR_set_mempolicy_home_node 450
__SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
#define __NR_cachestat 451
__SYSCALL(__NR_cachestat, sys_cachestat)
#define __NR_fchmodat2 452
__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
#define __NR_map_shadow_stack 453
__SYSCALL(__NR_map_shadow_stack, sys_map_shadow_stack)
#define __NR_futex_wake 454
__SYSCALL(__NR_futex_wake, sys_futex_wake)
#define __NR_futex_wait 455
__SYSCALL(__NR_futex_wait, sys_futex_wait)
#define __NR_futex_requeue 456
__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
#define __NR_statmount 457
__SYSCALL(__NR_statmount, sys_statmount)
#define __NR_listmount 458
__SYSCALL(__NR_listmount, sys_listmount)
#define __NR_lsm_get_self_attr 459
__SYSCALL(__NR_lsm_get_self_attr, sys_lsm_get_self_attr)
#define __NR_lsm_set_self_attr 460
__SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
#define __NR_lsm_list_modules 461
__SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
#undef __NR_syscalls
#define __NR_syscalls 453
#define __NR_syscalls 462
/*
* 32 bit systems traditionally used different

Some files were not shown because too many files have changed in this diff Show More