Add a new CPU model named SapphireRapids-TDX. It is a CPUID template
for TDX on the SPR platform. Some CPU info, such as FMS (family/model/stepping)
and the model ID, is retrieved from a stepping-E SPR SDP.
The listed features all belong to the TDX-supported CPUID bits, including:
* Natively supported ones.
* TDX fixed1 features.
* Configurable (if native) && KVM-supported features (if a feature is not
supported by KVM, exposing it unconditionally may cause unexpected behavior).
In addition, the legacy KVM-specific default features are not suitable for
the TDX CPU model, so skip their assignment in x86_cpu_load_model().
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Some native CPUID features would be masked off because they don't have a
feature name in feature_word_info. To avoid a mismatch with the native
values seen in the TD guest, skip the migratable feature filter for TDX
VMs.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
tdx_fixed0/1 only track the settings in env->features[]. cpu_x86_cpuid()
is called to construct cpuid_data, and it makes some adjustments based on
env->features that could break the existing tdx_fixed0/1 settings.
Avoid clearing CPUID_PDCM when the PMU is disabled and TDX is enabled.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
In TDX, there are two types of features that cannot be removed with the
"-feature_name" QEMU command-line parameter:
1. the fixed1 features;
2. the XFAM-controlled features.
Note that type 2, the XFAM-controlled features, is special in that one
XFAM bit usually controls a set of related features. For example, XFAM[2]
controls the enabling of avx, avx2, vaes, etc. Even though avx, avx2 and
vaes are not fixed1, they cannot be removed if their controlling bit
XFAM[2] is set to 1. For simplicity, mark all XFAM-controlled features,
except the controlling feature itself, as disallowed-minus.
Print a warning and forbid the removal when "-cpu xxx,-feature_name" is
used to remove a disallowed-minus feature.
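The disallowed-minus rule can be pictured as a mask check on one feature word. This is a minimal sketch, assuming hypothetical per-word masks (TdxFeatureMasks, fixed1, xfam_controlled) rather than QEMU's actual data structures; xfam_controlled is assumed to already exclude the controlling feature itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-feature-word masks; not QEMU's actual data structures. */
typedef struct {
    uint64_t fixed1;          /* bits the TDX module forces to 1 */
    uint64_t xfam_controlled; /* bits tied to a configured XFAM bit,
                                 excluding the controlling feature */
} TdxFeatureMasks;

/*
 * Clear the bits requested by "-feature" (minus_mask) from *features,
 * except those that are fixed1 or XFAM-controlled. Prints a warning for
 * the rejected bits and returns them so the caller can report them.
 */
static uint64_t tdx_apply_minus(uint64_t *features, uint64_t minus_mask,
                                const TdxFeatureMasks *m)
{
    assert(features != NULL && m != NULL);

    uint64_t disallowed = m->fixed1 | m->xfam_controlled;
    uint64_t rejected = minus_mask & disallowed;

    if (rejected) {
        fprintf(stderr, "warning: TDX forbids removing features 0x%llx\n",
                (unsigned long long)rejected);
    }
    *features &= ~(minus_mask & ~disallowed);
    return rejected;
}
```

With fixed1 = 0x01 and xfam_controlled = 0x06, a request to remove 0x0F only clears bit 3; bits 0 to 2 are reported back as rejected.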
Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
XFAM (eXtended Features Available Mask) is defined as a 64-bit bitmap
that uses the state-component bitmap format, the same as XCR0 or the
IA32_XSS MSR. XFAM determines the set of extended features available for
use by the guest TD.
In TDX 1.0, the enumeration of XSAVE-related features is controlled by
their corresponding XFAM bits. E.g., all AVX512 features are controlled
by XFAM[7:5]: if XFAM[7:5] is 000b, avx512f, avx512dq, avx512fma, ...
are all 0.
Since the XFAM setting is passed to KVM through CPUID leaf 0xD (specifically,
the feature words FEAT_XSAVE_XCR0_LO, FEAT_XSAVE_XCR0_HI, FEAT_XSAVE_XSS_LO
and FEAT_XSAVE_XSS_HI), apply the XFAM dependencies after the XSAVE
components are finalized.
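A minimal sketch of the ordering described above: XFAM is assembled from the four finalized leaf 0xD words, and only then are the dependent features pruned. The FEAT_AVX512* bit positions are hypothetical placeholders in a single illustrative word, not QEMU's real feature-word layout; bits 5 to 7 are the opmask, ZMM_Hi256 and Hi16_ZMM state components.

```c
#include <assert.h>
#include <stdint.h>

/* XFAM[7:5]: opmask, ZMM_Hi256 and Hi16_ZMM state components. */
#define XFAM_AVX512_MASK (0x7ULL << 5)
static_assert(XFAM_AVX512_MASK == 0xE0, "XFAM[7:5]");

/* Hypothetical bit positions in one illustrative feature word. */
#define FEAT_AVX512F    (1u << 0)
#define FEAT_AVX512DQ   (1u << 1)
#define FEAT_AVX512_ALL (FEAT_AVX512F | FEAT_AVX512DQ)

/* Assemble XFAM from the four finalized CPUID leaf 0xD feature words. */
static uint64_t tdx_compute_xfam(uint32_t xcr0_lo, uint32_t xcr0_hi,
                                 uint32_t xss_lo, uint32_t xss_hi)
{
    uint64_t xcr0 = ((uint64_t)xcr0_hi << 32) | xcr0_lo;
    uint64_t xss = ((uint64_t)xss_hi << 32) | xss_lo;

    return xcr0 | xss;
}

/* Clear the AVX512 features when XFAM[7:5] is 000b. */
static uint32_t tdx_apply_xfam_deps(uint32_t features, uint64_t xfam)
{
    if ((xfam & XFAM_AVX512_MASK) == 0) {
        features &= ~FEAT_AVX512_ALL;
    }
    return features;
}
```

Computing XFAM inside the dependency pass, rather than earlier, keeps it from being derived from XSAVE words that later code may still adjust.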
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Make mark_unavailable_features() visible outside cpu.c so that it can be
used by TDX code when features are masked off.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Split user_features into two arrays. In x86_cpu_expand_features(), feature
minus is applied after feature plus, so minus takes precedence when users
specify "+feature,-feature" at the same time. As a result, the split
user_minus_features[] takes precedence over user_plus_features[].
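The precedence rule comes down to the order of two bit operations. This is a minimal sketch for a single feature word, where user_plus and user_minus stand in for one element of user_plus_features[] and user_minus_features[]; expand_feature_word() is a hypothetical helper, not the real x86_cpu_expand_features().

```c
#include <assert.h>
#include <stdint.h>

/*
 * model_default is the CPU model's value for this word. Plus is applied
 * first and minus last, so a feature given as "+feature,-feature" ends
 * up cleared.
 */
static uint32_t expand_feature_word(uint32_t model_default,
                                    uint32_t user_plus, uint32_t user_minus)
{
    uint32_t features = model_default;

    features |= user_plus;   /* "+feature" requests */
    features &= ~user_minus; /* "-feature" requests win */

    assert((features & user_minus) == 0);
    return features;
}
```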
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
When an old TDVF that doesn't implement the GUID is used, print a log
message telling users to upgrade to a newer TDVF for upstream.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
1. Hardcode the TDX metadata offset if no GUID is found;
2. No GUID found means an old TDVF, so make HOB creation work for the old version.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
KVM_TDX_CAPABILITIES has changed from VM scope back to KVM scope.
To keep QEMU working with both ABIs, add a workaround for this change:
try the KVM-scope ioctl first, and then the VM-scope one if it fails.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
The KVM vm_type ABI has changed so that KVM_X86_TDX_VM is now 1; it was 2.
To keep QEMU working with both ABIs, add a workaround for the ABI change:
try KVM_X86_TDX_VM=1, and then 2 if it fails.
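The fallback can be sketched as a loop over candidate vm_type values. This assumes a hypothetical create_vm_fn callback standing in for the KVM_CREATE_VM ioctl, so the logic can be shown without /dev/kvm; old_kvm_create_vm() is a mock for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for issuing KVM_CREATE_VM with a given vm_type:
 * returns a VM fd (>= 0) on success or a negative errno on failure.
 */
typedef int (*create_vm_fn)(unsigned long vm_type);

/*
 * Try the new KVM_X86_TDX_VM value (1) first and fall back to the old
 * value (2). On success, *used_type records which value KVM accepted.
 */
static int tdx_create_vm(create_vm_fn create_vm, unsigned long *used_type)
{
    static const unsigned long tdx_vm_types[] = { 1, 2 };
    int ret = -EINVAL;

    assert(create_vm != NULL && used_type != NULL);
    for (size_t i = 0; i < sizeof(tdx_vm_types) / sizeof(tdx_vm_types[0]); i++) {
        ret = create_vm(tdx_vm_types[i]);
        if (ret >= 0) {
            *used_type = tdx_vm_types[i];
            break;
        }
    }
    return ret;
}

/* Mock of an older KVM that only accepts the old value 2 (illustration). */
static int old_kvm_create_vm(unsigned long vm_type)
{
    return vm_type == 2 ? 3 : -EINVAL;
}
```

Recording the accepted value lets later code know which ABI the running kernel speaks without probing again.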
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDG.VP.VMCALL<GetQuote> in the GHCI 1.0 spec has changed so that the size of
the shared GPA buffer is passed in R13. With the old GHCI, R13 isn't used and
the buffer size is assumed to be a fixed 8 KBytes.
If the R13 input value is zero, assume the old GetQuote ABI and handle it
for compatibility.
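The compatibility rule reduces to a single choice of buffer size. A minimal sketch; getquote_buf_size() and GETQUOTE_LEGACY_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed shared-buffer size assumed by the old GetQuote ABI (8 KBytes). */
#define GETQUOTE_LEGACY_SIZE (8 * 1024)

/*
 * Return the shared GPA buffer size for TDG.VP.VMCALL<GetQuote>: R13
 * carries the size under the updated GHCI 1.0 ABI, while R13 == 0 means
 * an old guest and the legacy fixed size.
 */
static uint64_t getquote_buf_size(uint64_t r13)
{
    return r13 ? r13 : GETQUOTE_LEGACY_SIZE;
}
```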
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
MapGPA is a hypercall to convert a GPA range from private to shared or
from shared to private. As the conversion function is already implemented
as kvm_convert_memory(), wire it to the TDX hypercall exit.
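Before calling the conversion function, the exit handler has to work out the conversion direction. This is a minimal sketch, assuming the TDX convention that the shared bit is the top bit of the guest physical address width (bit 47 with GPAW=48, bit 51 with GPAW=52), and that a set shared bit requests conversion to shared. MapGpaRequest and decode_map_gpa() are hypothetical; the call into kvm_convert_memory() is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t gpa;    /* address with the shared bit stripped */
    bool to_private; /* true: shared -> private, false: private -> shared */
} MapGpaRequest;

/* Decode a MapGPA start address under the shared-bit assumption above. */
static MapGpaRequest decode_map_gpa(uint64_t gpa, unsigned gpaw)
{
    assert(gpaw == 48 || gpaw == 52);

    uint64_t shared_bit = 1ULL << (gpaw - 1);
    MapGpaRequest req = {
        .gpa = gpa & ~shared_bit,
        .to_private = !(gpa & shared_bit),
    };
    return req;
}
```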
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
RFC: The discussion is currently ongoing, and the GHCI spec may be updated.
Currently, QEMU delivers interrupts to the vCPU that issued
TDG.VP.VMCALL<GetQuote>. Instead, deliver interrupts to the vCPU that issued
TDG.VP.VMCALL<SetupEventNotifyInterrupt>, for CPU affinity.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
To avoid hanging when the QGS server does not respond, set up a timer for
the transaction. On timeout, treat it as an error and interrupt the guest.
The timeout is set to 30s for now and may be changed if that proves
inappropriate.
Also extract the common cleanup code for clarity.
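The timeout decision can be sketched as a pure function of the start and current times, so it is testable without an event loop. In QEMU itself this would more naturally be a one-shot timer (for example via timer_new_ms() and timer_mod()) armed at start + timeout; qgs_timed_out() and QGS_TIMEOUT_MS are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Current QGS transaction timeout; the value may be revisited. */
#define QGS_TIMEOUT_MS (30 * 1000)

/*
 * Return true once a QGS transaction started at start_ms has reached its
 * deadline, at which point the caller reports an error to the guest and
 * runs the common cleanup path.
 */
static bool qgs_timed_out(int64_t start_ms, int64_t now_ms)
{
    assert(now_ms >= start_ms);
    return now_ms - start_ms >= QGS_TIMEOUT_MS;
}
```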
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
When the TD guest invokes the GetQuote TDVMCALL, QEMU registers an async
qio task with the default context once the qio channel is connected.
However, qio_channel_read() performs a blocking recvmsg(), which blocks the
main thread and leaves the TD guest unresponsive until the server returns.
Set the I/O channel to non-blocking and register the socket fd with the
main loop. Move the read operation into the callback, which is invoked to
handle the quote data when the fd becomes readable.
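A minimal sketch of the non-blocking read path using plain POSIX fcntl() and read() rather than QEMU's QIOChannel API (in QEMU itself this would go through qio_channel_set_blocking() and a main-loop fd handler). set_nonblocking() and quote_read_cb() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -errno on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return -errno;
    }
    return 0;
}

/*
 * Readable-fd callback body: read whatever quote data is available
 * without blocking. Returns bytes read, 0 on EOF, or -EAGAIN when nothing
 * is ready yet (the main loop calls again once the fd is readable).
 */
static ssize_t quote_read_cb(int fd, char *buf, size_t len)
{
    assert(buf != NULL);

    ssize_t n = read(fd, buf, len);

    if (n < 0) {
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? -EAGAIN : -errno;
    }
    return n;
}
```

When trying this out, a socketpair() can stand in for the QGS connection: with nothing written yet, quote_read_cb() returns -EAGAIN immediately instead of stalling the caller.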
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
For GetQuote, delegate the request to the Quote Generation Service (QGS).
Add a property for the address of the quote generation server. On request,
connect to the server, read the request buffer from shared guest memory,
send it to the server, store the response in shared guest memory, and
notify the TD guest with an interrupt.
"quote-generation-service" is a property that specifies the Quote
Generation Service (QGS) address in QEMU socket address format. Examples of
supported formats are "vsock:2:1234", "unix:/run/qgs" and "localhost:1234".
Command line example:
qemu-system-x86_64 \
-object 'tdx-guest,id=tdx0,quote-generation-service=localhost:1234' \
-machine confidential-guest-support=tdx0
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
For SetupEventNotifyInterrupt, record the interrupt vector requested by
TDG.VP.VMCALL<SETUP_EVENT_NOTIFY_INTERRUPT> for use by GetQuote.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
When creating a TDX VM, three SHA384 hash values can be provided for
TDX attestation.
So far they have been hard-coded as 0. Now allow the user to specify those
values via the properties mrconfigid, mrowner and mrownerconfig. The
strings for those properties are hex strings of length 48 * 2.
Example:
-object tdx-guest, \
mrconfigid=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef, \
mrowner=fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210, \
mrownerconfig=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Implement property_add_sha384(), which converts between a hex string and
uint8_t[48]. It will be used for TDX, which uses SHA384 for measurement.
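A minimal sketch of the hex string <-> uint8_t[48] conversion that property_add_sha384() is described as performing. sha384_from_hex() and sha384_to_hex() are hypothetical helpers, not QEMU's implementation; parsing accepts either case, and formatting emits lowercase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SHA384_DIGEST_SIZE 48

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse exactly 96 hex characters; out may be partly written on failure. */
static bool sha384_from_hex(const char *hex, uint8_t out[SHA384_DIGEST_SIZE])
{
    if (strlen(hex) != SHA384_DIGEST_SIZE * 2) {
        return false;
    }
    for (int i = 0; i < SHA384_DIGEST_SIZE; i++) {
        int hi = hex_nibble(hex[2 * i]);
        int lo = hex_nibble(hex[2 * i + 1]);

        if (hi < 0 || lo < 0) {
            return false;
        }
        out[i] = (uint8_t)(hi << 4 | lo);
    }
    return true;
}

/* Format a digest as 96 lowercase hex characters plus a NUL terminator. */
static void sha384_to_hex(const uint8_t in[SHA384_DIGEST_SIZE],
                          char out[SHA384_DIGEST_SIZE * 2 + 1])
{
    static const char digits[] = "0123456789abcdef";

    for (int i = 0; i < SHA384_DIGEST_SIZE; i++) {
        out[2 * i] = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0xf];
    }
    out[SHA384_DIGEST_SIZE * 2] = '\0';
}
```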
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
TDX requires x2apic and "resets" vCPUs to have x2apic enabled. Model
this in QEMU and unconditionally enable x2apic interrupt routing.
This fixes issues where interrupts from IRQFD would not get forwarded to
the guest due to KVM silently dropping the invalid routing entry.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Don't get/put the state of TDX VMs, since accessing or mutating the guest
state of production TDs is not supported.
Note that it will be allowed for a debug TD. The corresponding support will
be introduced when debug TD support is implemented in the future.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
For TDs, MSR_IA32_UCODE_REV is the only MSR in kvm_init_msrs() that can be
configured by the VMM; the features enumerated or controlled by the other
MSRs in kvm_init_msrs() are not under VMM control.
Only configure MSR_IA32_UCODE_REV for TDs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
The TSC of TDs is not accessible, and KVM doesn't allow access to
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc(), make
kvm_synchronize_all_tsc() a no-op for TDs.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
When level-triggered interrupts aren't supported on the x86 platform,
forcibly report them as edge-triggered in the ACPI tables.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Add a new bool member, eoi_intercept_unsupported, to X86MachineState
with a default value of false. Set it to true for TDX VMs.
The inability to intercept EOI makes it impossible to emulate the
re-injection of a level-triggered interrupt while the level is still
active, which affects interrupt controller emulation.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDX CPU state is protected, so the vCPU state cannot be reset by the VMM.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Legacy PIC (8259) cannot be supported for TDX VMs since the TDX module
doesn't allow direct interrupt injection. Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.
Hence disable the PIC for TDX VMs and error out if the user requests it.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TDX doesn't support SMM, and the VMM cannot emulate SMM for TDX VMs because
the VMM cannot manipulate the TDX VM's memory.
Disable SMM for TDX VMs and error out if the user requests to enable SMM.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
A TDX vCPU needs to be initialized by SEAMCALL(TDH.VP.INIT), and KVM
provides the vCPU-level ioctl KVM_TDX_INIT_VCPU for it.
KVM_TDX_INIT_VCPU needs the address of the HOB as input. Invoke it for
each vCPU after the HOB list is created.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Allocate dedicated memory to hold the BIOS image instead of reusing the
mmapped guest memory, because the initial conversion to private memory
wipes out the BIOS image via madvise(REMOVE).
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDVF firmware (CODE and VARS) needs to be added/copied to the TD's private
memory via KVM_TDX_INIT_MEM_REGION, as do the TD HOB and TEMP memory.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
The TD HOB list is used to pass information from the VMM to TDVF. The TD
HOB must include a PHIT HOB and a Resource Descriptor HOB. More details can
be found in the TDVF specification and the PI specification.
Build the TD HOB in TDX's machine_init_done callback.
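A minimal sketch of the required HOB list: a PHIT HOB, one system-memory Resource Descriptor HOB, and the end-of-list HOB. The structs mirror the PI specification layouts (EFI_HOB_GENERIC_HEADER, EFI_HOB_HANDOFF_INFO_TABLE, EFI_HOB_RESOURCE_DESCRIPTOR) but are local re-declarations; build_td_hob() is hypothetical, and setting end_of_hob_list to the address of the end HOB follows EDK2's convention as an assumption. Version, boot mode and resource attributes are left 0 for brevity; a real TD HOB must fill them per the TDVF spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFI_HOB_TYPE_HANDOFF             0x0001
#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR 0x0003
#define EFI_HOB_TYPE_END_OF_HOB_LIST     0xFFFF
#define EFI_RESOURCE_SYSTEM_MEMORY       0x00000000

typedef struct { uint16_t type, length; uint32_t reserved; } HobHeader;

typedef struct {
    HobHeader header;
    uint32_t version, boot_mode;
    uint64_t memory_top, memory_bottom;
    uint64_t free_memory_top, free_memory_bottom;
    uint64_t end_of_hob_list;
} HobHandoffInfoTable;

typedef struct {
    HobHeader header;
    uint8_t owner[16];                 /* EFI_GUID, left zero here */
    uint32_t resource_type, resource_attribute;
    uint64_t physical_start, resource_length;
} HobResourceDescriptor;

static_assert(sizeof(HobHandoffInfoTable) == 56, "PHIT HOB size");
static_assert(sizeof(HobResourceDescriptor) == 48, "resource HOB size");

/* Lay out the HOB list in buf, which the guest sees at buf_gpa. */
static size_t build_td_hob(uint8_t *buf, uint64_t buf_gpa,
                           uint64_t ram_start, uint64_t ram_len)
{
    HobHandoffInfoTable phit = {0};
    HobResourceDescriptor res = {0};
    HobHeader end = { EFI_HOB_TYPE_END_OF_HOB_LIST, sizeof(HobHeader), 0 };
    size_t off = 0;

    res.header = (HobHeader){ EFI_HOB_TYPE_RESOURCE_DESCRIPTOR, sizeof(res), 0 };
    res.resource_type = EFI_RESOURCE_SYSTEM_MEMORY;
    res.physical_start = ram_start;
    res.resource_length = ram_len;

    phit.header = (HobHeader){ EFI_HOB_TYPE_HANDOFF, sizeof(phit), 0 };
    phit.end_of_hob_list = buf_gpa + sizeof(phit) + sizeof(res);

    memcpy(buf + off, &phit, sizeof(phit)); off += sizeof(phit);
    memcpy(buf + off, &res, sizeof(res));   off += sizeof(res);
    memcpy(buf + off, &end, sizeof(end));   off += sizeof(end);
    return off;
}
```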
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
The RAM of a TDX VM can be classified into two types:
- TDX_RAM_UNACCEPTED: the default type of TDX memory, which needs to be
accepted by the TDX guest before it can be used and is all zeros
after being accepted.
- TDX_RAM_ADDED: RAM that is ADD'ed to the TD guest before it runs and
can be used directly, e.g., the TD HOB and TEMP MEM needed by TDVF.
Maintain TdxRamEntries[], which takes the initial RAM info from the e820
table and marks each RAM range with the default type TDX_RAM_UNACCEPTED.
Then change the ranges of TD HOB and TEMP MEM to TDX_RAM_ADDED, since these
ranges will be ADD'ed before the TD runs and don't need to be accepted at
runtime.
TdxRamEntries[] is later used to set up the memory resource HOBs in the
TD HOB, which pass memory info from QEMU to TDVF.
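Turning a sub-range into TDX_RAM_ADDED means splitting the entry that contains it into up to three pieces. This is a minimal sketch with a hypothetical fixed-size TdxRamTable and a hypothetical tdx_mark_ram_added() helper; it assumes the range lies inside a single existing entry.

```c
#include <assert.h>
#include <stdint.h>

enum TdxRamType { TDX_RAM_UNACCEPTED, TDX_RAM_ADDED };

typedef struct {
    uint64_t address;
    uint64_t length;
    enum TdxRamType type;
} TdxRamEntry;

#define MAX_RAM_ENTRIES 16

typedef struct {
    TdxRamEntry e[MAX_RAM_ENTRIES];
    int n;
} TdxRamTable;

static void ram_add_entry(TdxRamTable *t, uint64_t addr, uint64_t len,
                          enum TdxRamType type)
{
    assert(t->n < MAX_RAM_ENTRIES);
    t->e[t->n++] = (TdxRamEntry){ addr, len, type };
}

/*
 * Mark [addr, addr + len) as TDX_RAM_ADDED by splitting the entry that
 * contains it into head / added / tail pieces. Returns 0 on success, or
 * -1 if no single entry contains the whole range.
 */
static int tdx_mark_ram_added(TdxRamTable *t, uint64_t addr, uint64_t len)
{
    uint64_t end = addr + len;

    for (int i = 0; i < t->n; i++) {
        TdxRamEntry *e = &t->e[i];
        uint64_t e_end = e->address + e->length;

        if (addr < e->address || end > e_end) {
            continue;
        }
        uint64_t head_len = addr - e->address;
        enum TdxRamType type = e->type;

        if (head_len) {
            e->length = head_len;                 /* keep the head piece */
            ram_add_entry(t, addr, len, TDX_RAM_ADDED);
        } else {
            *e = (TdxRamEntry){ addr, len, TDX_RAM_ADDED };
        }
        if (e_end > end) {
            ram_add_entry(t, end, e_end - end, type); /* tail piece */
        }
        return 0;
    }
    return -1;
}
```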
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
For each TDVF section, QEMU needs to copy the content to guest
private memory via the KVM API (KVM_TDX_INIT_MEM_REGION).
Introduce a field @mem_ptr in TdxFirmwareEntry to track the memory
pointer of each TDVF section, so that QEMU can add/copy them to guest
private memory later.
TDVF sections can be classified into two groups:
- Firmware itself, e.g., TDVF BFV and CFV, which is located separately
from guest RAM. Its memory pointer is the bios pointer.
- Sections located in guest RAM, e.g., TEMP_MEM and TD_HOB.
A new memory range is mmapped for them.
Register a machine_init_done callback to do this work.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
For TDX, the addresses below 1MB are entirely general RAM. There is no
need to initialize the pc.rom memory region for TDs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDX doesn't support mapping different GPAs to the same private memory.
Thus, aliasing the top 128KB of the BIOS as isa-bios is not supported.
On the other hand, since a TDX guest cannot enter real mode, it works
fine without isa-bios.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDX cannot support a pflash device since it supports neither read-only
memslots nor emulation. Load TDVF (OVMF) with the -bios option for TDs.
When booting a TD, besides loading TDVF to the address below 4G, QEMU
needs to parse the TDVF metadata.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TDX VM needs to boot with its specialized firmware, Trusted Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it in TD
guest memory prior to running the TDX VM.
The TDVF metadata in the TDVF image describes the structure of the
firmware. QEMU refers to it to set up memory for TDVF. Introduce the
function tdvf_parse_metadata() to parse the metadata from the TDVF image
and store the info of each TDVF section.
TDX metadata is located by a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only a 4-byte field: the offset of the TDX metadata from the end
of the firmware file.
Select X86_FW_OVMF when TDX is enabled to leverage the existing functions
to parse and search OVMF's GUID-ed structures.
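Once the GUID-ed block has been found, locating the metadata is a single subtraction from the end of the image. This is a minimal sketch, assuming the 4-byte field is little-endian as is usual for UEFI structures; tdx_metadata_ptr() is a hypothetical helper, and the GUID search itself is left to the existing OVMF functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given the 4-byte field from the TDX metadata offset block, return a
 * pointer to the metadata inside the firmware image, or NULL if the
 * offset is zero or points before the start of the image.
 */
static const uint8_t *tdx_metadata_ptr(const uint8_t *fw, size_t fw_size,
                                       const uint8_t offset_field[4])
{
    assert(fw != NULL && offset_field != NULL);

    /* Decode explicitly as little-endian to avoid host-endian issues. */
    uint32_t off = (uint32_t)offset_field[0] |
                   (uint32_t)offset_field[1] << 8 |
                   (uint32_t)offset_field[2] << 16 |
                   (uint32_t)offset_field[3] << 24;

    if (off == 0 || off > fw_size) {
        return NULL;
    }
    return fw + fw_size - off; /* offset is measured from the end */
}
```

For a 64-byte image whose offset field holds 16, the metadata starts at byte 48.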
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDX only supports read-only memory for shared memory, not for private
memory. QEMU has no idea whether a memslot is used as shared or private
memory. Thus, simply set kvm_readonly_mem_enabled to false for TDX VMs
for simplicity.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>