Backwards compatibility
=======================

How backwards compatibility works
---------------------------------

When we do a migration, we have two QEMU processes: the source and the
target.  There are two cases: they are the same version or they are
different versions.  The easy case is when they are the same version.
The difficult one is when they are different versions.

There are two things that are different, but they have very similar
names and sometimes get confused:

- QEMU version
- machine type version

Let's start with a practical example.  We have two QEMU binaries:

- qemu-system-x86_64 (v5.2), from now on qemu-5.2.
- qemu-system-x86_64 (v5.1), from now on qemu-5.1.

Related to these are the "latest" machine types defined by each of
them:

- pc-q35-5.2 (the newest one in qemu-5.2), from now on pc-5.2.
- pc-q35-5.1 (the newest one in qemu-5.1), from now on pc-5.1.

First of all, migration is only supposed to work if you use the same
machine type on both source and destination.  The QEMU hardware
configuration also needs to be the same on source and destination.
Most aspects of the backend configuration can be changed at will,
except for a few cases where the backend features influence frontend
device feature exposure.  But that is not relevant for this section.

Let's list the combinations that we can have.  We start with the
trivial ones, where QEMU is the same on source and destination:

1 - qemu-5.2 -M pc-5.2  -> migrates to -> qemu-5.2 -M pc-5.2

  This is the latest QEMU with the latest machine type.
  This has to work, and if it doesn't work it is a bug.

2 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

  Exactly the same case as the previous one, but for 5.1.
  Nothing to see here either.

These are the easiest ones, so we will not talk more about them in
this section.

Now we start with the more interesting cases.  Consider the case where
we have the same QEMU version on both sides (qemu-5.2), but we are not
using the latest machine type for that version (pc-5.2) but one from
an older QEMU version, in this case pc-5.1.

3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

  It needs to use the definition of pc-5.1 and the devices as they
  were configured on 5.1, but this should be easy in the sense that
  both sides are the same QEMU and both sides have exactly the same
  idea of what the pc-5.1 machine is.

4 - qemu-5.1 -M pc-5.2  -> migrates to -> qemu-5.1 -M pc-5.2

  This combination is not possible, as qemu-5.1 doesn't understand
  the pc-5.2 machine type.  So there is nothing to worry about here.

Now come the interesting ones, when the two QEMU processes are
different versions.  Notice also that the machine type needs to be
pc-5.1, because we have the limitation that qemu-5.1 doesn't know
pc-5.2.  So the possible cases are:

5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

  This migration is known as newer to older.  When we are developing
  5.2 we need to take care not to break migration to qemu-5.1.
  Notice that we can't update qemu-5.1 to understand whatever
  qemu-5.2 decides to change, so it is up to the qemu-5.2 side to
  make the relevant changes.

6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

  This migration is known as older to newer.  We need to make sure
  that we are able to receive migrations from qemu-5.1.  The problem
  is similar to the previous one.

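The six cases above can be summarised with a small classifier.  This
is only a sketch and not QEMU code: ``struct version``, ``classify()``
and the enum names are made up for illustration, and it assumes that a
QEMU binary knows every machine type up to its own version.

```c
#include <stdbool.h>

/* Hypothetical version pair, e.g. {5, 2} for qemu-5.2 or pc-5.2. */
struct version {
    int major;
    int minor;
};

enum migration_kind {
    MIG_NOT_POSSIBLE,    /* case 4: one side doesn't know the machine type */
    MIG_SAME_VERSION,    /* cases 1, 2 and 3 */
    MIG_NEWER_TO_OLDER,  /* case 5 */
    MIG_OLDER_TO_NEWER,  /* case 6 */
};

/* Returns -1, 0 or 1 when a is older than, equal to or newer than b. */
int version_cmp(struct version a, struct version b)
{
    if (a.major != b.major) {
        return a.major < b.major ? -1 : 1;
    }
    if (a.minor != b.minor) {
        return a.minor < b.minor ? -1 : 1;
    }
    return 0;
}

/* Assumption: a binary knows machine types up to its own version. */
bool knows_machine(struct version qemu, struct version machine)
{
    return version_cmp(machine, qemu) <= 0;
}

/* Both sides must use the same machine type, so it is passed once. */
enum migration_kind classify(struct version src, struct version dst,
                             struct version machine)
{
    int c;

    if (!knows_machine(src, machine) || !knows_machine(dst, machine)) {
        return MIG_NOT_POSSIBLE;
    }
    c = version_cmp(src, dst);
    if (c == 0) {
        return MIG_SAME_VERSION;
    }
    return c > 0 ? MIG_NEWER_TO_OLDER : MIG_OLDER_TO_NEWER;
}
```

Only the last two kinds need compatibility work in the newer QEMU; the
rest of this section is about them.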
If qemu-5.1 and qemu-5.2 were the same, there would not be any
compatibility problems.  But the reason that we create qemu-5.2 is to
get new features, devices, defaults, etc.

If we get a device that has a new feature, or change a default value,
we have a problem when we try to migrate between different QEMU
versions.

So we need a way to tell qemu-5.2 that when we are using machine type
pc-5.1, it needs to **not** use the feature, to be able to migrate to
a real qemu-5.1.

The equivalent applies when migrating from qemu-5.1 to qemu-5.2:
qemu-5.2 has to expect that it is not going to get data for the new
feature, because qemu-5.1 doesn't know about it.

How do we tell QEMU about these device feature changes?  In the
hw_compat_X_Y arrays in hw/core/machine.c.

If we change a default value, we need to put the old value back in
that array.  And the device, during initialization, needs to look at
that array to see what value it needs to use for that feature.  What
we put in that array is the value of a property.

To create a property for a device, we need to use one of the
DEFINE_PROP_*() macros.  See include/hw/qdev-properties.h to find the
macros that exist.  With it, we set the default value for that
property, and that is what it is going to get in the latest released
version.  But if we want a different value for a previous version, we
can change that in the hw_compat_X_Y arrays.

hw_compat_X_Y is an array of entries that have the format:

- name_device
- name_property
- value

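A minimal sketch of how such an array overrides a device default.  The
structure mirrors the three fields listed above, but ``struct
compat_prop``, ``effective_value()`` and the device and property names
are hypothetical; the real lookup in QEMU goes through the global
property machinery and is more involved.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one hw_compat_X_Y entry. */
struct compat_prop {
    const char *driver;    /* name_device */
    const char *property;  /* name_property */
    const char *value;     /* value */
};

/* Hypothetical compat array for an older machine type. */
static const struct compat_prop example_compat[] = {
    { "some-device", "some-feature", "off" },
};

/*
 * Return the value a device should use for a property: the compat
 * entry if the machine type has one, otherwise the device default.
 */
const char *effective_value(const struct compat_prop *compat, size_t n,
                            const char *driver, const char *property,
                            const char *device_default)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(compat[i].driver, driver) == 0 &&
            strcmp(compat[i].property, property) == 0) {
            return compat[i].value;
        }
    }
    return device_default;
}
```

With the latest machine type there is no compat entry, so the device
default wins; with the older machine type the old value is restored.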
Let's see a practical example.

In qemu-5.2 virtio-blk-device got multi queue support.  This is a
change that is not backward compatible.  In qemu-5.1 it has one
queue.  In qemu-5.2 it has the same number of queues as the number of
cpus in the system.

When we are doing migration, if we migrate from a device that has 4
queues to a device that has only one queue, we don't know where to
put the extra information for the other 3 queues, and migration
fails.

There is a similar problem when we migrate from qemu-5.1, which has
only one queue, to qemu-5.2: we only send information for one queue,
but the destination has 4, so we have 3 queues that are not properly
initialized and anything can happen.

So, how can we address this problem?  Easy, just convince qemu-5.2
that when it is running pc-5.1, it needs to set the number of queues
for virtio-blk-device to 1.

That way we fix cases 5 and 6.

5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

    qemu-5.2 -M pc-5.1 sets number of queues to be 1.
    qemu-5.1 -M pc-5.1 expects number of queues to be 1.

    Correct.  Migration works.

6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

    qemu-5.1 -M pc-5.1 sets number of queues to be 1.
    qemu-5.2 -M pc-5.1 expects number of queues to be 1.

    Correct.  Migration works.

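The effect of pinning the number of queues on cases 5 and 6 can be
modelled as follows.  This is a toy model under stated assumptions,
not QEMU code: ``struct vm``, ``blk_num_queues()`` and
``can_migrate()`` are invented names, versions are reduced to the
minor number of 5.x, and the ``compat_applied`` flag only exists to
show what would happen without the compatibility setting.

```c
#include <stdbool.h>

/* Hypothetical description of one QEMU process. */
struct vm {
    int qemu_minor;      /* 1 for qemu-5.1, 2 for qemu-5.2 */
    int machine_minor;   /* 1 for pc-5.1, 2 for pc-5.2 */
    unsigned cpus;       /* number of cpus in the guest */
};

/* Number of virtio-blk queues that this process gives the device. */
unsigned blk_num_queues(const struct vm *vm, bool compat_applied)
{
    if (vm->qemu_minor < 2) {
        return 1;            /* qemu-5.1 always has one queue */
    }
    if (vm->machine_minor < 2 && compat_applied) {
        return 1;            /* pc-5.1 on qemu-5.2: old value restored */
    }
    return vm->cpus;         /* qemu-5.2 default: one queue per cpu */
}

/* The device state only loads if both sides agree on the queue count. */
bool can_migrate(const struct vm *src, const struct vm *dst,
                 bool compat_applied)
{
    return blk_num_queues(src, compat_applied) ==
           blk_num_queues(dst, compat_applied);
}
```

With the compatibility setting both directions agree on one queue;
without it, a 4 cpu guest would have 4 queues on one side and 1 on the
other.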
And now the other interesting case, case 3.  In this case we have:

3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

    Here we have the same QEMU on both sides.  So it doesn't matter a
    lot if we have set the number of queues to 1 or not, because
    they are the same.

    WRONG!

    Think what happens if we do one of these double migrations:

    A -> migrates -> B -> migrates -> C

    where:

    A: qemu-5.1 -M pc-5.1
    B: qemu-5.2 -M pc-5.1
    C: qemu-5.2 -M pc-5.1

    Migration A -> B is case 6, so the number of queues needs to be 1.

    Migration B -> C is case 3, so we don't care.  But actually we do
    care, because we haven't started the guest in qemu-5.2, it came
    migrated from qemu-5.1.  So to be on the safe side, we need to
    always use number of queues 1 when we are using pc-5.1.

Now, how was this done in reality?  The following commit shows how it
was done::

  commit 9445e1e15e66c19e42bea942ba810db28052cd05
  Author: Stefan Hajnoczi <stefanha@redhat.com>
  Date:   Tue Aug 18 15:33:47 2020 +0100

  virtio-blk-pci: default num_queues to -smp N

The relevant parts for migration are::

    @@ -1281,7 +1284,8 @@ static Property virtio_blk_properties[] = {
     #endif
         DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
                         true),
    -    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
    +    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
    +                       VIRTIO_BLK_AUTO_NUM_QUEUES),
         DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),

It changes the default value of num_queues.  But it fixes it for old
machine types to have the right value::

    @@ -31,6 +31,7 @@
     GlobalProperty hw_compat_5_1[] = {
         ...
    +    { "virtio-blk-device", "num-queues", "1"},
         ...
     };

A device with different features on both sides
----------------------------------------------

Let's assume that we are using the same QEMU binary on both sides,
just to make things easier.  But we have a device that has different
features on both sides of the migration.  That can be because the
devices are different, because the kernel drivers of the two devices
have different features, or whatever.

How can we get this to work with migration?  The way to do that is
"theoretically" easy.  You take the features that the device has on
the source of the migration and the features that the device has on
the target of the migration, you compute the intersection of the
features of both sides, and that is the set of features that you
should launch QEMU with.

Notice that this is not completely related to QEMU.  The most
important thing here is that this should be handled by the managing
application that launches QEMU.  If QEMU is configured correctly, the
migration will succeed.

That said, actually doing it is complicated.  Almost all devices are
bad at being launched with only some features enabled.  With one big
exception: cpus.

You can read the documentation for QEMU x86 cpu models here:

https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html

Notice that, when it talks about migration, it recommends choosing
the newest cpu model that is supported by all cpus.

Let's say that we have:

Host A:

Device X has the feature Y

Host B:

Device X does not have the feature Y

If we try to migrate without any care from host A to host B, it will
fail because when migration tries to load the feature Y on the
destination, it will find that the hardware is not there.

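A management application could catch this before starting the
migration.  The sketch below assumes that the features of "device X"
can be expressed as a bitmask; the bit names and ``missing_on_dest()``
are hypothetical.  An empty result is a necessary condition for the
migration to work, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical feature bits for "device X". */
#define FEATURE_Y (UINT32_C(1) << 0)
#define FEATURE_Z (UINT32_C(1) << 1)

/*
 * Features that the source exposes to the guest but the destination
 * cannot provide.  If this is not zero, loading the state will fail.
 */
uint32_t missing_on_dest(uint32_t src_features, uint32_t dst_features)
{
    return src_features & ~dst_features;
}
```

For the example above, ``missing_on_dest(FEATURE_Y, 0)`` reports
feature Y, which is exactly the state that host B has nowhere to load.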
Doing this would be the equivalent of doing the following with cpus:

Host A:

$ qemu-system-x86_64 -cpu host

Host B:

$ qemu-system-x86_64 -cpu host

When the two hosts have different cpu features, this cannot be relied
upon to work.  It fails in particular if host B has fewer features
than host A.  If host A has fewer features than host B, sometimes it
works.  The important word in the last sentence is "sometimes".

So, forgetting about cpu models and continuing with the -cpu host
example, let's say that the difference between the cpus is that hosts
A and B have the following features::

    Features:   'pcid'  'stibp' 'taa-no'
    Host A:        X       X
    Host B:                        X

And we want to migrate between them.  The way to configure the QEMU
cpu on each host would be:

Host A:

$ qemu-system-x86_64 -cpu host,pcid=off,stibp=off

Host B:

$ qemu-system-x86_64 -cpu host,taa-no=off

And you would be able to migrate between them.  It is the
responsibility of the management application or of the user to make
sure that the configuration is correct.  QEMU doesn't know how to
look at this kind of features in general.

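The two command lines above can be derived mechanically: each host
turns off the features that it has and its peer lacks, which leaves
the intersection on both sides.  A minimal sketch, assuming the
features are available as a bitmask; the bit values, the
``feature_names`` table and ``cpu_option()`` are made up for this
example, only the three feature names come from the text.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions are arbitrary, chosen only for this example. */
#define F_PCID   (1u << 0)
#define F_STIBP  (1u << 1)
#define F_TAA_NO (1u << 2)

static const struct {
    uint32_t bit;
    const char *name;
} feature_names[] = {
    { F_PCID, "pcid" },
    { F_STIBP, "stibp" },
    { F_TAA_NO, "taa-no" },
};

/*
 * Build the -cpu argument for one host: "host" followed by
 * ",<name>=off" for every feature this host has but the peer lacks.
 */
void cpu_option(uint32_t mine, uint32_t peer, char *buf, size_t len)
{
    uint32_t disable = mine & ~peer;
    size_t count = sizeof(feature_names) / sizeof(feature_names[0]);
    size_t used = (size_t)snprintf(buf, len, "host");

    for (size_t i = 0; i < count; i++) {
        /* Stop appending once the buffer is full. */
        if ((disable & feature_names[i].bit) && used < len) {
            used += (size_t)snprintf(buf + used, len - used, ",%s=off",
                                     feature_names[i].name);
        }
    }
}
```

Calling it with host A's features and then with host B's features
gives the ``host,pcid=off,stibp=off`` and ``host,taa-no=off``
arguments used above.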
Notice that we don't recommend using -cpu host for migration.  It is
used in this example because it makes the example simpler.

Other devices have worse control over individual features.  If they
want to be able to migrate between hosts that show different features,
the device needs a way to configure which ones it is going to use.

In this section we have assumed that we are using the same QEMU
binary on both sides of the migration.  If we use different QEMU
versions, then we need to take into account all the other differences
and the examples become even more complicated.

How to mitigate when we have a backward compatibility error
-----------------------------------------------------------

We break migration for old machine types continuously during
development.  But as soon as we find that there is a problem, we fix
it.  The problem is what happens when we detect, after we have done a
release, that something has gone wrong.

Let's see how it worked with one example.

After the release of qemu-8.0 we found a problem when doing migration
of the machine type pc-7.2.

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

  This migration works

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

  This migration works

- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

  This migration fails

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

  This migration fails

So clearly something fails when migrating between qemu-7.2 and
qemu-8.0 with machine type pc-7.2.  The error messages and git bisect
pointed to the following commit.

In qemu-8.0 we got this commit::

    commit 010746ae1db7f52700cb2e2c46eb94f299cfa0d2
    Author: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Date:   Thu Mar 2 13:37:02 2023 +0000

    hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register


The relevant bits of the commit for our example are these::

    --- a/hw/pci/pcie_aer.c
    +++ b/hw/pci/pcie_aer.c
    @@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev,

         pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
                      PCI_ERR_UNC_SUPPORTED);
    +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    +                 PCI_ERR_UNC_MASK_DEFAULT);
    +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    +                 PCI_ERR_UNC_SUPPORTED);

         pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
                      PCI_ERR_UNC_SEVERITY_DEFAULT);

The patch changes how we configure PCI space for AER.  But migration
fails when the PCI space configuration is different between source
and destination.

The following commit shows how this got fixed::

    commit 5ed3dabe57dd9f4c007404345e5f5bf0e347317f
    Author: Leonardo Bras <leobras@redhat.com>
    Date:   Tue May 2 21:27:02 2023 -0300

    hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0

    [...]

The relevant parts of the fix in QEMU are as follows:

First, we create a new property for the device to be able to configure
the old behaviour or the new behaviour::

    diff --git a/hw/pci/pci.c b/hw/pci/pci.c
    index 8a87ccc8b0..5153ad63d6 100644
    --- a/hw/pci/pci.c
    +++ b/hw/pci/pci.c
    @@ -79,6 +79,8 @@ static Property pci_props[] = {
         DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
                            failover_pair_id),
         DEFINE_PROP_UINT32("acpi-index",  PCIDevice, acpi_index, 0),
    +    DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present,
    +                    QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
         DEFINE_PROP_END_OF_LIST()
     };

Notice that we enable the feature for new machine types.

Now we see how the fix is done.  This is going to depend on what kind
of breakage happens, but in this case it is quite simple::

    diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
    index 103667c368..374d593ead 100644
    --- a/hw/pci/pcie_aer.c
    +++ b/hw/pci/pcie_aer.c
    @@ -112,10 +112,13 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,

         pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
                      PCI_ERR_UNC_SUPPORTED);
    -    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    -                 PCI_ERR_UNC_MASK_DEFAULT);
    -    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    -                 PCI_ERR_UNC_SUPPORTED);
    +
    +    if (dev->cap_present & QEMU_PCIE_ERR_UNC_MASK) {
    +        pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    +                     PCI_ERR_UNC_MASK_DEFAULT);
    +        pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    +                     PCI_ERR_UNC_SUPPORTED);
    +    }

         pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
                      PCI_ERR_UNC_SEVERITY_DEFAULT);

I.e., if the property bit is enabled, we configure it as we did for
qemu-8.0.  If the property bit is not set, we configure it as it was
in 7.2.

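The pattern of the fix, a property bit that gates how the register is
programmed, can be reduced to a toy model.  Everything here is
hypothetical and much smaller than the real code: ``struct toy_dev``,
the 16 byte config space, the register offset and its default value
are invented, and migration is modelled simply as requiring identical
config space contents on both sides.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE          16     /* toy config space size */
#define REG_UNCOR_MASK    4      /* hypothetical register offset */
#define UNCOR_MASK_RESET  0x5a   /* hypothetical default value */
#define CAP_ERR_UNC_MASK  (1u << 0)  /* stands in for the property bit */

struct toy_dev {
    uint32_t cap_present;
    uint8_t config[CFG_SIZE];
};

/* Program the mask register only when the property bit is set. */
void toy_aer_init(struct toy_dev *dev, bool unc_mask_prop)
{
    memset(dev, 0, sizeof(*dev));
    if (unc_mask_prop) {
        dev->cap_present |= CAP_ERR_UNC_MASK;
    }
    if (dev->cap_present & CAP_ERR_UNC_MASK) {
        dev->config[REG_UNCOR_MASK] = UNCOR_MASK_RESET;
    }
}

/* Source and destination must agree on the config space contents. */
bool same_config(const struct toy_dev *a, const struct toy_dev *b)
{
    return memcmp(a->config, b->config, CFG_SIZE) == 0;
}
```

Two devices initialised with the same property value compare equal,
while a device with the bit set does not match one with the bit
cleared.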
And now, all that is missing is disabling the feature for old
machine types::

    diff --git a/hw/core/machine.c b/hw/core/machine.c
    index 47a34841a5..07f763eb2e 100644
    --- a/hw/core/machine.c
    +++ b/hw/core/machine.c
    @@ -48,6 +48,7 @@ GlobalProperty hw_compat_7_2[] = {
         { "e1000e", "migrate-timadj", "off" },
         { "virtio-mem", "x-early-migration", "false" },
         { "migration", "x-preempt-pre-7-2", "true" },
    +    { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
     };
     const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);

And now, when qemu-8.0.1 is released with this fix, all combinations
are going to work as expected.

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2 (works)
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2 (works)
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-7.2 -M pc-7.2 (works)
- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2 (works)

So normality has been restored and everything is ok, no?

Not really, now our matrix is much bigger.  We started with the easy
cases, migration from the same version to the same version always
works:

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2

Now the interesting ones, when the versions of the QEMU processes are
different.  The first set fails and we can do nothing about it: both
versions are released and we can't change anything.

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

These two are the ones that work.  The whole point of making the
change in the qemu-8.0.1 release was to fix this issue:

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

But now we find that qemu-8.0 can migrate neither to qemu-7.2 nor to
qemu-8.0.1.

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

So, if we start a pc-7.2 machine in qemu-8.0 we can't migrate it to
anything except qemu-8.0.

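The whole matrix follows from one fact per binary: whether it exposes
the PCI_ERR_UNCOR_MASK register when running pc-7.2.  The sketch below
encodes only that; the enum and function names are invented, and it
models this single incompatibility, not migration compatibility in
general.

```c
#include <stdbool.h>

enum qemu_version { QEMU_7_2, QEMU_8_0, QEMU_8_0_1 };

/* Does this binary expose PCI_ERR_UNCOR_MASK when running pc-7.2? */
bool pc72_has_unc_mask(enum qemu_version v)
{
    switch (v) {
    case QEMU_7_2:
        return false;   /* the register is not implemented yet */
    case QEMU_8_0:
        return true;    /* implemented, and no compat entry */
    case QEMU_8_0_1:
        return false;   /* hw_compat_7_2 turns it off */
    }
    return false;
}

/* For this device, migration works only if both sides agree. */
bool pc72_migration_works(enum qemu_version src, enum qemu_version dst)
{
    return pc72_has_unc_mask(src) == pc72_has_unc_mask(dst);
}
```

Evaluating ``pc72_migration_works()`` for every pair reproduces the
lists above: qemu-7.2 and qemu-8.0.1 are compatible with each other,
and qemu-8.0 is only compatible with itself.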
Can we do better?

Yes.  If we know that we are going to do this migration:

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2

We can launch the appropriate devices with::

  --device ...,x-pcie-err-unc-mask=on

And now we can receive a migration from 8.0.  And from now on, we can
migrate that guest to newer QEMU versions if we remember to enable
that property for pc-7.2.  Notice that we need to remember: it is not
enough to know that the source of the migration is qemu-8.0.  Think of
this example:

$ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 -> qemu-8.2 -M pc-7.2

In the second migration, the source is not qemu-8.0, but we still have
that "problem" and need to have that property enabled.  Notice that we
need to keep this mark/property until the machine is rebooted.  But a
normal reboot is not enough (it doesn't reload QEMU): we need the
machine to power off and power on again on a fixed QEMU.  From then
on we can use the proper real machine.

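The reason the property has to be remembered is that it describes the
running guest, not the source binary.  A minimal sketch of that idea,
with invented names (``struct guest``, ``power_on()``,
``required_property()``, ``migration_accepted()``); a management
application would have to keep the equivalent of this flag next to the
guest definition.

```c
#include <stdbool.h>

/* What the running guest currently sees; it travels with the guest. */
struct guest {
    bool unc_mask_visible;
};

/* Cold boot: the guest sees whatever the binary uses for pc-7.2. */
struct guest power_on(bool binary_default_on)
{
    struct guest g = { .unc_mask_visible = binary_default_on };
    return g;
}

/*
 * Value of x-pcie-err-unc-mask that every destination must be
 * launched with, whatever QEMU version the guest is coming from.
 */
bool required_property(const struct guest *g)
{
    return g->unc_mask_visible;
}

/* The destination accepts the guest only if the property matches. */
bool migration_accepted(const struct guest *g, bool dest_property)
{
    return dest_property == required_property(g);
}
```

A guest powered on under qemu-8.0 keeps requiring the property set to
on for every later hop; only a new ``power_on()`` on a fixed QEMU
clears the requirement.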