xen/559bc633-x86-cpupool-clear-proper-cpu_valid-bit-on-CPU-teardown.patch

# Commit 8022b05284dea80e24813d03180788ec7277a0bd
# Date 2015-07-07 14:29:39 +0200
# Author Dario Faggioli <dario.faggioli@citrix.com>
# Committer Jan Beulich <jbeulich@suse.com>
x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown

When a pCPU goes down, we want to clear its
bit in the valid mask of the cpupool it belongs to,
rather than always in cpupool0's.
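
As a rough illustration of the difference, here is a toy model of the two
behaviours. It is only a sketch: struct toy_pool, teardown_old() and
teardown_new() are hypothetical names, and plain 32-bit masks stand in for
Xen's cpumask_t. The real change is the diff further down.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a cpupool: one bit per pCPU (hypothetical, not Xen's struct). */
struct toy_pool {
    uint32_t cpu_valid;      /* pCPUs currently usable by this pool */
    uint32_t cpu_suspended;  /* pCPUs to hand back to this pool on resume */
};

/* Old behaviour: teardown always clears the bit in pool 0's valid mask. */
static void teardown_old(struct toy_pool *pools, int npools, unsigned int cpu)
{
    (void)npools;
    assert(cpu < 32);
    pools[0].cpu_valid &= ~(UINT32_C(1) << cpu);
}

/* Patched behaviour: find the pool that owns the pCPU, remember it as
 * suspended there, and clear the bit in that pool's valid mask. */
static void teardown_new(struct toy_pool *pools, int npools, unsigned int cpu)
{
    uint32_t bit;

    assert(cpu < 32);
    bit = UINT32_C(1) << cpu;
    for (int i = 0; i < npools; i++) {
        if (pools[i].cpu_valid & bit) {
            pools[i].cpu_suspended |= bit;
            pools[i].cpu_valid &= ~bit;
            break;
        }
    }
}
```

With pCPUs 8-15 in a second pool, teardown_old() leaves bit 8 set in that
pool's cpu_valid, which is the stale state the scheduler later trips over;
teardown_new() clears it and records the pCPU in cpu_suspended.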

Before this commit, all the pCPUs in the non-default
pool(s) were considered valid immediately during
system resume, even the ones that had not been brought
up yet. As a result, the (Credit1) scheduler attempted
to run its load balancing logic on them, causing the
following Oops:

# xl cpupool-cpu-remove Pool-0 8-15
# xl cpupool-create name="Pool-1"
# xl cpupool-cpu-add Pool-1 8-15
--> suspend
--> resume
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    8
(XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
(XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
(XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
(XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
(XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
(XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) ... ... ...
(XEN) Xen call trace:
(XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
(XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
(XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
(XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 8:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************

The error is a #GP fault because, without
this commit, we try to access the per-cpu area of a pCPU
that is not yet allocated and initialized.
Indeed, %rax, which is used as the pointer, is
80007d2f7fccb780, and we also have this:

#define INVALID_PERCPU_AREA (0x8000000000000000L - (long)__per_cpu_start)
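
The arithmetic behind that observation can be checked with a small sketch.
The helper names below are hypothetical, and the __per_cpu_start value used
in the note after the code (0xffff82d080334880) is back-computed from %rax
for illustration; it does not appear in the log above. The check assumes
48-bit virtual addressing, where bits 63..47 of a canonical address are all
equal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the INVALID_PERCPU_AREA expression using unsigned wrap-around
 * arithmetic; per_cpu_start is whatever address __per_cpu_start has. */
static uint64_t invalid_percpu_area(uint64_t per_cpu_start)
{
    return UINT64_C(0x8000000000000000) - per_cpu_start;
}

/* True if addr is canonical under 48-bit virtual addressing:
 * bits 63..47 must be all zeros or all ones. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 bits 63..47 */
    return top == 0 || top == UINT64_C(0x1ffff);
}
```

With the assumed __per_cpu_start, invalid_percpu_area() yields
0x80007d2f7fccb780, the value seen in %rax. That address has bit 63 set and
bits 62..47 clear, so is_canonical_48() rejects it, and a memory access
through a non-canonical address raises #GP rather than a page fault, which
matches the panic above.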

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>

--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -816,7 +816,6 @@ void __cpu_disable(void)
     remove_siblinginfo(cpu);
 
     /* It's now safe to remove this processor from the online map */
-    cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
     cpumask_clear_cpu(cpu, &cpu_online_map);
     fixup_irqs();
 
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -529,6 +529,7 @@ static int cpupool_cpu_remove(unsigned i
             if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) )
             {
                 cpumask_set_cpu(cpu, (*c)->cpu_suspended);
+                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
                 break;
             }
         }
@@ -551,6 +552,7 @@ static int cpupool_cpu_remove(unsigned i
          * If we are not suspending, we are hot-unplugging cpu, and that is
          * allowed only for CPUs in pool0.
          */
+        cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
         ret = 0;
     }