Message ID | 20190710052018.14628-1-sjitindarsingh@gmail.com |
---|---|
State | Accepted |
Headers | show |
Series | powerpc: mm: Limit rma_size to 1TB when running without HV mode | expand |
On Wed, Jul 10, 2019 at 03:20:18PM +1000, Suraj Jitindar Singh wrote: > The virtual real mode addressing (VRMA) mechanism is used when a > partition is using HPT (Hash Page Table) translation and performs > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this > mode effective address bits 0:23 are treated as zero (i.e. the access > is aliased to 0) and the access is performed using an implicit 1TB SLB > entry. > > The size of the RMA (Real Memory Area) is communicated to the guest as > the size of the first memory region in the device tree. And because of > the mechanism described above can be expected to not exceed 1TB. In the > event that the host erroneously represents the RMA as being larger than > 1TB, guest accesses in real mode to memory addresses above 1TB will be > aliased down to below 1TB. This means that a memory access performed in > real mode may differ to one performed in virtual mode for the same memory > address, which would likely have unintended consequences. > > To avoid this outcome have the guest explicitly limit the size of the > RMA to the current maximum, which is 1TB. This means that even if the > first memory block is larger than 1TB, only the first 1TB should be > accessed in real mode. > > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> > --- > arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++++ > 1 file changed, 8 insertions(+) Hi, Tested this patch and now Power8 compat guest boots fine with mem >1024G on Power9 host. Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Host: P9; kernel: 5.2.0-00915-g5ad18b2e60b7 Before this patch: Guest crashes.. [0.000000] BUG: Kernel NULL pointer dereference at 0x00000028 [0.000000] Faulting instruction address: 0xc00000000102caa0 [0.000000] Oops: Kernel access of bad area, sig: 11 [#1] [0.000000] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [0.000000] Modules linked in: [0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-03135-ge9a83bd23220 #24 [0.000000] NIP: c00000000102caa0 LR: c00000000102ca84 CTR: 0000000000000000 [0.000000] REGS: c000000001603ba0 TRAP: 0380 Not tainted (5.2.0-03135-ge9a83bd23220) [0.000000] MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 24000428 XER: 20000000 [0.000000] CFAR: c00000000102c1d8 IRQMASK: 1 [0.000000] GPR00: c00000000102ca84 c000000001603e30 c000000001605300 0000010000000000 [0.000000] GPR04: 0000000000000000 0000000000000000 c00000ffffff8000 c000000001863dc8 [0.000000] GPR08: 0000000000002028 0000000000000000 c00000ffffff8000 0000000000000009 [0.000000] GPR12: 0000000000000000 c0000000018f0000 000000007dc5fef0 00000000012e1220 [0.000000] GPR16: 00000000012e10a0 fffffffffffffffd 000000007dc5fef0 000000000130fcc0 [0.000000] GPR20: 0000000000000014 0000000001a80000 000000002fff0000 fffffffffffffffd [0.000000] GPR24: 0000000001d0000c c000000000000000 c000000001641ed8 c000000001641b78 [0.000000] GPR28: 0000000000000000 0000000000000000 0000010000000000 0000000000000000 [0.000000] NIP [c00000000102caa0] emergency_stack_init+0xb8/0x118 [0.000000] LR [c00000000102ca84] emergency_stack_init+0x9c/0x118 [0.000000] Call Trace: [0.000000] [c000000001603e30] [c00000000102ca84] emergency_stack_init+0x9c/0x118 (unreliable) [0.000000] [c000000001603e80] [c00000000102bd54] setup_arch+0x2fc/0x388 [0.000000] [c000000001603ef0] [c000000001023ccc] start_kernel+0xa4/0x660 [0.000000] [c000000001603f90] [c00000000000b774] start_here_common+0x1c/0x528 [0.000000] Instruction dump: [0.000000] 7ffc07b4 7fc3f378 7bfd1f24 7f84e378 4bfff6e9 3f620004 3b7bc878 7f84e378 [0.000000] 39434000 7fc3f378 e93b0000 7d29e82a <f9490028> 4bfff6c5 e93b0000 7f84e378 [0.000000] random: get_random_bytes called from print_oops_end_marker+0x6c/0xa0 with crng_init=0 [0.000000] ---[ end trace 0000000000000000 ]--- [0.000000] [0.000000] Kernel panic - not syncing: Attempted to kill the idle task! ------------------------- With this patch: # virsh start --console p8 Domain p8 started Connected to domain p8 .. .. Fedora 27 (Twenty Seven) Kernel 5.2.0-03136-gf709b0494ad9 on an ppc64le (hvc0) atest-guest login: # free -g total used free shared buff/cache available Mem: 1028 0 1027 0 0 1025 Swap: 0 0 Regards, -Satheesh. > > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c > index 28ced26f2a00..4d0e2cce9cd5 100644 > --- a/arch/powerpc/mm/book3s64/hash_utils.c > +++ b/arch/powerpc/mm/book3s64/hash_utils.c > @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base, > * > * For guests on platforms before POWER9, we clamp the it limit to 1G > * to avoid some funky things such as RTAS bugs etc... > + * On POWER9 we limit to 1TB in case the host erroneously told us that > + * the RMA was >1TB. Effective address bits 0:23 are treated as zero > + * (meaning the access is aliased to zero i.e. addr = addr % 1TB) > + * for virtual real mode addressing and so it doesn't make sense to > + * have an area larger than 1TB as it can't be addressed. > */ > if (!early_cpu_has_feature(CPU_FTR_HVMODE)) { > ppc64_rma_size = first_memblock_size; > if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) > ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000); > + else > + ppc64_rma_size = min_t(u64, ppc64_rma_size, > + 1UL << SID_SHIFT_1T); > > /* Finally limit subsequent allocations */ > memblock_set_current_limit(ppc64_rma_size); > -- > 2.13.6 >
On Wed, Jul 10, 2019 at 03:20:18PM +1000, Suraj Jitindar Singh wrote: > The virtual real mode addressing (VRMA) mechanism is used when a > partition is using HPT (Hash Page Table) translation and performs > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this > mode effective address bits 0:23 are treated as zero (i.e. the access > is aliased to 0) and the access is performed using an implicit 1TB SLB > entry. > > The size of the RMA (Real Memory Area) is communicated to the guest as > the size of the first memory region in the device tree. And because of > the mechanism described above can be expected to not exceed 1TB. In the > event that the host erroneously represents the RMA as being larger than > 1TB, guest accesses in real mode to memory addresses above 1TB will be > aliased down to below 1TB. This means that a memory access performed in > real mode may differ to one performed in virtual mode for the same memory > address, which would likely have unintended consequences. > > To avoid this outcome have the guest explicitly limit the size of the > RMA to the current maximum, which is 1TB. This means that even if the > first memory block is larger than 1TB, only the first 1TB should be > accessed in real mode. > > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Although I'd really like to also see some comments added in allocate_paca_ptrs() explaining the constraints there. Oh, also, basing this on the non-compat PVR is bogus, but it's still better than what we had. > --- > arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c > index 28ced26f2a00..4d0e2cce9cd5 100644 > --- a/arch/powerpc/mm/book3s64/hash_utils.c > +++ b/arch/powerpc/mm/book3s64/hash_utils.c > @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base, > * > * For guests on platforms before POWER9, we clamp the it limit to 1G > * to avoid some funky things such as RTAS bugs etc... > + * On POWER9 we limit to 1TB in case the host erroneously told us that > + * the RMA was >1TB. Effective address bits 0:23 are treated as zero > + * (meaning the access is aliased to zero i.e. addr = addr % 1TB) > + * for virtual real mode addressing and so it doesn't make sense to > + * have an area larger than 1TB as it can't be addressed. > */ > if (!early_cpu_has_feature(CPU_FTR_HVMODE)) { > ppc64_rma_size = first_memblock_size; > if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) > ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000); > + else > + ppc64_rma_size = min_t(u64, ppc64_rma_size, > + 1UL << SID_SHIFT_1T); > > /* Finally limit subsequent allocations */ > memblock_set_current_limit(ppc64_rma_size);
Suraj Jitindar Singh <sjitindarsingh@gmail.com> writes: > The virtual real mode addressing (VRMA) mechanism is used when a > partition is using HPT (Hash Page Table) translation and performs > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this > mode effective address bits 0:23 are treated as zero (i.e. the access > is aliased to 0) and the access is performed using an implicit 1TB SLB > entry. > > The size of the RMA (Real Memory Area) is communicated to the guest as > the size of the first memory region in the device tree. And because of > the mechanism described above can be expected to not exceed 1TB. In the > event that the host erroneously represents the RMA as being larger than > 1TB, guest accesses in real mode to memory addresses above 1TB will be > aliased down to below 1TB. This means that a memory access performed in > real mode may differ to one performed in virtual mode for the same memory > address, which would likely have unintended consequences. > > To avoid this outcome have the guest explicitly limit the size of the > RMA to the current maximum, which is 1TB. This means that even if the > first memory block is larger than 1TB, only the first 1TB should be > accessed in real mode. > > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> I added: Fixes: c3ab300ea555 ("powerpc: Add POWER9 cputable entry") Cc: stable@vger.kernel.org # v4.6+ Which is not exactly correct, but probably good enough? cheers > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c > index 28ced26f2a00..4d0e2cce9cd5 100644 > --- a/arch/powerpc/mm/book3s64/hash_utils.c > +++ b/arch/powerpc/mm/book3s64/hash_utils.c > @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base, > * > * For guests on platforms before POWER9, we clamp the it limit to 1G > * to avoid some funky things such as RTAS bugs etc... > + * On POWER9 we limit to 1TB in case the host erroneously told us that > + * the RMA was >1TB. Effective address bits 0:23 are treated as zero > + * (meaning the access is aliased to zero i.e. addr = addr % 1TB) > + * for virtual real mode addressing and so it doesn't make sense to > + * have an area larger than 1TB as it can't be addressed. > */ > if (!early_cpu_has_feature(CPU_FTR_HVMODE)) { > ppc64_rma_size = first_memblock_size; > if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) > ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000); > + else > + ppc64_rma_size = min_t(u64, ppc64_rma_size, > + 1UL << SID_SHIFT_1T); > > /* Finally limit subsequent allocations */ > memblock_set_current_limit(ppc64_rma_size); > -- > 2.13.6
On Fri, 2019-07-12 at 23:09 +1000, Michael Ellerman wrote: > Suraj Jitindar Singh <sjitindarsingh@gmail.com> writes: > > The virtual real mode addressing (VRMA) mechanism is used when a > > partition is using HPT (Hash Page Table) translation and performs > > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this > > mode effective address bits 0:23 are treated as zero (i.e. the > > access > > is aliased to 0) and the access is performed using an implicit 1TB > > SLB > > entry. > > > > The size of the RMA (Real Memory Area) is communicated to the guest > > as > > the size of the first memory region in the device tree. And because > > of > > the mechanism described above can be expected to not exceed 1TB. In > > the > > event that the host erroneously represents the RMA as being larger > > than > > 1TB, guest accesses in real mode to memory addresses above 1TB will > > be > > aliased down to below 1TB. This means that a memory access > > performed in > > real mode may differ to one performed in virtual mode for the same > > memory > > address, which would likely have unintended consequences. > > > > To avoid this outcome have the guest explicitly limit the size of > > the > > RMA to the current maximum, which is 1TB. This means that even if > > the > > first memory block is larger than 1TB, only the first 1TB should be > > accessed in real mode. > > > > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> > > I added: > > Fixes: c3ab300ea555 ("powerpc: Add POWER9 cputable entry") > Cc: stable@vger.kernel.org # v4.6+ > > > Which is not exactly correct, but probably good enough? I think we actually want: Fixes: c610d65c0ad0 ("powerpc/pseries: lift RTAS limit for hash") Which is what actually caused it to break and for the issue to present itself. > > cheers > > > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c > > b/arch/powerpc/mm/book3s64/hash_utils.c > > index 28ced26f2a00..4d0e2cce9cd5 100644 > > --- a/arch/powerpc/mm/book3s64/hash_utils.c > > +++ b/arch/powerpc/mm/book3s64/hash_utils.c > > @@ -1901,11 +1901,19 @@ void > > hash__setup_initial_memory_limit(phys_addr_t first_memblock_base, > > * > > * For guests on platforms before POWER9, we clamp the it > > limit to 1G > > * to avoid some funky things such as RTAS bugs etc... > > + * On POWER9 we limit to 1TB in case the host erroneously > > told us that > > + * the RMA was >1TB. Effective address bits 0:23 are > > treated as zero > > + * (meaning the access is aliased to zero i.e. addr = addr > > % 1TB) > > + * for virtual real mode addressing and so it doesn't make > > sense to > > + * have an area larger than 1TB as it can't be addressed. > > */ > > if (!early_cpu_has_feature(CPU_FTR_HVMODE)) { > > ppc64_rma_size = first_memblock_size; > > if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) > > ppc64_rma_size = min_t(u64, > > ppc64_rma_size, 0x40000000); > > + else > > + ppc64_rma_size = min_t(u64, > > ppc64_rma_size, > > + 1UL << > > SID_SHIFT_1T); > > > > /* Finally limit subsequent allocations */ > > memblock_set_current_limit(ppc64_rma_size); > > -- > > 2.13.6
Suraj Jitindar Singh <sjitindarsingh@gmail.com> writes: > On Fri, 2019-07-12 at 23:09 +1000, Michael Ellerman wrote: >> Suraj Jitindar Singh <sjitindarsingh@gmail.com> writes: >> > The virtual real mode addressing (VRMA) mechanism is used when a >> > partition is using HPT (Hash Page Table) translation and performs >> > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this >> > mode effective address bits 0:23 are treated as zero (i.e. the >> > access >> > is aliased to 0) and the access is performed using an implicit 1TB >> > SLB >> > entry. >> > >> > The size of the RMA (Real Memory Area) is communicated to the guest >> > as >> > the size of the first memory region in the device tree. And because >> > of >> > the mechanism described above can be expected to not exceed 1TB. In >> > the >> > event that the host erroneously represents the RMA as being larger >> > than >> > 1TB, guest accesses in real mode to memory addresses above 1TB will >> > be >> > aliased down to below 1TB. This means that a memory access >> > performed in >> > real mode may differ to one performed in virtual mode for the same >> > memory >> > address, which would likely have unintended consequences. >> > >> > To avoid this outcome have the guest explicitly limit the size of >> > the >> > RMA to the current maximum, which is 1TB. This means that even if >> > the >> > first memory block is larger than 1TB, only the first 1TB should be >> > accessed in real mode. >> > >> > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> >> >> I added: >> >> Fixes: c3ab300ea555 ("powerpc: Add POWER9 cputable entry") >> Cc: stable@vger.kernel.org # v4.6+ >> >> >> Which is not exactly correct, but probably good enough? > > I think we actually want: > Fixes: c610d65c0ad0 ("powerpc/pseries: lift RTAS limit for hash") > > Which is what actually caused it to break and for the issue to present > itself. Thanks, I used that instead. cheers
On Wed, 2019-07-10 at 05:20:18 UTC, Suraj Jitindar Singh wrote: > The virtual real mode addressing (VRMA) mechanism is used when a > partition is using HPT (Hash Page Table) translation and performs > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this > mode effective address bits 0:23 are treated as zero (i.e. the access > is aliased to 0) and the access is performed using an implicit 1TB SLB > entry. > > The size of the RMA (Real Memory Area) is communicated to the guest as > the size of the first memory region in the device tree. And because of > the mechanism described above can be expected to not exceed 1TB. In the > event that the host erroneously represents the RMA as being larger than > 1TB, guest accesses in real mode to memory addresses above 1TB will be > aliased down to below 1TB. This means that a memory access performed in > real mode may differ to one performed in virtual mode for the same memory > address, which would likely have unintended consequences. > > To avoid this outcome have the guest explicitly limit the size of the > RMA to the current maximum, which is 1TB. This means that even if the > first memory block is larger than 1TB, only the first 1TB should be > accessed in real mode. > > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> > Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/da0ef93310e67ae6902efded60b6724dab27a5d1 cheers
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index 28ced26f2a00..4d0e2cce9cd5 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base, * * For guests on platforms before POWER9, we clamp the it limit to 1G * to avoid some funky things such as RTAS bugs etc... + * On POWER9 we limit to 1TB in case the host erroneously told us that + * the RMA was >1TB. Effective address bits 0:23 are treated as zero + * (meaning the access is aliased to zero i.e. addr = addr % 1TB) + * for virtual real mode addressing and so it doesn't make sense to + * have an area larger than 1TB as it can't be addressed. */ if (!early_cpu_has_feature(CPU_FTR_HVMODE)) { ppc64_rma_size = first_memblock_size; if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000); + else + ppc64_rma_size = min_t(u64, ppc64_rma_size, + 1UL << SID_SHIFT_1T); /* Finally limit subsequent allocations */ memblock_set_current_limit(ppc64_rma_size);
The virtual real mode addressing (VRMA) mechanism is used when a partition is using HPT (Hash Page Table) translation and performs real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this mode effective address bits 0:23 are treated as zero (i.e. the access is aliased to 0) and the access is performed using an implicit 1TB SLB entry. The size of the RMA (Real Memory Area) is communicated to the guest as the size of the first memory region in the device tree. And because of the mechanism described above can be expected to not exceed 1TB. In the event that the host erroneously represents the RMA as being larger than 1TB, guest accesses in real mode to memory addresses above 1TB will be aliased down to below 1TB. This means that a memory access performed in real mode may differ to one performed in virtual mode for the same memory address, which would likely have unintended consequences. To avoid this outcome have the guest explicitly limit the size of the RMA to the current maximum, which is 1TB. This means that even if the first memory block is larger than 1TB, only the first 1TB should be accessed in real mode. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> --- arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++++ 1 file changed, 8 insertions(+)