diff mbox series

powerpc/mm/radix: Fix kernel crash when running subpage protect test

Message ID 20190430075907.7701-1-aneesh.kumar@linux.ibm.com (mailing list archive)
State Accepted
Commit 2c474c03505677cfd987d52e8bf42abe8c270529
Headers show
Series powerpc/mm/radix: Fix kernel crash when running subpage protect test | expand

Commit Message

Aneesh Kumar K V April 30, 2019, 7:59 a.m. UTC
This patch fixes the below crash by making sure we touch the subpage protection
related structures only if we know they are allocated on the platform. With
radix translation we don't allocate hash context at all and trying to access
subpage_prot_table results in

 Faulting instruction address: 0xc00000000008bdb4
 Oops: Kernel access of bad area, sig: 11 [#1]
 LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
 ....
 NIP [c00000000008bdb4] sys_subpage_prot+0x74/0x590
 LR [c00000000000b688] system_call+0x5c/0x70
 Call Trace:
 [c00020002c6b7d30] [c00020002c6b7d90] 0xc00020002c6b7d90 (unreliable)
 [c00020002c6b7e20] [c00000000000b688] system_call+0x5c/0x70
 Instruction dump:
 fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f821ff11 e92d1178 f9210068
 39200000 e92d0968 ebe90630 e93f03e8 <eb891038> 60000000 3860fffe e9410068

We also move the subpage_prot_table with mmp_sem held to avoid racec
between two parallel subpage_prot syscall.

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/subpage-prot.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Sachin Sant April 30, 2019, 8:51 a.m. UTC | #1
> On 30-Apr-2019, at 1:29 PM, Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
> 
> This patch fixes the below crash by making sure we touch the subpage protection
> related structures only if we know they are allocated on the platform. With
> radix translation we don't allocate hash context at all and trying to access
> subpage_prot_table results in
> 
> Faulting instruction address: 0xc00000000008bdb4
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> ....
> NIP [c00000000008bdb4] sys_subpage_prot+0x74/0x590
> LR [c00000000000b688] system_call+0x5c/0x70
> Call Trace:
> [c00020002c6b7d30] [c00020002c6b7d90] 0xc00020002c6b7d90 (unreliable)
> [c00020002c6b7e20] [c00000000000b688] system_call+0x5c/0x70
> Instruction dump:
> fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f821ff11 e92d1178 f9210068
> 39200000 e92d0968 ebe90630 e93f03e8 <eb891038> 60000000 3860fffe e9410068
> 
> We also move the subpage_prot_table with mmp_sem held to avoid racec
> between two parallel subpage_prot syscall.
> 
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> —

Thanks for the patch. Fixes the kernel crash.

Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com <mailto:sachinp@linux.vnet.ibm.com>>

Thanks
-Sachin
Michael Ellerman May 1, 2019, 1:44 a.m. UTC | #2
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> This patch fixes the below crash by making sure we touch the subpage protection
> related structures only if we know they are allocated on the platform. With
> radix translation we don't allocate hash context at all and trying to access
> subpage_prot_table results in
>
>  Faulting instruction address: 0xc00000000008bdb4
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>  ....
>  NIP [c00000000008bdb4] sys_subpage_prot+0x74/0x590
>  LR [c00000000000b688] system_call+0x5c/0x70
>  Call Trace:
>  [c00020002c6b7d30] [c00020002c6b7d90] 0xc00020002c6b7d90 (unreliable)
>  [c00020002c6b7e20] [c00000000000b688] system_call+0x5c/0x70
>  Instruction dump:
>  fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f821ff11 e92d1178 f9210068
>  39200000 e92d0968 ebe90630 e93f03e8 <eb891038> 60000000 3860fffe e9410068
>
> We also move the subpage_prot_table with mmp_sem held to avoid racec
> between two parallel subpage_prot syscall.
>
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Presumably it was:

701101865f5d ("powerpc/mm: Reduce memory usage for mm_context_t for radix")

That caused the breakage?

cheers
Aneesh Kumar K V May 1, 2019, 10:32 a.m. UTC | #3
On 5/1/19 7:14 AM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> 
>> This patch fixes the below crash by making sure we touch the subpage protection
>> related structures only if we know they are allocated on the platform. With
>> radix translation we don't allocate hash context at all and trying to access
>> subpage_prot_table results in
>>
>>   Faulting instruction address: 0xc00000000008bdb4
>>   Oops: Kernel access of bad area, sig: 11 [#1]
>>   LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>>   ....
>>   NIP [c00000000008bdb4] sys_subpage_prot+0x74/0x590
>>   LR [c00000000000b688] system_call+0x5c/0x70
>>   Call Trace:
>>   [c00020002c6b7d30] [c00020002c6b7d90] 0xc00020002c6b7d90 (unreliable)
>>   [c00020002c6b7e20] [c00000000000b688] system_call+0x5c/0x70
>>   Instruction dump:
>>   fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f821ff11 e92d1178 f9210068
>>   39200000 e92d0968 ebe90630 e93f03e8 <eb891038> 60000000 3860fffe e9410068
>>
>> We also move the subpage_prot_table with mmp_sem held to avoid racec
>> between two parallel subpage_prot syscall.
>>
>> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> 
> Presumably it was:
> 
> 701101865f5d ("powerpc/mm: Reduce memory usage for mm_context_t for radix")
> 
> That caused the breakage?
> 

that is correct. Sorry i missed the Fixes: line.

-aneesh
diff mbox series

Patch

diff --git a/arch/powerpc/mm/subpage-prot.c b/arch/powerpc/mm/subpage-prot.c
index c9dff4e1f295..473dd430e306 100644
--- a/arch/powerpc/mm/subpage-prot.c
+++ b/arch/powerpc/mm/subpage-prot.c
@@ -90,16 +90,18 @@  static void hpte_flush_range(struct mm_struct *mm, unsigned long addr,
 static void subpage_prot_clear(unsigned long addr, unsigned long len)
 {
 	struct mm_struct *mm = current->mm;
-	struct subpage_prot_table *spt = mm_ctx_subpage_prot(&mm->context);
+	struct subpage_prot_table *spt;
 	u32 **spm, *spp;
 	unsigned long i;
 	size_t nw;
 	unsigned long next, limit;
 
+	down_write(&mm->mmap_sem);
+
+	spt = mm_ctx_subpage_prot(&mm->context);
 	if (!spt)
-		return ;
+		goto err_out;
 
-	down_write(&mm->mmap_sem);
 	limit = addr + len;
 	if (limit > spt->maxaddr)
 		limit = spt->maxaddr;
@@ -127,6 +129,8 @@  static void subpage_prot_clear(unsigned long addr, unsigned long len)
 		/* now flush any existing HPTEs for the range */
 		hpte_flush_range(mm, addr, nw);
 	}
+
+err_out:
 	up_write(&mm->mmap_sem);
 }
 
@@ -189,7 +193,7 @@  SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
 		unsigned long, len, u32 __user *, map)
 {
 	struct mm_struct *mm = current->mm;
-	struct subpage_prot_table *spt = mm_ctx_subpage_prot(&mm->context);
+	struct subpage_prot_table *spt;
 	u32 **spm, *spp;
 	unsigned long i;
 	size_t nw;
@@ -219,6 +223,7 @@  SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
 
 	down_write(&mm->mmap_sem);
 
+	spt = mm_ctx_subpage_prot(&mm->context);
 	if (!spt) {
 		/*
 		 * Allocate subpage prot table if not already done.