
tlb_batch_add_one()

Message ID d71c45e6-a54a-517f-b347-89ee4e7b196e@oracle.com
State RFC
Delegated to: David Miller

Commit Message

Nitin Gupta March 31, 2017, 12:42 a.m. UTC
On 3/30/17 4:59 PM, Nitin Gupta wrote:
> On 3/30/17 2:54 PM, David Miller wrote:
>> From: David Miller <davem@davemloft.net>
>> Date: Thu, 30 Mar 2017 14:25:53 -0700 (PDT)
>>
>>> From: Nitin Gupta <nitin.m.gupta@oracle.com>
>>> Date: Thu, 30 Mar 2017 13:47:11 -0700
>>>
>>>> I will be sending a fix for these call-sites today.
>>>
>>> I already have a fix I'm testing now which I'll check in after my
>>> regression test passes.
>>
>> So even with the shifts fixed, as per the patch below, I'm still getting
>> corruptions during gcc bootstraps.
>>
>> If I cannot figure out what the problem is by the end of the day I'm
>> reverting the change.  I've already spent a week trying to figure out
>> what introduced these regressions...
>>
>> diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
>> index afda3bb..ee8066c 100644
>> --- a/arch/sparc/mm/tlb.c
>> +++ b/arch/sparc/mm/tlb.c
>> @@ -154,7 +154,7 @@ static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr,
>>  		if (pte_val(*pte) & _PAGE_VALID) {
>>  			bool exec = pte_exec(*pte);
>>  
>> -			tlb_batch_add_one(mm, vaddr, exec, false);
>> +			tlb_batch_add_one(mm, vaddr, exec, PAGE_SHIFT);
>>  		}
>>  		pte++;
>>  		vaddr += PAGE_SIZE;
>> @@ -209,9 +209,9 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>>  			pte_t orig_pte = __pte(pmd_val(orig));
>>  			bool exec = pte_exec(orig_pte);
>>  
>> -			tlb_batch_add_one(mm, addr, exec, true);
>> +			tlb_batch_add_one(mm, addr, exec, REAL_HPAGE_SHIFT);
>>  			tlb_batch_add_one(mm, addr + REAL_HPAGE_SIZE, exec,
>> -					true);
>> +					  REAL_HPAGE_SHIFT);
> 
> 
> Shifts are still wrong: tlb_batch_add_one -> flush_tsb_user_page expects
> HPAGE_SHIFT to be passed for 8M pages so that the 'HUGE' TSB is flushed
> instead of the BASE one. So, we need to pass HPAGE_SHIFT here.
> 
> Also, I see that sun4v_tte_to_shift() should return HPAGE_SHIFT for the
> 4MB case instead of REAL_HPAGE_SHIFT (same for the sun4u version).
> 
> And finally, huge_tte_to_size() can have the size check removed after
> fixing huge_tte_to_shift() as above.
> 
> Currently my test machine is having some disk issues, so I will be back
> with results once the machine is back up.
> 

Or more simply, the check in flush_tsb_user() and flush_tsb_user_page()
can be fixed. Below is the full patch (including your changes):

The patch is untested and my test machine is still not ready, so I will
be testing it tonight/tomorrow.
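
For readers following along, here is a minimal user-space sketch of the
check in question (constants mirror sparc64 with 8K base pages;
select_tsb() is an illustrative stand-in for the dispatch inside
flush_tsb_user_page(), not the kernel code itself):

#include <stdio.h>

#define PAGE_SHIFT		13	/* 8K base page */
#define REAL_HPAGE_SHIFT	22	/* 4M hardware TTE */
#define HPAGE_SHIFT		23	/* 8M Linux huge page = two 4M TTEs */

enum tsb_index { MM_TSB_BASE, MM_TSB_HUGE };

/* The fixed check: only shifts below the 4M TTE size use the BASE TSB,
 * so a REAL_HPAGE_SHIFT flush lands in the HUGE TSB as intended. */
static enum tsb_index select_tsb(unsigned int hugepage_shift)
{
	return hugepage_shift < REAL_HPAGE_SHIFT ? MM_TSB_BASE : MM_TSB_HUGE;
}

int main(void)
{
	printf("PAGE_SHIFT       -> %s\n",
	       select_tsb(PAGE_SHIFT) == MM_TSB_BASE ? "BASE" : "HUGE");
	printf("REAL_HPAGE_SHIFT -> %s\n",
	       select_tsb(REAL_HPAGE_SHIFT) == MM_TSB_BASE ? "BASE" : "HUGE");
	printf("HPAGE_SHIFT      -> %s\n",
	       select_tsb(HPAGE_SHIFT) == MM_TSB_BASE ? "BASE" : "HUGE");
	return 0;
}

With the previous check (hugepage_shift < HPAGE_SHIFT), the
REAL_HPAGE_SHIFT case would have selected the BASE TSB and left a stale
entry in the HUGE TSB.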



 arch/sparc/mm/tlb.c | 6 +++---
 arch/sparc/mm/tsb.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)


Comments

Nitin Gupta March 31, 2017, 2:46 a.m. UTC | #1
On 3/30/17 5:42 PM, Nitin Gupta wrote:
> On 3/30/17 4:59 PM, Nitin Gupta wrote:
>> On 3/30/17 2:54 PM, David Miller wrote:
>>> From: David Miller <davem@davemloft.net>
>>> Date: Thu, 30 Mar 2017 14:25:53 -0700 (PDT)
>>>
>>>> From: Nitin Gupta <nitin.m.gupta@oracle.com>
>>>> Date: Thu, 30 Mar 2017 13:47:11 -0700
>>>>
>>>>> I will be sending a fix for these call-sites today.
>>>>
>>>> I already have a fix I'm testing now which I'll check in after my
>>>> regression test passes.
>>>
>>> So even with the shifts fixed, as per the patch below, I'm still getting
>>> corruptions during gcc bootstraps.
>>>
>>> If I cannot figure out what the problem is by the end of the day I'm
>>> reverting the change.  I've already spent a week trying to figure out
>>> what introduced these regressions...
>>>
>>> diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
>>> index afda3bb..ee8066c 100644
>>> --- a/arch/sparc/mm/tlb.c
>>> +++ b/arch/sparc/mm/tlb.c
>>> @@ -154,7 +154,7 @@ static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr,
>>>  		if (pte_val(*pte) & _PAGE_VALID) {
>>>  			bool exec = pte_exec(*pte);
>>>  
>>> -			tlb_batch_add_one(mm, vaddr, exec, false);
>>> +			tlb_batch_add_one(mm, vaddr, exec, PAGE_SHIFT);
>>>  		}
>>>  		pte++;
>>>  		vaddr += PAGE_SIZE;
>>> @@ -209,9 +209,9 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>>>  			pte_t orig_pte = __pte(pmd_val(orig));
>>>  			bool exec = pte_exec(orig_pte);
>>>  
>>> -			tlb_batch_add_one(mm, addr, exec, true);
>>> +			tlb_batch_add_one(mm, addr, exec, REAL_HPAGE_SHIFT);
>>>  			tlb_batch_add_one(mm, addr + REAL_HPAGE_SIZE, exec,
>>> -					true);
>>> +					  REAL_HPAGE_SHIFT);
>>
>>
>> Shifts are still wrong: tlb_batch_add_one -> flush_tsb_user_page expects
>> HPAGE_SHIFT to be passed for 8M pages so that the 'HUGE' TSB is flushed
>> instead of the BASE one. So, we need to pass HPAGE_SHIFT here.
>>
>> Also, I see that sun4v_tte_to_shift() should return HPAGE_SHIFT for the
>> 4MB case instead of REAL_HPAGE_SHIFT (same for the sun4u version).
>>
>> And finally, huge_tte_to_size() can have the size check removed after
>> fixing huge_tte_to_shift() as above.
>>
>> Currently my test machine is having some disk issues, so I will be back
>> with results once the machine is back up.
>>
> 
> Or more simply, the check in flush_tsb_user() and flush_tsb_user_page()
> can be fixed. Below is the full patch (including your changes):
> 
> The patch is untested and my test machine is still not ready, so I will
> be testing it tonight/tomorrow.
> 
> 
> 

Tested with 'make -j64' for the kernel: without the patch, the build
fails at the 'LD vmlinux.o' step (internal error, probably due to memory
corruption). With the patch, the build succeeds.

Nitin

David Miller March 31, 2017, 2:57 a.m. UTC | #2
From: Nitin Gupta <nitin.m.gupta@oracle.com>
Date: Thu, 30 Mar 2017 17:42:51 -0700

> diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
> index 0a04811..bedf08b 100644
> --- a/arch/sparc/mm/tsb.c
> +++ b/arch/sparc/mm/tsb.c
> @@ -122,7 +122,7 @@ void flush_tsb_user(struct tlb_batch *tb)
> 
>  	spin_lock_irqsave(&mm->context.lock, flags);
> 
> -	if (tb->hugepage_shift < HPAGE_SHIFT) {
> +	if (tb->hugepage_shift < REAL_HPAGE_SHIFT) {
>  		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
>  		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
>  		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
> @@ -155,7 +155,7 @@ void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr,
> 
>  	spin_lock_irqsave(&mm->context.lock, flags);
> 
> -	if (hugepage_shift < HPAGE_SHIFT) {
> +	if (hugepage_shift < REAL_HPAGE_SHIFT) {
>  		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
>  		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
>  		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
> -- 

I think if we do it like this, it will only flush one half of the huge
page.

We really need to pass HPAGE_SHIFT down into here, so that the TSB
flush gets both REAL_HPAGE_SIZE entries.
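
To make the disagreement concrete, here is a hypothetical user-space
sketch of the approach David describes (the function name is borrowed
from the kernel, but this expansion is a model, not what the kernel
actually does):

#include <stdio.h>

#define REAL_HPAGE_SHIFT	22
#define HPAGE_SHIFT		23
#define REAL_HPAGE_SIZE		(1UL << REAL_HPAGE_SHIFT)

static void flush_tsb_entry(unsigned long vaddr)
{
	printf("flush 4M TSB entry at %#lx\n", vaddr);
}

/* Hypothetical: expand one flush request into every 4M TTE it covers.
 * Assumes shift >= REAL_HPAGE_SHIFT. */
static void flush_tsb_user_page(unsigned long vaddr, unsigned int shift)
{
	unsigned long i, n = 1UL << (shift - REAL_HPAGE_SHIFT);

	for (i = 0; i < n; i++)
		flush_tsb_entry(vaddr + i * REAL_HPAGE_SIZE);
}

int main(void)
{
	/* Passing HPAGE_SHIFT would cover both halves of a hypothetical
	 * 8M page in one call. */
	flush_tsb_user_page(0x80000000UL, HPAGE_SHIFT);
	return 0;
}

As the reply below explains, the kernel instead issues two explicit
REAL_HPAGE_SHIFT-sized flushes, so this expansion turns out to be
unnecessary.
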
Nitin Gupta March 31, 2017, 3:12 a.m. UTC | #3
On 3/30/17 7:57 PM, David Miller wrote:
> From: Nitin Gupta <nitin.m.gupta@oracle.com>
> Date: Thu, 30 Mar 2017 17:42:51 -0700
> 
>> diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
>> index 0a04811..bedf08b 100644
>> --- a/arch/sparc/mm/tsb.c
>> +++ b/arch/sparc/mm/tsb.c
>> @@ -122,7 +122,7 @@ void flush_tsb_user(struct tlb_batch *tb)
>>
>>  	spin_lock_irqsave(&mm->context.lock, flags);
>>
>> -	if (tb->hugepage_shift < HPAGE_SHIFT) {
>> +	if (tb->hugepage_shift < REAL_HPAGE_SHIFT) {
>>  		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
>>  		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
>>  		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
>> @@ -155,7 +155,7 @@ void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr,
>>
>>  	spin_lock_irqsave(&mm->context.lock, flags);
>>
>> -	if (hugepage_shift < HPAGE_SHIFT) {
>> +	if (hugepage_shift < REAL_HPAGE_SHIFT) {
>>  		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
>>  		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
>>  		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
>> -- 
> 
> I think if we do it like this, it will only flush one half of the huge
> page.
>

Flushing only half the 8M page is the intended behavior: after the
initial allocation of an 8M hugepage, the page is handled exactly as if
two independent 4M hugepages had been allocated (which happen to be
physically contiguous). So, for each 4M chunk, flushing from the TLB and
the TSB is done independently. For instance, in set_huge_pte_at() we
added a special case for (size == HPAGE_SIZE) to flush the "second half"
of the 8M page; similarly in huge_ptep_get_and_clear() and set_pmd_at().
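
A user-space model of that pattern may help (tlb_batch_add_one() here is
a printf stand-in for the real batching function, and the vaddr is made
up):

#include <stdio.h>

#define REAL_HPAGE_SHIFT	22
#define HPAGE_SHIFT		23
#define REAL_HPAGE_SIZE		(1UL << REAL_HPAGE_SHIFT)
#define HPAGE_SIZE		(1UL << HPAGE_SHIFT)

/* Stand-in for the real batching function: just record the request. */
static void tlb_batch_add_one(unsigned long vaddr, unsigned int shift)
{
	printf("queue flush: vaddr=%#lx shift=%u\n", vaddr, shift);
}

/* Mirrors the callers named above: an 8M mapping is flushed as two
 * physically contiguous, independently handled 4M halves. */
static void flush_huge_mapping(unsigned long addr, unsigned long size)
{
	tlb_batch_add_one(addr, REAL_HPAGE_SHIFT);
	if (size == HPAGE_SIZE)	/* flush the second half of the 8M page */
		tlb_batch_add_one(addr + REAL_HPAGE_SIZE, REAL_HPAGE_SHIFT);
}

int main(void)
{
	flush_huge_mapping(0x80000000UL, HPAGE_SIZE);
	return 0;
}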

Nitin


> We really need to pass HPAGE_SHIFT down into here, so that the TSB
> flush gets both REAL_HPAGE_SIZE entries.
>
David Miller March 31, 2017, 3:34 a.m. UTC | #4
From: Nitin Gupta <nitin.m.gupta@oracle.com>
Date: Thu, 30 Mar 2017 20:12:03 -0700

> On 3/30/17 7:57 PM, David Miller wrote:
>> From: Nitin Gupta <nitin.m.gupta@oracle.com>
>> Date: Thu, 30 Mar 2017 17:42:51 -0700
>> 
>>> diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
>>> index 0a04811..bedf08b 100644
>>> --- a/arch/sparc/mm/tsb.c
>>> +++ b/arch/sparc/mm/tsb.c
>>> @@ -122,7 +122,7 @@ void flush_tsb_user(struct tlb_batch *tb)
>>>
>>>  	spin_lock_irqsave(&mm->context.lock, flags);
>>>
>>> -	if (tb->hugepage_shift < HPAGE_SHIFT) {
>>> +	if (tb->hugepage_shift < REAL_HPAGE_SHIFT) {
>>>  		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
>>>  		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
>>>  		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
>>> @@ -155,7 +155,7 @@ void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr,
>>>
>>>  	spin_lock_irqsave(&mm->context.lock, flags);
>>>
>>> -	if (hugepage_shift < HPAGE_SHIFT) {
>>> +	if (hugepage_shift < REAL_HPAGE_SHIFT) {
>>>  		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
>>>  		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
>>>  		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
>>> -- 
>> 
>> I think if we do it like this, it will only flush one half of the huge
>> page.
>>
> 
> Flushing only half the 8M page is the intended behavior: after the
> initial allocation of 8M hugepage, the page is handled exactly as if two
> independent 4M hugepages were allocated (that happen to be physically
> contiguous). So, for each 4M chunk, flushing from TLB and TSB is done
> independently. For instance, in set_huge_pte_at() we added special case
> for (size == HPAGE_SIZE) to flush the "second half" of 8M page.
> Similarly in huge_ptep_get_and_clear() and in set_pmd_at().

Indeed, the set_pmd_at() code path does the same thing.

Ok, so your patch is more correct.

Please submit it formally with a proper commit log message and
signoff, thanks!

Patch

diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index afda3bb..ee8066c 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -154,7 +154,7 @@ static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr,
 		if (pte_val(*pte) & _PAGE_VALID) {
 			bool exec = pte_exec(*pte);

-			tlb_batch_add_one(mm, vaddr, exec, false);
+			tlb_batch_add_one(mm, vaddr, exec, PAGE_SHIFT);
 		}
 		pte++;
 		vaddr += PAGE_SIZE;
@@ -209,9 +209,9 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			pte_t orig_pte = __pte(pmd_val(orig));
 			bool exec = pte_exec(orig_pte);

-			tlb_batch_add_one(mm, addr, exec, true);
+			tlb_batch_add_one(mm, addr, exec, REAL_HPAGE_SHIFT);
 			tlb_batch_add_one(mm, addr + REAL_HPAGE_SIZE, exec,
-					true);
+					  REAL_HPAGE_SHIFT);
 		} else {
 			tlb_batch_pmd_scan(mm, addr, orig);
 		}
diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
index 0a04811..bedf08b 100644
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -122,7 +122,7 @@ void flush_tsb_user(struct tlb_batch *tb)

 	spin_lock_irqsave(&mm->context.lock, flags);

-	if (tb->hugepage_shift < HPAGE_SHIFT) {
+	if (tb->hugepage_shift < REAL_HPAGE_SHIFT) {
 		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
 		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
 		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
@@ -155,7 +155,7 @@ void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr,

 	spin_lock_irqsave(&mm->context.lock, flags);

-	if (hugepage_shift < HPAGE_SHIFT) {
+	if (hugepage_shift < REAL_HPAGE_SHIFT) {
 		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
 		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;