[03/12] powerpc/64s/hash: move POWER5 < DD2.1 slbie workaround where it is needed

Message ID 20180914153056.3644-4-npiggin@gmail.com
State New
Headers show
Series
  • SLB miss conversion to C, and SLB optimisations
Related show

Checks

Context Check Description
snowpatch_ozlabs/checkpatch success Test checkpatch on branch next
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied

Commit Message

Nicholas Piggin Sept. 14, 2018, 3:30 p.m.
The POWER5 < DD2.1 issue is that slbie needs to be issued more than
once. It came in with this change:

ChangeSet@1.1608, 2004-04-29 07:12:31-07:00, david@gibson.dropbear.id.au
  [PATCH] POWER5 erratum workaround

  Early POWER5 revisions (<DD2.1) have a problem requiring slbie
  instructions to be repeated under some circumstances.  The patch below
  adds a workaround (patch made by Anton Blanchard).

The extra slbie in switch_slb is done even for the case where slbia is
called (slb_flush_and_rebolt). I don't believe that is required
because there are other slb_flush_and_rebolt callers which do not
issue the workaround slbie, which would be broken if it was required.

It also seems to be fine inside the isync with the first slbie, as it
is in the kernel stack switch code.

So move this workaround to where it is required. This is not much of
an optimisation because this is the fast path, but it makes the code
more understandable and neater.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slb.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

Comments

kbuild test robot Sept. 16, 2018, 10:06 p.m. | #1
Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.19-rc3 next-20180913]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/SLB-miss-conversion-to-C-and-SLB-optimisations/20180917-015458
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

Note: the linux-review/Nicholas-Piggin/SLB-miss-conversion-to-C-and-SLB-optimisations/20180917-015458 HEAD b26bd44c74488169a0fd19eef43ea3db189a207d builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   arch/powerpc/mm/slb.c: In function 'switch_slb':
>> arch/powerpc/mm/slb.c:258:4: error: 'slbie_data' may be used uninitialized in this function [-Werror=maybe-uninitialized]
       asm volatile("slbie %0" : : "r" (slbie_data));
       ^~~
   cc1: all warnings being treated as errors

vim +/slbie_data +258 arch/powerpc/mm/slb.c

465ccab9 arch/powerpc/mm/slb.c will schmidt     2007-10-31  224  
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  225  /* Flush all user entries from the segment table of the current processor. */
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  226  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  227  {
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  228  	unsigned long offset;
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  229  	unsigned long pc = KSTK_EIP(tsk);
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  230  	unsigned long stack = KSTK_ESP(tsk);
de4376c2 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  231  	unsigned long exec_base;
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  232  
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  233  	/*
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  234  	 * We need interrupts hard-disabled here, not just soft-disabled,
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  235  	 * so that a PMU interrupt can't occur, which might try to access
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  236  	 * user memory (to get a stack trace) and possible cause an SLB miss
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  237  	 * which would update the slb_cache/slb_cache_ptr fields in the PACA.
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  238  	 */
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  239  	hard_irq_disable();
9c1e1052 arch/powerpc/mm/slb.c Paul Mackerras   2009-08-17  240  	offset = get_paca()->slb_cache_ptr;
44ae3ab3 arch/powerpc/mm/slb.c Matt Evans       2011-04-06  241  	if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) &&
f66bce5e arch/powerpc/mm/slb.c Olof Johansson   2007-10-16  242  	    offset <= SLB_CACHE_ENTRIES) {
6697d605 arch/powerpc/mm/slb.c Nicholas Piggin  2018-09-15  243  		unsigned long slbie_data;
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  244  		int i;
6697d605 arch/powerpc/mm/slb.c Nicholas Piggin  2018-09-15  245  
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  246  		asm volatile("isync" : : : "memory");
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  247  		for (i = 0; i < offset; i++) {
1189be65 arch/powerpc/mm/slb.c Paul Mackerras   2007-10-11  248  			slbie_data = (unsigned long)get_paca()->slb_cache[i]
1189be65 arch/powerpc/mm/slb.c Paul Mackerras   2007-10-11  249  				<< SID_SHIFT; /* EA */
1189be65 arch/powerpc/mm/slb.c Paul Mackerras   2007-10-11  250  			slbie_data |= user_segment_size(slbie_data)
1189be65 arch/powerpc/mm/slb.c Paul Mackerras   2007-10-11  251  				<< SLBIE_SSIZE_SHIFT;
1189be65 arch/powerpc/mm/slb.c Paul Mackerras   2007-10-11  252  			slbie_data |= SLBIE_C; /* C set for user addresses */
1189be65 arch/powerpc/mm/slb.c Paul Mackerras   2007-10-11  253  			asm volatile("slbie %0" : : "r" (slbie_data));
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  254  		}
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  255  
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  256  		/* Workaround POWER5 < DD2.1 issue */
6697d605 arch/powerpc/mm/slb.c Nicholas Piggin  2018-09-15  257  		if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1)
1189be65 arch/powerpc/mm/slb.c Paul Mackerras   2007-10-11 @258  			asm volatile("slbie %0" : : "r" (slbie_data));
6697d605 arch/powerpc/mm/slb.c Nicholas Piggin  2018-09-15  259  
6697d605 arch/powerpc/mm/slb.c Nicholas Piggin  2018-09-15  260  		asm volatile("isync" : : : "memory");
6697d605 arch/powerpc/mm/slb.c Nicholas Piggin  2018-09-15  261  	} else {
6697d605 arch/powerpc/mm/slb.c Nicholas Piggin  2018-09-15  262  		__slb_flush_and_rebolt();
c15c9670 arch/powerpc/mm/slb.c Nicholas Piggin  2018-09-15  263  	}
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  264  
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  265  	get_paca()->slb_cache_ptr = 0;
52b1e665 arch/powerpc/mm/slb.c Aneesh Kumar K.V 2017-03-22  266  	copy_mm_to_paca(mm);
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  267  
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  268  	/*
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  269  	 * preload some userspace segments into the SLB.
de4376c2 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  270  	 * Almost all 32 and 64bit PowerPC executables are linked at
de4376c2 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  271  	 * 0x10000000 so it makes sense to preload this segment.
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  272  	 */
de4376c2 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  273  	exec_base = 0x10000000;
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  274  
5eb9bac0 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  275  	if (is_kernel_addr(pc) || is_kernel_addr(stack) ||
de4376c2 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  276  	    is_kernel_addr(exec_base))
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  277  		return;
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  278  
5eb9bac0 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  279  	slb_allocate(pc);
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  280  
5eb9bac0 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  281  	if (!esids_match(pc, stack))
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  282  		slb_allocate(stack);
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  283  
de4376c2 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  284  	if (!esids_match(pc, exec_base) &&
de4376c2 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  285  	    !esids_match(stack, exec_base))
de4376c2 arch/powerpc/mm/slb.c Anton Blanchard  2009-07-13  286  		slb_allocate(exec_base);
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  287  }
^1da177e arch/ppc64/mm/slb.c   Linus Torvalds   2005-04-16  288  

:::::: The code at line 258 was first introduced by commit
:::::: 1189be6508d45183013ddb82b18f4934193de274 [POWERPC] Use 1TB segments

:::::: TO: Paul Mackerras <paulus@samba.org>
:::::: CC: Paul Mackerras <paulus@samba.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Aneesh Kumar K.V Sept. 17, 2018, 6 a.m. | #2
Nicholas Piggin <npiggin@gmail.com> writes:

> The POWER5 < DD2.1 issue is that slbie needs to be issued more than
> once. It came in with this change:
>
> ChangeSet@1.1608, 2004-04-29 07:12:31-07:00, david@gibson.dropbear.id.au
>   [PATCH] POWER5 erratum workaround
>
>   Early POWER5 revisions (<DD2.1) have a problem requiring slbie
>   instructions to be repeated under some circumstances.  The patch below
>   adds a workaround (patch made by Anton Blanchard).

Thanks for extracting this. Can we add this to the code? Also I am not
sure what is repeated here? Is it that we just need one slb extra(hence
only applicable to offset == 1) or is it that we need to make sure there
is always one slb extra? The code does the former.  Do you a have link for
that email patch?


>
> The extra slbie in switch_slb is done even for the case where slbia is
> called (slb_flush_and_rebolt). I don't believe that is required
> because there are other slb_flush_and_rebolt callers which do not
> issue the workaround slbie, which would be broken if it was required.
>
> It also seems to be fine inside the isync with the first slbie, as it
> is in the kernel stack switch code.
>
> So move this workaround to where it is required. This is not much of
> an optimisation because this is the fast path, but it makes the code
> more understandable and neater.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/mm/slb.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 1c7128c63a4b..d952ece3abf7 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -226,7 +226,6 @@ static inline int esids_match(unsigned long addr1, unsigned long addr2)
>  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>  {
>  	unsigned long offset;
> -	unsigned long slbie_data = 0;
>  	unsigned long pc = KSTK_EIP(tsk);
>  	unsigned long stack = KSTK_ESP(tsk);
>  	unsigned long exec_base;
> @@ -241,7 +240,9 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>  	offset = get_paca()->slb_cache_ptr;
>  	if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) &&
>  	    offset <= SLB_CACHE_ENTRIES) {
> +		unsigned long slbie_data;
>  		int i;
> +
>  		asm volatile("isync" : : : "memory");
>  		for (i = 0; i < offset; i++) {
>  			slbie_data = (unsigned long)get_paca()->slb_cache[i]
> @@ -251,15 +252,14 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>  			slbie_data |= SLBIE_C; /* C set for user addresses */
>  			asm volatile("slbie %0" : : "r" (slbie_data));
>  		}
> -		asm volatile("isync" : : : "memory");
> -	} else {
> -		__slb_flush_and_rebolt();
> -	}
>  
> -	if (!cpu_has_feature(CPU_FTR_ARCH_207S)) {
>  		/* Workaround POWER5 < DD2.1 issue */
> -		if (offset == 1 || offset > SLB_CACHE_ENTRIES)
> +		if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1)
>  			asm volatile("slbie %0" : : "r" (slbie_data));
> +
> +		asm volatile("isync" : : : "memory");
> +	} else {
> +		__slb_flush_and_rebolt();
>  	}
>  
>  	get_paca()->slb_cache_ptr = 0;
> -- 
> 2.18.0
Nicholas Piggin Sept. 17, 2018, 8:39 a.m. | #3
On Mon, 17 Sep 2018 11:30:16 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > The POWER5 < DD2.1 issue is that slbie needs to be issued more than
> > once. It came in with this change:
> >
> > ChangeSet@1.1608, 2004-04-29 07:12:31-07:00, david@gibson.dropbear.id.au
> >   [PATCH] POWER5 erratum workaround
> >
> >   Early POWER5 revisions (<DD2.1) have a problem requiring slbie
> >   instructions to be repeated under some circumstances.  The patch below
> >   adds a workaround (patch made by Anton Blanchard).  
> 
> Thanks for extracting this. Can we add this to the code?

The comment? Sure.

> Also I am not
> sure what is repeated here? Is it that we just need one slb extra(hence
> only applicable to offset == 1) or is it that we need to make sure there
> is always one slb extra? The code does the former.

Yeah it has always done the former, so my assumption is that you just
need more than one slbie. I don't think we need to bother revisiting
that assumption unless someone can pull up something definitive.

What I did change is that slbia no longer has the additional slbie, but
I think there are strong reasons not to need that.

>  Do you a have link for
> that email patch?

I tried looking through the archives around that date but could not
find it. That came from a bitkeeper log.

Thanks,
Nick

Patch

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 1c7128c63a4b..d952ece3abf7 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -226,7 +226,6 @@  static inline int esids_match(unsigned long addr1, unsigned long addr2)
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
 	unsigned long offset;
-	unsigned long slbie_data = 0;
 	unsigned long pc = KSTK_EIP(tsk);
 	unsigned long stack = KSTK_ESP(tsk);
 	unsigned long exec_base;
@@ -241,7 +240,9 @@  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 	offset = get_paca()->slb_cache_ptr;
 	if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) &&
 	    offset <= SLB_CACHE_ENTRIES) {
+		unsigned long slbie_data;
 		int i;
+
 		asm volatile("isync" : : : "memory");
 		for (i = 0; i < offset; i++) {
 			slbie_data = (unsigned long)get_paca()->slb_cache[i]
@@ -251,15 +252,14 @@  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 			slbie_data |= SLBIE_C; /* C set for user addresses */
 			asm volatile("slbie %0" : : "r" (slbie_data));
 		}
-		asm volatile("isync" : : : "memory");
-	} else {
-		__slb_flush_and_rebolt();
-	}
 
-	if (!cpu_has_feature(CPU_FTR_ARCH_207S)) {
 		/* Workaround POWER5 < DD2.1 issue */
-		if (offset == 1 || offset > SLB_CACHE_ENTRIES)
+		if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1)
 			asm volatile("slbie %0" : : "r" (slbie_data));
+
+		asm volatile("isync" : : : "memory");
+	} else {
+		__slb_flush_and_rebolt();
 	}
 
 	get_paca()->slb_cache_ptr = 0;