diff mbox

powerpc/kernel: improve FP and vector registers restoration

Message ID 1496439810-11240-1-git-send-email-leitao@debian.org (mailing list archive)
State Accepted
Commit 1195892c091a15cc862f4e202482a36adc924e12
Headers show

Commit Message

Breno Leitao June 2, 2017, 9:43 p.m. UTC
Currently tsk->thread->load_vec and load_fp are not initialized during a
task creation, which set garbage to these variables (non-zero value).

These variables will be checked later at restore_math() to validate if the
FP and vectors are being utilized. Since these values might be non-zero,
the restore_math() will continue to save the FP and vectors even if they
were never utilized before the userspace application. load_fp and load_vec
counters will then overflow and the FP and Altivec will be finally
disabled, but before that condition is reached (counter overflow) several
context switches restored FP and vector registers without need, causing a
performance degradation.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
---
 arch/powerpc/kernel/process.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Anton Blanchard June 2, 2017, 10:04 p.m. UTC | #1
Hi Breno,

> Currently tsk->thread->load_vec and load_fp are not initialized
> during a task creation, which set garbage to these variables
> (non-zero value).

Nice catch! It seems like we should zero load_tm too though?

Acked-by: Anton Blanchard <anton@samba.org>

Anton

> These variables will be checked later at restore_math() to validate
> if the FP and vectors are being utilized. Since these values might be
> non-zero, the restore_math() will continue to save the FP and vectors
> even if they were never utilized before the userspace application.
> load_fp and load_vec counters will then overflow and the FP and
> Altivec will be finally disabled, but before that condition is
> reached (counter overflow) several context switches restored FP and
> vector registers without need, causing a performance degradation.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
> ---
>  arch/powerpc/kernel/process.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c index baae104b16c7..a9435397eab8
> 100644 --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1666,6 +1666,7 @@ void start_thread(struct pt_regs *regs,
> unsigned long start, unsigned long sp) #ifdef CONFIG_VSX
>  	current->thread.used_vsr = 0;
>  #endif
> +	current->thread.load_fp = 0;
>  	memset(&current->thread.fp_state, 0,
> sizeof(current->thread.fp_state)); current->thread.fp_save_area =
> NULL; #ifdef CONFIG_ALTIVEC
> @@ -1674,6 +1675,7 @@ void start_thread(struct pt_regs *regs,
> unsigned long start, unsigned long sp) current->thread.vr_save_area =
> NULL; current->thread.vrsave = 0;
>  	current->thread.used_vr = 0;
> +	current->thread.load_vec = 0;
>  #endif /* CONFIG_ALTIVEC */
>  #ifdef CONFIG_SPE
>  	memset(current->thread.evr, 0, sizeof(current->thread.evr));
Breno Leitao June 3, 2017, 10:42 p.m. UTC | #2
Hi Anton,

On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> Hi Breno,
> 
> > Currently tsk->thread->load_vec and load_fp are not initialized
> > during a task creation, which set garbage to these variables
> > (non-zero value).
> 
> Nice catch! It seems like we should zero load_tm too though?

Yes, it seems we need to zero load_tm also, since it does not seem to be
zeroed anywhere else.

But I did some tests, and load_tm is always zero after start_thread()
is being called.

In fact, start_thread() is being called and pt_regs->load_tm is already
zero since the function start.

I also wrote a SystemTap script[1] to investigate it better, and I've
never seen a single load_tm != 0 in a my machine. I tested on both
POWER8 bare metal and KVM guests. (load_vec and load_fp happened to have
garbage all the time)

Any idea if this is just occasional event, or, if there is someone
zeroing it in an obscure code?

[1] https://github.com/leitao/htm_torture/blob/master/systemtap/load_tm_at_start_thread.stap
Anton Blanchard June 4, 2017, 1:38 a.m. UTC | #3
On Sat, 3 Jun 2017 19:42:14 -0300
Breno Leitao <leitao@debian.org> wrote:

> Hi Anton,
> 
> On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> > Hi Breno,
> >   
> > > Currently tsk->thread->load_vec and load_fp are not initialized
> > > during a task creation, which set garbage to these variables
> > > (non-zero value).  
> > 
> > Nice catch! It seems like we should zero load_tm too though?  
> 
> Yes, it seems we need to zero load_tm also, since it does not seem to
> be zeroed anywhere else.
> 
> But I did some tests, and load_tm is always zero after start_thread()
> is being called.
> 
> In fact, start_thread() is being called and pt_regs->load_tm is
> already zero since the function start.
> 
> I also wrote a SystemTap script[1] to investigate it better, and I've
> never seen a single load_tm != 0 in a my machine. I tested on both
> POWER8 bare metal and KVM guests. (load_vec and load_fp happened to
> have garbage all the time)
> 
> Any idea if this is just occasional event, or, if there is someone
> zeroing it in an obscure code?

Quite likely no one uses TM :) Try:

#include <unistd.h>

int main(void)
{
        __builtin_tbegin(0);
        execlp("/bin/true", "/bin/true", NULL);
}

Anton
Breno Leitao June 4, 2017, 2:34 p.m. UTC | #4
On Sun, Jun 04, 2017 at 11:38:14AM +1000, Anton Blanchard wrote:
> On Sat, 3 Jun 2017 19:42:14 -0300
> Breno Leitao <leitao@debian.org> wrote:
> 
> > Hi Anton,
> > 
> > On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> > > Hi Breno,
> > >   
> > > > Currently tsk->thread->load_vec and load_fp are not initialized
> > > > during a task creation, which set garbage to these variables
> > > > (non-zero value).  
> > > 
> > > Nice catch! It seems like we should zero load_tm too though?  
> > 
> > Yes, it seems we need to zero load_tm also, since it does not seem to
> > be zeroed anywhere else.
> > 
> > But I did some tests, and load_tm is always zero after start_thread()
> > is being called.
> > 
> > In fact, start_thread() is being called and pt_regs->load_tm is
> > already zero since the function start.
> > 
> > I also wrote a SystemTap script[1] to investigate it better, and I've
> > never seen a single load_tm != 0 in a my machine. I tested on both
> > POWER8 bare metal and KVM guests. (load_vec and load_fp happened to
> > have garbage all the time)
> > 
> > Any idea if this is just occasional event, or, if there is someone
> > zeroing it in an obscure code?
> 
> Quite likely no one uses TM :) Try:

In fact, I had tested with TM[1] and haven't seen any issue, but I was not
calling a nested application (through execve() syscall). Somehow if I
call  "$ ./tm_application ; /bin/true", I do not see a non-zero load_tm
in the new task->thread.

On the other side, I see the corruption with your test case, mainly if I
sleep after 'tbegin.' and before execlp(), giving a chance to have
load_tm incremented, and this value is being inherited in the new
task->thread.

This is obviously wrong, I will send a patch to have it fixed.

Thanks for the guidance!

[1] https://github.com/leitao/htm_torture
Michael Ellerman June 5, 2017, 5:59 a.m. UTC | #5
Breno Leitao <leitao@debian.org> writes:

> Currently tsk->thread->load_vec and load_fp are not initialized during a
> task creation, which set garbage to these variables (non-zero value).
>
> These variables will be checked later at restore_math() to validate if the
> FP and vectors are being utilized. Since these values might be non-zero,
> the restore_math() will continue to save the FP and vectors even if they
> were never utilized before the userspace application. load_fp and load_vec
> counters will then overflow and the FP and Altivec will be finally
> disabled, but before that condition is reached (counter overflow) several
> context switches restored FP and vector registers without need, causing a
> performance degradation.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Signed-off-by: Gustavo Romero <gusbromero@gmail.com>

Thanks, I tweaked the wording a little and added:

Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: stable@vger.kernel.org # v4.6+

cheers
Michael Ellerman June 8, 2017, 4:05 a.m. UTC | #6
On Fri, 2017-06-02 at 21:43:30 UTC, Breno Leitao wrote:
> Currently tsk->thread->load_vec and load_fp are not initialized during a
> task creation, which set garbage to these variables (non-zero value).
> 
> These variables will be checked later at restore_math() to validate if the
> FP and vectors are being utilized. Since these values might be non-zero,
> the restore_math() will continue to save the FP and vectors even if they
> were never utilized before the userspace application. load_fp and load_vec
> counters will then overflow and the FP and Altivec will be finally
> disabled, but before that condition is reached (counter overflow) several
> context switches restored FP and vector registers without need, causing a
> performance degradation.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
> Acked-by: Anton Blanchard <anton@samba.org>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/1195892c091a15cc862f4e202482a3

cheers
diff mbox

Patch

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..a9435397eab8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1666,6 +1666,7 @@  void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 #ifdef CONFIG_VSX
 	current->thread.used_vsr = 0;
 #endif
+	current->thread.load_fp = 0;
 	memset(&current->thread.fp_state, 0, sizeof(current->thread.fp_state));
 	current->thread.fp_save_area = NULL;
 #ifdef CONFIG_ALTIVEC
@@ -1674,6 +1675,7 @@  void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 	current->thread.vr_save_area = NULL;
 	current->thread.vrsave = 0;
 	current->thread.used_vr = 0;
+	current->thread.load_vec = 0;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_SPE
 	memset(current->thread.evr, 0, sizeof(current->thread.evr));