From patchwork Wed Aug  5 01:06:00 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Miller <davem@davemloft.net>
X-Patchwork-Id: 503859
X-Patchwork-Delegate: davem@davemloft.net
Return-Path: <sparclinux-owner@vger.kernel.org>
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id E2171140316
	for <patchwork-incoming@ozlabs.org>;
	Wed,  5 Aug 2015 11:06:25 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751000AbbHEBGW (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);
	Tue, 4 Aug 2015 21:06:22 -0400
Received: from shards.monkeyblade.net ([149.20.54.216]:56898 "EHLO
	shards.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751188AbbHEBGD (ORCPT
	<rfc822; sparclinux@vger.kernel.org>); Tue, 4 Aug 2015 21:06:03 -0400
Received: from localhost (74-93-104-98-Washington.hfc.comcastbusiness.net
	[74.93.104.98]) (Authenticated sender: davem-davemloft)
	by shards.monkeyblade.net (Postfix) with ESMTPSA id 61EBB5910D6;
	Tue,  4 Aug 2015 18:06:02 -0700 (PDT)
Date: Tue, 04 Aug 2015 18:06:00 -0700 (PDT)
Message-Id: <20150804.180600.584466976241090832.davem@davemloft.net>
To: jyknight@google.com
Cc: debian-sparc@lists.debian.org, sparclinux@vger.kernel.org,
	aurelien@aurel32.net
Subject: Re: Linux Sparc FPU register corruption
From: David Miller <davem@davemloft.net>
In-Reply-To: <20150610.132207.1606053216447272806.davem@davemloft.net>
References: <20150609.003414.731851656168946673.davem@davemloft.net>
	<20150609.134537.10408041080684182.davem@davemloft.net>
	<20150610.132207.1606053216447272806.davem@davemloft.net>
X-Mailer: Mew version 6.6 on Emacs 24.5 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
X-Greylist: Sender succeeded SMTP AUTH, not delayed by
	milter-greylist-4.5.12 (shards.monkeyblade.net
	[149.20.54.216]); Tue, 04 Aug 2015 18:06:02 -0700 (PDT)
Sender: sparclinux-owner@vger.kernel.org
Precedence: bulk
List-ID: <sparclinux.vger.kernel.org>
X-Mailing-List: sparclinux@vger.kernel.org

From: David Miller <davem@davemloft.net>
Date: Wed, 10 Jun 2015 13:22:07 -0700 (PDT)

> Good news, I can reproduce it on my T3.
> 
> I'll try to debug this.

Just wanted to give a status update on this.  I know what is going
wrong and am working on coming up with a fix.  This is a two decades
old bug.

I wrote a sophisticated tracing mechanism to load all of the FPU state
saves and restores and a special software trap that dumps kernel FPU
saving state which is triggerable from the testcase.

What's really important is to generate this state dump exactly when
the FPU corruption first occurs.  So in the testcase I made it jump
out to the error reporting code from within the inner loop of stores.
So it looks like this:
---
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

====================
#if USE_BLKSTORE
  int *mem;
  for (mem = (int*)res; mem < (int*)((char*)res + 0x140000); mem += 16) {
    // 0xf0 == ASI_BLK_P                                                                                                                                                       
    asm volatile("stda %%f0, [%0] 0xf0" : : "r"(mem) : "memory");
    asm volatile("std %%f0, [%0]" : : "r" (&xx[0]) : "memory");
    if (xx[0] != 0x0102030405060708)
            goto do_bug;
  }
#else
 ...
do_bug:
  get_fp_regs(xx);

  int i;
  for(i = 0; i < 32; ++i) {
    if (xx[i] != 0x0102030405060708) {
      // Modifying a low-fp register causes the usleep to not restore the values.                                                                                              
      // asm volatile("fsrc2 %f32, %f0");                                                                                                                                      
      // Calling usleep causes the fp registers to regain their proper values.                                                                                                 
      //usleep(10);                                                                                                                                                            
      __asm__ __volatile__("ta 0x0f");
====================

The "ta 0x0f" is the special software trap that dumps state into the kernel
log.  Part of what gets dumped is a ring buffer of FPU save/restore events,
it looks like this:

FPU fpsaved[0x7] gsr[0x6] xfsr[0x0]
FREGS0 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708]
FREGS1 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708]
FREGS2 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708]
FREGS3 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708]
FREGS4 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000]
FREGS5 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000]
FREGS6 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000]
FREGS7 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000]
FPU_HIST ent(7) [41 03 09 4a 09 4b 08 08]

It dumps the fsr/gsr saved at the top level of the FPU state stack, and the FPU registers
from the top two FPU state stack slots.  Then it dumps the FPU event history log.

The above log encodes to:

====================
        ETRAP
                VIS_ENTRY(fprs=0x4)
                VIS_EXIT
                RTRAP
                RTRAP_KFPU_RESTORE(fpu_saved=0x4)
        RTRAP
        RTRAP_UFPU(fprs=0x4)
ETRAP
====================

The final ETRAP is the one generated by the application for the "ta 0x0f"
instruction.

At a high level the FPU saving code is trying to minimize the amount
of register saves we do when the kernel wants to use the FPU
internally _or_ we're doing a context switch.  The cpu helps us by
having two dirty bits in the %fprs register, one for the lower half of
the fpu (FPRS_DL) and one for the upper half (FPRS_DU).  This register
also has a "float enable" bit (FPRS_FEF).

The only complex aspect of this dirty register state mechanism is that
there is no dirty bit for the %gsr and %fsr registers, if the enable
bit is set at all we have to assume that both of these register might
be dirty.

Each context of execution needs a place to store the FPU registers,
and there needs to be an event which triggers the restoration of the
saved FPU register state.  These two jobs are done by trap entry and
exit.

Trap entry allocates FPU save area slots by incrementing a counter
(current_thread_info()->fpdepth).  Trap return looks to see if any FPU
state was saved during that trap handler and if so writes that saved
FPU state back into the chip.

The problem occurs when we have a situation like trapping from
userspace with just the FPRS_FEF bit set in the %fprs.  This says
that no registers (besides %fsr and %gsr) are dirty.

If we then trap again into the kernel, and use the FPU, we won't
have to save any of the FPU register set, just %fsr and %gsr.
So on trap return the code only restores %fsr and %gsr, because
that is what we determined needed to be saved.

Then we return back into userspace, without restoring the user FPU
registers from the top-level FPU state saving stack.

The simplest fix is to save all registers when we are using anything
deeper than the top-most FPU state saving stack slot.  A patch
implementing that is below and works for the test case.

There are some related similar issues, especially in VISenterhalf,
that I want to address as well since it could cause similar
corruptions.

I'll post when I have a final patch for everything, thanks for your
patience.

diff --git a/arch/sparc/lib/VISsave.S b/arch/sparc/lib/VISsave.S
index b320ae9..eb11136 100644
--- a/arch/sparc/lib/VISsave.S
+++ b/arch/sparc/lib/VISsave.S
@@ -44,9 +44,8 @@ vis1:	ldub		[%g6 + TI_FPSAVED], %g3
 
 	 stx		%g3, [%g6 + TI_GSR]
 2:	add		%g6, %g1, %g3
-	cmp		%o5, FPRS_DU
-	be,pn		%icc, 6f
-	 sll		%g1, 3, %g1
+	mov		FPRS_DU | FPRS_DL | FPRS_FEF, %o5
+	sll		%g1, 3, %g1
 	stb		%o5, [%g3 + TI_FPSAVED]
 	rd		%gsr, %g2
 	add		%g6, %g1, %g3
@@ -81,25 +80,6 @@ vis1:	ldub		[%g6 + TI_FPSAVED], %g3
 80:	jmpl		%g7 + %g0, %g0
 	 nop
 
-6:	ldub		[%g3 + TI_FPSAVED], %o5
-	or		%o5, FPRS_DU, %o5
-	add		%g6, TI_FPREGS+0x80, %g2
-	stb		%o5, [%g3 + TI_FPSAVED]
-
-	sll		%g1, 5, %g1
-	add		%g6, TI_FPREGS+0xc0, %g3
-	wr		%g0, FPRS_FEF, %fprs
-	membar		#Sync
-	stda		%f32, [%g2 + %g1] ASI_BLK_P
-	stda		%f48, [%g3 + %g1] ASI_BLK_P
-	membar		#Sync
-	ba,pt		%xcc, 80f
-	 nop
-
-	.align		32
-80:	jmpl		%g7 + %g0, %g0
-	 nop
-
 	.align		32
 VISenterhalf:
 	ldub		[%g6 + TI_FPDEPTH], %g1