From patchwork Wed Aug 5 01:06:00 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Miller X-Patchwork-Id: 503859 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id E2171140316 for ; Wed, 5 Aug 2015 11:06:25 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751000AbbHEBGW (ORCPT ); Tue, 4 Aug 2015 21:06:22 -0400 Received: from shards.monkeyblade.net ([149.20.54.216]:56898 "EHLO shards.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751188AbbHEBGD (ORCPT ); Tue, 4 Aug 2015 21:06:03 -0400 Received: from localhost (74-93-104-98-Washington.hfc.comcastbusiness.net [74.93.104.98]) (Authenticated sender: davem-davemloft) by shards.monkeyblade.net (Postfix) with ESMTPSA id 61EBB5910D6; Tue, 4 Aug 2015 18:06:02 -0700 (PDT) Date: Tue, 04 Aug 2015 18:06:00 -0700 (PDT) Message-Id: <20150804.180600.584466976241090832.davem@davemloft.net> To: jyknight@google.com Cc: debian-sparc@lists.debian.org, sparclinux@vger.kernel.org, aurelien@aurel32.net Subject: Re: Linux Sparc FPU register corruption From: David Miller In-Reply-To: <20150610.132207.1606053216447272806.davem@davemloft.net> References: <20150609.003414.731851656168946673.davem@davemloft.net> <20150609.134537.10408041080684182.davem@davemloft.net> <20150610.132207.1606053216447272806.davem@davemloft.net> X-Mailer: Mew version 6.6 on Emacs 24.5 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.5.12 (shards.monkeyblade.net [149.20.54.216]); Tue, 04 Aug 2015 18:06:02 -0700 (PDT) Sender: sparclinux-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: sparclinux@vger.kernel.org From: David Miller Date: Wed, 10 Jun 2015 13:22:07 -0700 (PDT) > Good news, I can reproduce it on my T3. > > I'll try to debug this. Just wanted to give a status update on this. I know what is going wrong and am working on coming up with a fix. This is a two decades old bug. I wrote a sophisticated tracing mechanism to load all of the FPU state saves and restores and a special software trap that dumps kernel FPU saving state which is triggerable from the testcase. What's really important is to generate this state dump exactly when the FPU corruption first occurs. So in the testcase I made it jump out to the error reporting code from within the inner loop of stores. So it looks like this: --- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ==================== #if USE_BLKSTORE int *mem; for (mem = (int*)res; mem < (int*)((char*)res + 0x140000); mem += 16) { // 0xf0 == ASI_BLK_P asm volatile("stda %%f0, [%0] 0xf0" : : "r"(mem) : "memory"); asm volatile("std %%f0, [%0]" : : "r" (&xx[0]) : "memory"); if (xx[0] != 0x0102030405060708) goto do_bug; } #else ... do_bug: get_fp_regs(xx); int i; for(i = 0; i < 32; ++i) { if (xx[i] != 0x0102030405060708) { // Modifying a low-fp register causes the usleep to not restore the values. // asm volatile("fsrc2 %f32, %f0"); // Calling usleep causes the fp registers to regain their proper values. //usleep(10); __asm__ __volatile__("ta 0x0f"); ==================== The "ta 0x0f" is the special software trap that dumps state into the kernel log. Part of what gets dumped is a ring buffer of FPU save/restore events, it looks like this: FPU fpsaved[0x7] gsr[0x6] xfsr[0x0] FREGS0 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708] FREGS1 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708] FREGS2 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708] FREGS3 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708] FREGS4 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000] FREGS5 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000] FREGS6 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000] FREGS7 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000] FPU_HIST ent(7) [41 03 09 4a 09 4b 08 08] It dumps the fsr/gsr saved at the top level of the FPU state stack, and the FPU registers from the top two FPU state stack slots. Then it dumps the FPU event history log. The above log encodes to: ==================== ETRAP VIS_ENTRY(fprs=0x4) VIS_EXIT RTRAP RTRAP_KFPU_RESTORE(fpu_saved=0x4) RTRAP RTRAP_UFPU(fprs=0x4) ETRAP ==================== The final ETRAP is the one generated by the application for the "ta 0x0f" instruction. At a high level the FPU saving code is trying to minimize the amount of register saves we do when the kernel wants to use the FPU internally _or_ we're doing a context switch. The cpu helps us by having two dirty bits in the %fprs register, one for the lower half of the fpu (FPRS_DL) and one for the upper half (FPRS_DU). This register also has a "float enable" bit (FPRS_FEF). The only complex aspect of this dirty register state mechanism is that there is no dirty bit for the %gsr and %fsr registers, if the enable bit is set at all we have to assume that both of these register might be dirty. Each context of execution needs a place to store the FPU registers, and there needs to be an event which triggers the restoration of the saved FPU register state. These two jobs are done by trap entry and exit. Trap entry allocates FPU save area slots by incrementing a counter (current_thread_info()->fpdepth). Trap return looks to see if any FPU state was saved during that trap handler and if so writes that saved FPU state back into the chip. The problem occurs when we have a situation like trapping from userspace with just the FPRS_FEF bit set in the %fprs. This says that no registers (besides %fsr and %gsr) are dirty. If we then trap again into the kernel, and use the FPU, we won't have to save any of the FPU register set, just %fsr and %gsr. So on trap return the code only restores %fsr and %gsr, because that is what we determined needed to be saved. Then we return back into userspace, without restoring the user FPU registers from the top-level FPU state saving stack. The simplest fix is to save all registers when we are using anything deeper than the top-most FPU state saving stack slot. A patch implementing that is below and works for the test case. There are some related similar issues, especially in VISenterhalf, that I want to address as well since it could cause similar corruptions. I'll post when I have a final patch for everything, thanks for your patience. diff --git a/arch/sparc/lib/VISsave.S b/arch/sparc/lib/VISsave.S index b320ae9..eb11136 100644 --- a/arch/sparc/lib/VISsave.S +++ b/arch/sparc/lib/VISsave.S @@ -44,9 +44,8 @@ vis1: ldub [%g6 + TI_FPSAVED], %g3 stx %g3, [%g6 + TI_GSR] 2: add %g6, %g1, %g3 - cmp %o5, FPRS_DU - be,pn %icc, 6f - sll %g1, 3, %g1 + mov FPRS_DU | FPRS_DL | FPRS_FEF, %o5 + sll %g1, 3, %g1 stb %o5, [%g3 + TI_FPSAVED] rd %gsr, %g2 add %g6, %g1, %g3 @@ -81,25 +80,6 @@ vis1: ldub [%g6 + TI_FPSAVED], %g3 80: jmpl %g7 + %g0, %g0 nop -6: ldub [%g3 + TI_FPSAVED], %o5 - or %o5, FPRS_DU, %o5 - add %g6, TI_FPREGS+0x80, %g2 - stb %o5, [%g3 + TI_FPSAVED] - - sll %g1, 5, %g1 - add %g6, TI_FPREGS+0xc0, %g3 - wr %g0, FPRS_FEF, %fprs - membar #Sync - stda %f32, [%g2 + %g1] ASI_BLK_P - stda %f48, [%g3 + %g1] ASI_BLK_P - membar #Sync - ba,pt %xcc, 80f - nop - - .align 32 -80: jmpl %g7 + %g0, %g0 - nop - .align 32 VISenterhalf: ldub [%g6 + TI_FPDEPTH], %g1