====================
#if USE_BLKSTORE
int *mem;
for (mem = (int*)res; mem < (int*)((char*)res + 0x140000); mem += 16) {
// 0xf0 == ASI_BLK_P
asm volatile("stda %%f0, [%0] 0xf0" : : "r"(mem) : "memory");
asm volatile("std %%f0, [%0]" : : "r" (&xx[0]) : "memory");
if (xx[0] != 0x0102030405060708)
goto do_bug;
}
#else
...
do_bug:
get_fp_regs(xx);
int i;
for(i = 0; i < 32; ++i) {
if (xx[i] != 0x0102030405060708) {
// Modifying a low-fp register causes the usleep to not restore the values.
// asm volatile("fsrc2 %f32, %f0");
// Calling usleep causes the fp registers to regain their proper values.
//usleep(10);
__asm__ __volatile__("ta 0x0f");
====================
The "ta 0x0f" is the special software trap that dumps state into the kernel
log. Part of what gets dumped is a ring buffer of FPU save/restore events,
it looks like this:
FPU fpsaved[0x7] gsr[0x6] xfsr[0x0]
FREGS0 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708]
FREGS1 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708]
FREGS2 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708]
FREGS3 [0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708:0102030405060708]
FREGS4 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000]
FREGS5 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000]
FREGS6 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000]
FREGS7 [0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000:0000000000000000]
FPU_HIST ent(7) [41 03 09 4a 09 4b 08 08]
It dumps the fsr/gsr saved at the top level of the FPU state stack, and the FPU registers
from the top two FPU state stack slots. Then it dumps the FPU event history log.
The above log encodes to:
====================
ETRAP
VIS_ENTRY(fprs=0x4)
VIS_EXIT
RTRAP
RTRAP_KFPU_RESTORE(fpu_saved=0x4)
RTRAP
RTRAP_UFPU(fprs=0x4)
ETRAP
====================
The final ETRAP is the one generated by the application for the "ta 0x0f"
instruction.
At a high level the FPU saving code is trying to minimize the amount
of register saves we do when the kernel wants to use the FPU
internally _or_ we're doing a context switch. The cpu helps us by
having two dirty bits in the %fprs register, one for the lower half of
the fpu (FPRS_DL) and one for the upper half (FPRS_DU). This register
also has a "float enable" bit (FPRS_FEF).
The only complex aspect of this dirty register state mechanism is that
there is no dirty bit for the %gsr and %fsr registers, if the enable
bit is set at all we have to assume that both of these register might
be dirty.
Each context of execution needs a place to store the FPU registers,
and there needs to be an event which triggers the restoration of the
saved FPU register state. These two jobs are done by trap entry and
exit.
Trap entry allocates FPU save area slots by incrementing a counter
(current_thread_info()->fpdepth). Trap return looks to see if any FPU
state was saved during that trap handler and if so writes that saved
FPU state back into the chip.
The problem occurs when we have a situation like trapping from
userspace with just the FPRS_FEF bit set in the %fprs. This says
that no registers (besides %fsr and %gsr) are dirty.
If we then trap again into the kernel, and use the FPU, we won't
have to save any of the FPU register set, just %fsr and %gsr.
So on trap return the code only restores %fsr and %gsr, because
that is what we determined needed to be saved.
Then we return back into userspace, without restoring the user FPU
registers from the top-level FPU state saving stack.
The simplest fix is to save all registers when we are using anything
deeper than the top-most FPU state saving stack slot. A patch
implementing that is below and works for the test case.
There are some related similar issues, especially in VISenterhalf,
that I want to address as well since it could cause similar
corruptions.
I'll post when I have a final patch for everything, thanks for your
patience.
@@ -44,9 +44,8 @@ vis1: ldub [%g6 + TI_FPSAVED], %g3
stx %g3, [%g6 + TI_GSR]
2: add %g6, %g1, %g3
- cmp %o5, FPRS_DU
- be,pn %icc, 6f
- sll %g1, 3, %g1
+ mov FPRS_DU | FPRS_DL | FPRS_FEF, %o5
+ sll %g1, 3, %g1
stb %o5, [%g3 + TI_FPSAVED]
rd %gsr, %g2
add %g6, %g1, %g3
@@ -81,25 +80,6 @@ vis1: ldub [%g6 + TI_FPSAVED], %g3
80: jmpl %g7 + %g0, %g0
nop
-6: ldub [%g3 + TI_FPSAVED], %o5
- or %o5, FPRS_DU, %o5
- add %g6, TI_FPREGS+0x80, %g2
- stb %o5, [%g3 + TI_FPSAVED]
-
- sll %g1, 5, %g1
- add %g6, TI_FPREGS+0xc0, %g3
- wr %g0, FPRS_FEF, %fprs
- membar #Sync
- stda %f32, [%g2 + %g1] ASI_BLK_P
- stda %f48, [%g3 + %g1] ASI_BLK_P
- membar #Sync
- ba,pt %xcc, 80f
- nop
-
- .align 32
-80: jmpl %g7 + %g0, %g0
- nop
-
.align 32
VISenterhalf:
ldub [%g6 + TI_FPDEPTH], %g1