diff mbox series

[committed] amdgcn: align TImode registers

Message ID eb78dcd6-ef63-40b1-a7e2-8ec2422dcefb@codesourcery.com
State New
Headers show
Series [committed] amdgcn: align TImode registers | expand

Commit Message

Andrew Stubbs Sept. 11, 2020, 10:17 a.m. UTC
This patch fixes an execution failure in which the compiler would 
corrupt TImode values due to missed early clobber problems with 
partially overlapping register allocations.  In fact, adding early 
clobber constraints does not fix the issue because IRA doesn't check 
that for move instructions properly in all circumstances anyway (the 
constraint causes an ICE in postreload).

This patch fixes the problem by ensuring that TImode values are always 
aligned to 4-register boundaries, meaning that inputs and outputs will 
either overlap completely, or not at all, neither of which have 
early-clobber issues.

This is an artificial restriction the hardware not present in hardware , 
but it is the same solution we use for DImode values where we had a lot 
of the same problems.

With the patch I see the following new test passes:

PASS: gfortran.dg/PR95331.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
PASS: gfortran.dg/PR95331.f90   -O3 -g  execution test
PASS: gfortran.dg/gamma_5.f90   -O0  execution test
PASS: gfortran.dg/gamma_5.f90   -O1  execution test
PASS: gfortran.dg/gamma_5.f90   -O2  execution test
PASS: gfortran.dg/gamma_5.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
PASS: gfortran.dg/gamma_5.f90   -O3 -g  execution test
PASS: gfortran.dg/gamma_5.f90   -Os  execution test
PASS: gfortran.dg/intrinsic_pack_1.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
PASS: gfortran.dg/intrinsic_pack_1.f90   -O3 -g  execution test
PASS: gfortran.dg/optional_absent_5.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
PASS: gfortran.dg/optional_absent_5.f90   -O3 -g  execution test

Andrew

Comments

Andrew Stubbs Sept. 11, 2020, 2:08 p.m. UTC | #1
This is now backported to GCC 10.

Andrew

On 11/09/2020 11:17, Andrew Stubbs wrote:
> This patch fixes an execution failure in which the compiler would 
> corrupt TImode values due to missed early clobber problems with 
> partially overlapping register allocations.  In fact, adding early 
> clobber constraints does not fix the issue because IRA doesn't check 
> that for move instructions properly in all circumstances anyway (the 
> constraint causes an ICE in postreload).
> 
> This patch fixes the problem by ensuring that TImode values are always 
> aligned to 4-register boundaries, meaning that inputs and outputs will 
> either overlap completely, or not at all, neither of which have 
> early-clobber issues.
> 
> This is an artificial restriction the hardware not present in hardware , 
> but it is the same solution we use for DImode values where we had a lot 
> of the same problems.
> 
> With the patch I see the following new test passes:
> 
> PASS: gfortran.dg/PR95331.f90   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  execution test
> PASS: gfortran.dg/PR95331.f90   -O3 -g  execution test
> PASS: gfortran.dg/gamma_5.f90   -O0  execution test
> PASS: gfortran.dg/gamma_5.f90   -O1  execution test
> PASS: gfortran.dg/gamma_5.f90   -O2  execution test
> PASS: gfortran.dg/gamma_5.f90   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  execution test
> PASS: gfortran.dg/gamma_5.f90   -O3 -g  execution test
> PASS: gfortran.dg/gamma_5.f90   -Os  execution test
> PASS: gfortran.dg/intrinsic_pack_1.f90   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> PASS: gfortran.dg/intrinsic_pack_1.f90   -O3 -g  execution test
> PASS: gfortran.dg/optional_absent_5.f90   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> PASS: gfortran.dg/optional_absent_5.f90   -O3 -g  execution test
> 
> Andrew
diff mbox series

Patch

amdgcn: align TImode registers

This prevents execution failures caused by partially overlapping input and
output registers.  This is the same solution already used for DImode.

gcc/ChangeLog:

	* config/gcn/gcn.c (gcn_hard_regno_mode_ok): Align TImode registers.
	* config/gcn/gcn.md: Assert that TImode registers do not early clobber.

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 8b3c4544dd5..84d1fd9a354 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -475,7 +475,8 @@  gcn_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     return (vgpr_1reg_mode_p (mode)
 	    || (!((regno - FIRST_VGPR_REG) & 1) && vgpr_2reg_mode_p (mode))
 	    /* TImode is used by DImode compare_and_swap.  */
-	    || mode == TImode);
+	    || (mode == TImode
+		&& !((regno - FIRST_VGPR_REG) & 3)));
   return false;
 }
 
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index aeb25fbb931..0e73fea93cf 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -677,6 +677,8 @@  (define_insn_and_split "*movti_insn"
    (set (match_dup 4) (match_dup 5))
    (set (match_dup 6) (match_dup 7))]
   {
+    gcc_assert (rtx_equal_p (operands[0], operands[1])
+		|| !reg_overlap_mentioned_p (operands[0], operands[1]));
     operands[6] = gcn_operand_part (TImode, operands[0], 3);
     operands[7] = gcn_operand_part (TImode, operands[1], 3);
     operands[4] = gcn_operand_part (TImode, operands[0], 2);