diff mbox

[rs6000] Fix PR target/67808, LRA ICE on double to long double conversion

Message ID 20151005171222.GA21123@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner Oct. 5, 2015, 5:12 p.m. UTC
On Fri, Oct 02, 2015 at 02:04:48PM -0500, Peter Bergner wrote:
> PR67808 exposes a problem with the constraints in the *extenddftf2_internal
> pattern, in that it allows TFmode operands to occupy Altivec registers
> which they are not allowed to do.  Reload was able to work around the
> problem, but LRA is more pedantic and it caused it to go into an infinite
> spill loop until it ICEd.  The following patch from Mike changes the TFmode
> output operand to use the "d" constraint instead of "ws".  It also allows
> using the "ws" constraint for the two input operands, since that is allowed
> for DFmode operands.
> 
> This passed bootstraps (with reload on by default and lra on by default)
> and shows no testsuite regressions.  Is this ok for trunk?
> 
> The bug is also present in the FSF 5 branch (4.9 is ok), is this ok for
> that too, assuming my bootstrap/regtesting there are clean?
> 
> Peter
> 
> 
> gcc/
> 	PR target/67808
> 	* config/rs6000/rs6000.md (*extenddftf2_internal): Fix constraints.
> 
> gcc/testsuite/
> 
> 	* gcc.target/powerpc/pr67808.c: New test.
> 

In looking at the constraints in more detail, after the patch we have the
following alternatives:

  #1:	op0 = m,  op1 = ws, op2 = ws
  #2:	op0 = Y,  op1 = r,  op2 = r
  #3:	op0 = d,  op1 = md, op2 = j
  #4:	op0 = d,  op1 = md, op2 = m
  #5:	op0 = &d, op1 = md, op2 = ws

I.e.

  #1:	Store result, input in VSX register, 0.0 in VSX register (VSX only)
  #2:	Store result, input in GPR register, 0.0 in GPR register
  #3:	Result in FPR register, input in FPR or memory, 0.0 direct (VSX only)
  #4:	Result in FPR register, input in FPR or memory, 0.0 in memory
  #5:	Result in FPR reg (no overlap), input in FPR/memory, 0.0 in VSX reg

So, the non-VSX case (were ws is NO_REGS) only deals with alternatives #2 and
#4.

I think (but I don't have a test case) that alternative #1 is potentially a
problem if the input register is ever allocated to an Altivec register and the
address mode is reg+offset (in which case we would not be able to form the
address after the insn is split post-reload.

I have attached a better version of the patch.

This gives the constraints:

  #1:	op0 = m,  op1 = d,  op2 = d
  #2:	op0 = Y,  op1 = r,  op2 = r
  #3:	op0 = d,  op1 = ws, op2 = j
  #4:	op0 = d,  op1 = md, op2 = m
  #5:	op0 = &d, op1 = m,  op2 = md

I.e.

  #1:	Store result, input in FPR register, 0.0 in FPR register
  #2:	Store result, input in GPR register, 0.0 in GPR register
  #3:	Result in FPR reg, input in VSX reg, 0.0 direct (VSX only)
  #4:	Result in FPR reg, input in FPR/memory, 0.0 in memory
  #5:	Result in FPR reg, input in FPR/memory, 0.0 in FPR/memory (no overlap)

[gcc]
2015-10-05  Peter Bergner <bergner@vnet.ibm.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/67808
	* config/rs6000/rs6000.md (extenddftf2_internal): Fix up
	constraints.

[gcc/testsuite]
2015-10-05  Peter Bergner <bergner@vnet.ibm.com>

	PR target/67808
	* gcc.target/powerpc/pr67808.c: New test.

Comments

Peter Bergner Oct. 5, 2015, 5:39 p.m. UTC | #1
On Mon, 2015-10-05 at 13:12 -0400, Michael Meissner wrote:
> I have attached a better version of the patch.

I'll note that I have not committed the earlier patch and will hold
off while we sort out what is best here.



> This gives the constraints:
> 
>   #1:	op0 = m,  op1 = d,  op2 = d
>   #2:	op0 = Y,  op1 = r,  op2 = r
>   #3:	op0 = d,  op1 = ws, op2 = j
>   #4:	op0 = d,  op1 = md, op2 = m
>   #5:	op0 = &d, op1 = m,  op2 = md
> 
> I.e.
> 
>   #1:	Store result, input in FPR register, 0.0 in FPR register
>   #2:	Store result, input in GPR register, 0.0 in GPR register
>   #3:	Result in FPR reg, input in VSX reg, 0.0 direct (VSX only)
>   #4:	Result in FPR reg, input in FPR/memory, 0.0 in memory
>   #5:	Result in FPR reg, input in FPR/memory, 0.0 in FPR/memory (no overlap)

As we discussed on IRC, this alt #3 now does not accept memory as an
input operand and that is the only alternative that allows us to
generate a xxlxor to create a zero fp value.  Either we can change #3's
opt1 from "ws" to "mws" or just create another alternative.  I'll let
you decide what works best.

Since this test is testing whether we ICE when -mlra -mvsx is enabled,
how about if we verify we're also getting the xxlxor too, with the
addition of:

/* { dg-final { scan-assembler-times "xxlxor" 1 } } */

to the test case?

Peter
diff mbox

Patch

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 228495)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -6505,9 +6505,9 @@  (define_expand "extenddftf2_fprs"
 })
 
 (define_insn_and_split "*extenddftf2_internal"
-  [(set (match_operand:TF 0 "nonimmediate_operand" "=m,Y,ws,d,&d")
-       (float_extend:TF (match_operand:DF 1 "input_operand" "d,r,md,md,md")))
-   (use (match_operand:DF 2 "zero_reg_mem_operand" "d,r,j,m,d"))]
+  [(set (match_operand:TF 0 "nonimmediate_operand" "=m,Y,d,d,&d")
+       (float_extend:TF (match_operand:DF 1 "input_operand" "d,r,ws,md,md")))
+   (use (match_operand:DF 2 "zero_reg_mem_operand" "d,r,j,m,md"))]
   "!TARGET_IEEEQUAD
    && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT 
    && TARGET_LONG_DOUBLE_128"
Index: gcc/testsuite/gcc.target/powerpc/pr67808.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr67808.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr67808.c	(working copy)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O1 -mvsx -mlra" } */
+
+/* PR 67808: LRA ICEs on simple double to long double conversion test case */
+
+void
+foo (long double *ldb1, double *db1)
+{
+  *ldb1 = *db1;
+}