diff mbox

[ia64] Patch for PR target/44583 to fix signed zeros

Message ID 201007292045.o6TKjaU26678@lucas.cup.hp.com
State New
Headers show

Commit Message

Steve Ellcey July 29, 2010, 8:45 p.m. UTC
Here is a patch to fix c-c++-common/torture/complex-sign-add.c and
c-c++-common/torture/complex-sign-sub.c on IA64.  The fix is local to
the IA64 config files but I wanted to send it out before checkin to see
if anyone had comments or issues with it before checking it in.

The change is to restrict the use of f0 (a read-only register containing
zero).  If we want to preserve the proper sign of zeros we can't use f0
as the second argument in an fadd or fsub or as the third argument in an
fma or fms.  This patch will check flag_signed_zeros and continue to
allow the use of f0 if this flag is false but not if the flag is true.

Tested on IA64 Linux and HP-UX.

Steve Ellcey
sje@cup.hp.com


2010-07-29  Steve Ellcey  <sje@cup.hp.com>

	* config/ia64/constraints.md (Z): New.
	* config/ia64/predicates.md (fr_reg_or_signed_fp01_operand): New.
	(xfreg_or_signed_fp01_operand): New.
	* config/ia64/ia64.md (addsf3): Replace fr_reg_or_fp01_operand
	with fr_reg_or_signed_fp01_operand and constraint G with Z.
	(subsf3): Ditto.
	(*maddsf4): Ditto.
	(*msubsf4): Ditto.
	(adddf3): Ditto.
	(adddf3_trunc): Ditto.
	(subdf3): Ditto.
	(*subdf3_trunc): Ditto.
	(*madddf4): Ditto.
	(*madddf4_trunc): Ditto.
	(*msubdf4): Ditto.
	(*msubdf4_trunc): Ditto.
	(addxf3): Replace xfreg_or_fp01_operand with
	xfreg_or_signed_fp01_operand and constraint G with Z.
	(*addxf3_truncsf): Ditto.
	(*addxf3_truncdf): Ditto.
	(subxf3): Ditto.
	(*subxf3_truncsf): Ditto.
	(*subxf3_truncdf): Ditto.
	(*maddxf4): Ditto.
	(*maddxf4_truncsf): Ditto.
	(*maddxf4_truncdf): Ditto.
	(*msubxf4): Ditto.
	(*msubxf4_truncsf): Ditto.
	(*msubxf4_truncdf): Ditto.

Comments

Richard Henderson July 30, 2010, 6:27 p.m. UTC | #1
On 07/29/2010 01:45 PM, Steve Ellcey wrote:
> +(define_constraint "Z"
> +  "1.0 or (0.0 and !flag_signed_zeros)"
> +  (and (match_code "const_double")
> +       (ior (match_test "op == CONST1_RTX (mode)")
> +	    (and (match_test "op == CONST0_RTX (mode)")
> +		 (match_test "!flag_signed_zeros")))))

That can't be right.  The pointer equality there, as in "G",
should mean that we know we really have a positive zero,
which *can* use FP0.

It would seem you're pessimizing every instance of "x = y + 0".

Which operation is it, specifically, that's failing?


r~
Steve Ellcey July 30, 2010, 8:32 p.m. UTC | #2
On Fri, 2010-07-30 at 11:27 -0700, Richard Henderson wrote:
> On 07/29/2010 01:45 PM, Steve Ellcey wrote:
> > +(define_constraint "Z"
> > +  "1.0 or (0.0 and !flag_signed_zeros)"
> > +  (and (match_code "const_double")
> > +       (ior (match_test "op == CONST1_RTX (mode)")
> > +	    (and (match_test "op == CONST0_RTX (mode)")
> > +		 (match_test "!flag_signed_zeros")))))
> 
> That can't be right.  The pointer equality there, as in "G",
> should mean that we know we really have a positive zero,
> which *can* use FP0.
> 
> It would seem you're pessimizing every instance of "x = y + 0".
> 
> Which operation is it, specifically, that's failing?

It is an fadd with f0 that is failing.  If I compile the example in
comment #8 from PR 44583 it passes with my change but not with
the current ToT.  

The ToT code generated for foo (without the patch) is:

foo:
        fadd.s f8 = f8, f0
        fmerge.s f8 = f8, f1
        br.ret.sptk.many b0
        .endp foo#

The new code (using my patch) is:

foo:
        mov f6 = f0
        fadd.s f8 = f8, f6
        fmerge.s f8 = f8, f1
        br.ret.sptk.many b0
        .endp foo#

The old code causes an abort, the new code works.  My understanding of
the problem is that if the hardware sees that the second argument to
fadd (or fsub) is f0 it doesn't do the add operation and passes the
first argument straight through.  Thus if f8 happens to have -0 in it
you get -0 out.  But if the second argument to fadd is a normal register
it does do the add and if the first argument is -0 we add (-0) to (+0)
and get +0 which is the correct answer.

Steve Ellcey
sje@cup.hp.com
Richard Henderson July 30, 2010, 9:11 p.m. UTC | #3
On 07/30/2010 01:32 PM, Steve Ellcey wrote:
> foo:
>         fadd.s f8 = f8, f0
>         fmerge.s f8 = f8, f1
>         br.ret.sptk.many b0
>         .endp foo#
> 
> The new code (using my patch) is:
> 
> foo:
>         mov f6 = f0
>         fadd.s f8 = f8, f6
>         fmerge.s f8 = f8, f1
>         br.ret.sptk.many b0
>         .endp foo#
> 
> The old code causes an abort, the new code works.  My understanding of
> the problem is that if the hardware sees that the second argument to
> fadd (or fsub) is f0 it doesn't do the add operation and passes the
> first argument straight through.

Oh wow, that's a bizarre and unexpected outcome.  I suppose it has to
do with only *really* having fma in the instruction set and doing 
special things with fp[01] as inputs to get fmpy and fadd.

I must withdraw the objection, except to say that your last sentence
above belongs in a comment somewhere.


r~
Richard Biener July 31, 2010, 10:10 a.m. UTC | #4
On Fri, Jul 30, 2010 at 11:11 PM, Richard Henderson <rth@redhat.com> wrote:
> On 07/30/2010 01:32 PM, Steve Ellcey wrote:
>> foo:
>>         fadd.s f8 = f8, f0
>>         fmerge.s f8 = f8, f1
>>         br.ret.sptk.many b0
>>         .endp foo#
>>
>> The new code (using my patch) is:
>>
>> foo:
>>         mov f6 = f0
>>         fadd.s f8 = f8, f6

Could it have used

             fadd.s f8 = f0, f8

?  ISTR that f0 is only problematic as 2nd operand and fadd should
be commutative?  If so I wonder why the register allocator didn't
choose that variant.

Richard.

>>         fmerge.s f8 = f8, f1
>>         br.ret.sptk.many b0
>>         .endp foo#
>>
>> The old code causes an abort, the new code works.  My understanding of
>> the problem is that if the hardware sees that the second argument to
>> fadd (or fsub) is f0 it doesn't do the add operation and passes the
>> first argument straight through.
>
> Oh wow, that's a bizarre and unexpected outcome.  I suppose it has to
> do with only *really* having fma in the instruction set and doing
> special things with fp[01] as inputs to get fmpy and fadd.
>
> I must withdraw the objection, except to say that your last sentence
> above belongs in a comment somewhere.
>
>
> r~
>
Steve Ellcey Aug. 2, 2010, 4:54 p.m. UTC | #5
> 
> ?  ISTR that f0 is only problematic as 2nd operand and fadd should
> be commutative?  If so I wonder why the register allocator didn't
> choose that variant.
> 
> Richard.

You are right about it only being an issue as the 2nd operand.
It looks like the floating point add instruction is missing the %
communitive symbol.  Up until now the two operands had the same
constraints so it wasn't needed.  I'll add that change to my patch.

Steve Ellcey
sje@cup.hp.com
diff mbox

Patch

Index: config/ia64/constraints.md
===================================================================
--- config/ia64/constraints.md	(revision 162667)
+++ config/ia64/constraints.md	(working copy)
@@ -95,6 +95,13 @@  (define_constraint "G"
   (and (match_code "const_double")
        (match_test "op == CONST0_RTX (mode) || op == CONST1_RTX (mode)")))
 
+(define_constraint "Z"
+  "1.0 or (0.0 and !flag_signed_zeros)"
+  (and (match_code "const_double")
+       (ior (match_test "op == CONST1_RTX (mode)")
+	    (and (match_test "op == CONST0_RTX (mode)")
+		 (match_test "!flag_signed_zeros")))))
+
 (define_constraint "H"
   "0.0"
   (and (match_code "const_double")
Index: config/ia64/predicates.md
===================================================================
--- config/ia64/predicates.md	(revision 162667)
+++ config/ia64/predicates.md	(working copy)
@@ -542,6 +542,19 @@  (define_predicate "xfreg_or_fp01_operand
   (and (match_operand 0 "fr_reg_or_fp01_operand")
        (not (match_code "subreg"))))
 
+;; Like fr_reg_or_fp01_operand, but don't allow 0 if flag_signed_zero is set.
+;; Using f0 as the second arg to fadd or fsub, or as the third arg to fma or
+;; fms can cause a zero result to have the wrong sign.
+(define_predicate "fr_reg_or_signed_fp01_operand"
+  (ior (match_operand 0 "fr_register_operand")
+       (and (match_code "const_double")
+	    (match_test "satisfies_constraint_Z (op)"))))
+
+;; Like fr_reg_or_signed_fp01_operand, but don't allow any SUBREGs.
+(define_predicate "xfreg_or_signed_fp01_operand"
+  (and (match_operand 0 "fr_reg_or_signed_fp01_operand")
+       (not (match_code "subreg"))))
+
 ;; True if OP is a constant zero, or a register.
 (define_predicate "fr_reg_or_0_operand"
   (ior (match_operand 0 "fr_register_operand")
Index: config/ia64/ia64.md
===================================================================
--- config/ia64/ia64.md	(revision 162667)
+++ config/ia64/ia64.md	(working copy)
@@ -2664,7 +2664,7 @@  (define_insn_and_split "negti2"
 (define_insn "addsf3"
   [(set (match_operand:SF 0 "fr_register_operand" "=f")
 	(plus:SF (match_operand:SF 1 "fr_reg_or_fp01_operand" "fG")
-		 (match_operand:SF 2 "fr_reg_or_fp01_operand" "fG")))]
+		 (match_operand:SF 2 "fr_reg_or_signed_fp01_operand" "fZ")))]
   ""
   "fadd.s %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2672,7 +2672,7 @@  (define_insn "addsf3"
 (define_insn "subsf3"
   [(set (match_operand:SF 0 "fr_register_operand" "=f")
 	(minus:SF (match_operand:SF 1 "fr_reg_or_fp01_operand" "fG")
-		  (match_operand:SF 2 "fr_reg_or_fp01_operand" "fG")))]
+		  (match_operand:SF 2 "fr_reg_or_signed_fp01_operand" "fZ")))]
   ""
   "fsub.s %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2744,7 +2744,7 @@  (define_insn "*maddsf4"
   [(set (match_operand:SF 0 "fr_register_operand" "=f")
 	(plus:SF (mult:SF (match_operand:SF 1 "fr_reg_or_fp01_operand" "fG")
 			  (match_operand:SF 2 "fr_reg_or_fp01_operand" "fG"))
-		 (match_operand:SF 3 "fr_reg_or_fp01_operand" "fG")))]
+		 (match_operand:SF 3 "fr_reg_or_signed_fp01_operand" "fZ")))]
   "TARGET_FUSED_MADD"
   "fma.s %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -2753,7 +2753,7 @@  (define_insn "*msubsf4"
   [(set (match_operand:SF 0 "fr_register_operand" "=f")
 	(minus:SF (mult:SF (match_operand:SF 1 "fr_reg_or_fp01_operand" "fG")
 			   (match_operand:SF 2 "fr_reg_or_fp01_operand" "fG"))
-		  (match_operand:SF 3 "fr_reg_or_fp01_operand" "fG")))]
+		  (match_operand:SF 3 "fr_reg_or_signed_fp01_operand" "fZ")))]
   "TARGET_FUSED_MADD"
   "fms.s %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -2784,7 +2784,7 @@  (define_insn "*nmaddsf4"
 (define_insn "adddf3"
   [(set (match_operand:DF 0 "fr_register_operand" "=f")
 	(plus:DF (match_operand:DF 1 "fr_reg_or_fp01_operand" "fG")
-		 (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG")))]
+		 (match_operand:DF 2 "fr_reg_or_signed_fp01_operand" "fZ")))]
   ""
   "fadd.d %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2793,7 +2793,7 @@  (define_insn "*adddf3_trunc"
   [(set (match_operand:SF 0 "fr_register_operand" "=f")
 	(float_truncate:SF
 	  (plus:DF (match_operand:DF 1 "fr_reg_or_fp01_operand" "fG")
-		   (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG"))))]
+		   (match_operand:DF 2 "fr_reg_or_signed_fp01_operand" "fZ"))))]
   ""
   "fadd.s %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2801,7 +2801,7 @@  (define_insn "*adddf3_trunc"
 (define_insn "subdf3"
   [(set (match_operand:DF 0 "fr_register_operand" "=f")
 	(minus:DF (match_operand:DF 1 "fr_reg_or_fp01_operand" "fG")
-		  (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG")))]
+		  (match_operand:DF 2 "fr_reg_or_signed_fp01_operand" "fZ")))]
   ""
   "fsub.d %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2810,7 +2810,7 @@  (define_insn "*subdf3_trunc"
   [(set (match_operand:SF 0 "fr_register_operand" "=f")
 	(float_truncate:SF
 	  (minus:DF (match_operand:DF 1 "fr_reg_or_fp01_operand" "fG")
-		    (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG"))))]
+		    (match_operand:DF 2 "fr_reg_or_signed_fp01_operand" "fZ"))))]
   ""
   "fsub.s %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2891,7 +2891,7 @@  (define_insn "*madddf4"
   [(set (match_operand:DF 0 "fr_register_operand" "=f")
 	(plus:DF (mult:DF (match_operand:DF 1 "fr_reg_or_fp01_operand" "fG")
 			  (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG"))
-		 (match_operand:DF 3 "fr_reg_or_fp01_operand" "fG")))]
+		 (match_operand:DF 3 "fr_reg_or_signed_fp01_operand" "fZ")))]
   "TARGET_FUSED_MADD"
   "fma.d %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -2901,7 +2901,7 @@  (define_insn "*madddf4_trunc"
 	(float_truncate:SF
 	  (plus:DF (mult:DF (match_operand:DF 1 "fr_reg_or_fp01_operand" "fG")
 			    (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG"))
-		   (match_operand:DF 3 "fr_reg_or_fp01_operand" "fG"))))]
+		   (match_operand:DF 3 "fr_reg_or_signed_fp01_operand" "fZ"))))]
   "TARGET_FUSED_MADD"
   "fma.s %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -2910,7 +2910,7 @@  (define_insn "*msubdf4"
   [(set (match_operand:DF 0 "fr_register_operand" "=f")
 	(minus:DF (mult:DF (match_operand:DF 1 "fr_reg_or_fp01_operand" "fG")
 			   (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG"))
-		  (match_operand:DF 3 "fr_reg_or_fp01_operand" "fG")))]
+		  (match_operand:DF 3 "fr_reg_or_signed_fp01_operand" "fZ")))]
   "TARGET_FUSED_MADD"
   "fms.d %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -2920,7 +2920,7 @@  (define_insn "*msubdf4_trunc"
 	(float_truncate:SF
 	  (minus:DF (mult:DF (match_operand:DF 1 "fr_reg_or_fp01_operand" "fG")
 			     (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG"))
-		    (match_operand:DF 3 "fr_reg_or_fp01_operand" "fG"))))]
+		    (match_operand:DF 3 "fr_reg_or_signed_fp01_operand" "fZ"))))]
   "TARGET_FUSED_MADD"
   "fms.s %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -2970,7 +2970,7 @@  (define_insn "*nmadddf4_truncsf"
 (define_insn "addxf3"
   [(set (match_operand:XF 0 "fr_register_operand" "=f")
 	(plus:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
-		 (match_operand:XF 2 "xfreg_or_fp01_operand" "fG")))]
+		 (match_operand:XF 2 "xfreg_or_signed_fp01_operand" "fZ")))]
   ""
   "fadd %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2979,7 +2979,7 @@  (define_insn "*addxf3_truncsf"
   [(set (match_operand:SF 0 "fr_register_operand" "=f")
 	(float_truncate:SF
 	  (plus:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
-		   (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))))]
+		   (match_operand:XF 2 "xfreg_or_signed_fp01_operand" "fZ"))))]
   ""
   "fadd.s %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2988,7 +2988,7 @@  (define_insn "*addxf3_truncdf"
   [(set (match_operand:DF 0 "fr_register_operand" "=f")
 	(float_truncate:DF
 	  (plus:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
-		   (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))))]
+		   (match_operand:XF 2 "xfreg_or_signed_fp01_operand" "fZ"))))]
   ""
   "fadd.d %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -2996,7 +2996,7 @@  (define_insn "*addxf3_truncdf"
 (define_insn "subxf3"
   [(set (match_operand:XF 0 "fr_register_operand" "=f")
 	(minus:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
-		  (match_operand:XF 2 "xfreg_or_fp01_operand" "fG")))]
+		  (match_operand:XF 2 "xfreg_or_signed_fp01_operand" "fZ")))]
   ""
   "fsub %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -3005,7 +3005,7 @@  (define_insn "*subxf3_truncsf"
   [(set (match_operand:SF 0 "fr_register_operand" "=f")
 	(float_truncate:SF
 	  (minus:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
-		    (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))))]
+		    (match_operand:XF 2 "xfreg_or_signed_fp01_operand" "fZ"))))]
   ""
   "fsub.s %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -3014,7 +3014,7 @@  (define_insn "*subxf3_truncdf"
   [(set (match_operand:DF 0 "fr_register_operand" "=f")
 	(float_truncate:DF
 	  (minus:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
-		    (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))))]
+		    (match_operand:XF 2 "xfreg_or_signed_fp01_operand" "fZ"))))]
   ""
   "fsub.d %0 = %F1, %F2"
   [(set_attr "itanium_class" "fmac")])
@@ -3104,7 +3104,7 @@  (define_insn "*maddxf4"
   [(set (match_operand:XF 0 "fr_register_operand" "=f")
 	(plus:XF (mult:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
 			  (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))
-		 (match_operand:XF 3 "xfreg_or_fp01_operand" "fG")))]
+		 (match_operand:XF 3 "xfreg_or_signed_fp01_operand" "fZ")))]
   "TARGET_FUSED_MADD"
   "fma %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -3114,7 +3114,7 @@  (define_insn "*maddxf4_truncsf"
 	(float_truncate:SF
 	  (plus:XF (mult:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
 			    (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))
-		   (match_operand:XF 3 "xfreg_or_fp01_operand" "fG"))))]
+		   (match_operand:XF 3 "xfreg_or_signed_fp01_operand" "fZ"))))]
   "TARGET_FUSED_MADD"
   "fma.s %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -3124,7 +3124,7 @@  (define_insn "*maddxf4_truncdf"
 	(float_truncate:DF
 	  (plus:XF (mult:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
 			    (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))
-		   (match_operand:XF 3 "xfreg_or_fp01_operand" "fG"))))]
+		   (match_operand:XF 3 "xfreg_or_signed_fp01_operand" "fZ"))))]
   "TARGET_FUSED_MADD"
   "fma.d %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -3133,7 +3133,7 @@  (define_insn "*msubxf4"
   [(set (match_operand:XF 0 "fr_register_operand" "=f")
 	(minus:XF (mult:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
 			   (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))
-		  (match_operand:XF 3 "xfreg_or_fp01_operand" "fG")))]
+		  (match_operand:XF 3 "xfreg_or_signed_fp01_operand" "fZ")))]
   "TARGET_FUSED_MADD"
   "fms %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -3143,7 +3143,7 @@  (define_insn "*msubxf4_truncsf"
 	(float_truncate:SF
 	  (minus:XF (mult:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
 			     (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))
-		    (match_operand:XF 3 "xfreg_or_fp01_operand" "fG"))))]
+		    (match_operand:XF 3 "xfreg_or_signed_fp01_operand" "fZ"))))]
   "TARGET_FUSED_MADD"
   "fms.s %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])
@@ -3153,7 +3153,7 @@  (define_insn "*msubxf4_truncdf"
 	(float_truncate:DF
 	  (minus:XF (mult:XF (match_operand:XF 1 "xfreg_or_fp01_operand" "fG")
 			     (match_operand:XF 2 "xfreg_or_fp01_operand" "fG"))
-		    (match_operand:XF 3 "xfreg_or_fp01_operand" "fG"))))]
+		    (match_operand:XF 3 "xfreg_or_signed_fp01_operand" "fZ"))))]
   "TARGET_FUSED_MADD"
   "fms.d %0 = %F1, %F2, %F3"
   [(set_attr "itanium_class" "fmac")])