From patchwork Tue Jun 22 01:46:48 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sandra Loosemore <sandra@codesourcery.com>
X-Patchwork-Id: 56386
Return-Path: 
 <gcc-patches-return-265885-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id 6DFA6B6F0C
	for <incoming@patchwork.ozlabs.org>;
	Tue, 22 Jun 2010 11:45:37 +1000 (EST)
Received: (qmail 17053 invoked by alias); 22 Jun 2010 01:45:36 -0000
Received: (qmail 17037 invoked by uid 22791); 22 Jun 2010 01:45:33 -0000
X-SWARE-Spam-Status: No, hits=-0.3 required=5.0	tests=AWL, BAYES_50, TW_MF,
	T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com)
	(38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with
	ESMTP; Tue, 22 Jun 2010 01:45:24 +0000
Received: (qmail 15993 invoked from network); 22 Jun 2010 01:45:22 -0000
Received: from unknown (HELO ?192.168.2.3?) (sandra@127.0.0.2) by
	mail.codesourcery.com with ESMTPA; 22 Jun 2010 01:45:22 -0000
Message-ID: <4C201608.5060401@codesourcery.com>
Date: Mon, 21 Jun 2010 21:46:48 -0400
From: Sandra Loosemore <sandra@codesourcery.com>
User-Agent: Thunderbird 2.0.0.23 (X11/20090817)
MIME-Version: 1.0
To: gcc-patches@gcc.gnu.org
CC: Julian Brown <julian@codesourcery.com>
Subject: [PATCH,
	ARM]: don't use NEON floating-point without -ffast-math (PR43703)
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

This patch fixes PR43703.  To sum up the problem, the autovectorizer currently 
generates NEON instructions for floating-point operations, but this is incorrect 
because the NEON floating-point unit doesn't fully support IEEE arithmetic.  In 
particular, denormalized numbers are rounded to zero, leading to precision loss 
in some cases.  So, the solution is, essentially, "don't do that".  ;-)  This 
patch adds checks for flag_unsafe_math_operations in the places where NEON 
instructions would currently otherwise be generated from canonical RTL.

To give credit where credit was due, this is really Julian's patch -- my 
contribution is only in refactoring it a bit to pull out the part that affected 
the patch I was already preparing to provide canonical RTL for some of the 
affected NEON instructions.  I posted it earlier this evening here:

http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02101.html

I think the patch I'm posting here has some dependencies on the one I've linked 
to above, though.  I tested the two patches together on an arm-none-eabi build 
using a simulator target, with both NEON and non-NEON compilation options. 
Assuming the other one is approved, is this one OK to commit too?

-Sandra


2010-06-21  Julian Brown  <julian@codesourcery.com>
	    Sandra Loosemore <sandra@codesourcery.com>

	PR target/43703

	gcc/
	* config/arm/vec-common.md (add<mode>3, sub<mode>3, smin<mode>3)
	(smax<mode>3): Disable for NEON float modes when
	flag_unsafe_math_optimizations is false.
	* config/arm/neon.md (*add<mode>3_neon, *sub<mode>3_neon)
	(*mul<mode>3_neon)
	(mul<mode>3add<mode>_neon, mul<mode>3neg<mode>add<mode>_neon)
	(reduc_splus_<mode>, reduc_smin_<mode>, reduc_smax_<mode>): Disable
	for NEON float modes when flag_unsafe_math_optimizations is false.
	(quad_halves_<code>v4sf): Only enable if flag_unsafe_math_optimizations
	is true.
	* doc/invoke.texi (ARM Options): Add note about floating point
	vectorization requiring -funsafe-math-optimizations.

	gcc/testsuite/
	* gcc.dg/vect/vect.exp: Add -ffast-math for NEON.
	* gcc.dg/vect/vect-reduc-6.c: Add XFAIL for NEON.

Index: gcc/config/arm/vec-common.md
===================================================================
--- gcc/config/arm/vec-common.md	(revision 161038)
+++ gcc/config/arm/vec-common.md	(working copy)
@@ -57,7 +57,8 @@
   [(set (match_operand:VALL 0 "s_register_operand" "")
         (plus:VALL (match_operand:VALL 1 "s_register_operand" "")
                    (match_operand:VALL 2 "s_register_operand" "")))]
-  "TARGET_NEON
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
 })
@@ -66,7 +67,8 @@
   [(set (match_operand:VALL 0 "s_register_operand" "")
         (minus:VALL (match_operand:VALL 1 "s_register_operand" "")
                     (match_operand:VALL 2 "s_register_operand" "")))]
-  "TARGET_NEON
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
 })
@@ -75,7 +77,9 @@
   [(set (match_operand:VALLW 0 "s_register_operand" "")
         (mult:VALLW (match_operand:VALLW 1 "s_register_operand" "")
 		    (match_operand:VALLW 2 "s_register_operand" "")))]
-  "TARGET_NEON || (<MODE>mode == V4HImode && TARGET_REALLY_IWMMXT)"
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
+   || (<MODE>mode == V4HImode && TARGET_REALLY_IWMMXT)"
 {
 })
 
@@ -83,7 +87,8 @@
   [(set (match_operand:VALLW 0 "s_register_operand" "")
 	(smin:VALLW (match_operand:VALLW 1 "s_register_operand" "")
 		    (match_operand:VALLW 2 "s_register_operand" "")))]
-  "TARGET_NEON
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
 })
@@ -101,7 +106,8 @@
   [(set (match_operand:VALLW 0 "s_register_operand" "")
 	(smax:VALLW (match_operand:VALLW 1 "s_register_operand" "")
 		    (match_operand:VALLW 2 "s_register_operand" "")))]
-  "TARGET_NEON
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
 })
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 161038)
+++ gcc/config/arm/neon.md	(working copy)
@@ -833,7 +833,7 @@
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
 		  (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
   "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "neon_type")
       (if_then_else (ne (symbol_ref "<Is_float_mode>") (const_int 0))
@@ -847,7 +847,7 @@
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
                    (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
   "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "neon_type")
       (if_then_else (ne (symbol_ref "<Is_float_mode>") (const_int 0))
@@ -861,7 +861,7 @@
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (mult:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
                   (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
   "vmul.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "neon_type")
       (if_then_else (ne (symbol_ref "<Is_float_mode>") (const_int 0))
@@ -883,7 +883,7 @@
         (plus:VDQ (mult:VDQ (match_operand:VDQ 2 "s_register_operand" "w")
                             (match_operand:VDQ 3 "s_register_operand" "w"))
 		  (match_operand:VDQ 1 "s_register_operand" "0")))]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
   "vmla.<V_if_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
   [(set (attr "neon_type")
       (if_then_else (ne (symbol_ref "<Is_float_mode>") (const_int 0))
@@ -905,7 +905,7 @@
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "0")
                    (mult:VDQ (match_operand:VDQ 2 "s_register_operand" "w")
                              (match_operand:VDQ 3 "s_register_operand" "w"))))]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
   "vmls.<V_if_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
   [(set (attr "neon_type")
       (if_then_else (ne (symbol_ref "<Is_float_mode>") (const_int 0))
@@ -1320,7 +1320,7 @@
                            (parallel [(const_int 0) (const_int 1)]))
           (vec_select:V2SF (match_dup 1)
                            (parallel [(const_int 2) (const_int 3)]))))]
-  "TARGET_NEON"
+  "TARGET_NEON && flag_unsafe_math_optimizations"
   "<VQH_mnem>.f32\t%P0, %e1, %f1"
   [(set_attr "vqh_mnem" "<VQH_mnem>")
    (set (attr "neon_type")
@@ -1455,7 +1455,7 @@
 (define_expand "reduc_splus_<mode>"
   [(match_operand:VD 0 "s_register_operand" "")
    (match_operand:VD 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
 {
   neon_pairwise_reduce (operands[0], operands[1], <MODE>mode,
 			&gen_neon_vpadd_internal<mode>);
@@ -1465,7 +1465,7 @@
 (define_expand "reduc_splus_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1500,7 +1500,7 @@
 (define_expand "reduc_smin_<mode>"
   [(match_operand:VD 0 "s_register_operand" "")
    (match_operand:VD 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
 {
   neon_pairwise_reduce (operands[0], operands[1], <MODE>mode,
 			&gen_neon_vpsmin<mode>);
@@ -1510,7 +1510,7 @@
 (define_expand "reduc_smin_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1525,7 +1525,7 @@
 (define_expand "reduc_smax_<mode>"
   [(match_operand:VD 0 "s_register_operand" "")
    (match_operand:VD 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
 {
   neon_pairwise_reduce (operands[0], operands[1], <MODE>mode,
 			&gen_neon_vpsmax<mode>);
@@ -1535,7 +1535,7 @@
 (define_expand "reduc_smax_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 161038)
+++ gcc/doc/invoke.texi	(working copy)
@@ -9960,6 +9960,14 @@ of GCC@.
 If @option{-msoft-float} is specified this specifies the format of
 floating point values.
 
+If the selected floating-point hardware includes the NEON extension
+(e.g. @option{-mfpu}=@samp{neon}), note that floating-point
+operations will not be used by GCC's auto-vectorization pass unless
+@option{-funsafe-math-optimizations} is also specified.  This is
+because NEON hardware does not fully implement the IEEE 754 standard for
+floating-point arithmetic (in particular denormal values are treated as
+zero), so the use of NEON instructions may lead to a loss of precision.
+
 @item -mfp16-format=@var{name}
 @opindex mfp16-format
 Specify the format of the @code{__fp16} half-precision floating-point type.
Index: gcc/testsuite/gcc.dg/vect/vect.exp
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect.exp	(revision 161038)
+++ gcc/testsuite/gcc.dg/vect/vect.exp	(working copy)
@@ -105,6 +105,10 @@ if  [istarget "powerpc-*paired*"]  {
     set dg-do-what-default run
 } elseif [is-effective-target arm_neon_ok] {
     eval lappend DEFAULT_VECTCFLAGS [add_options_for_arm_neon ""]
+    # NEON does not support denormals, so is not used for vectorization by
+    # default to avoid loss of precision.  We must pass -ffast-math to test
+    # vectorization of float operations.
+    lappend DEFAULT_VECTCFLAGS "-ffast-math"
     if [is-effective-target arm_neon_hw] {
       set dg-do-what-default run
     } else {
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-6.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-6.c	(revision 161038)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-6.c	(working copy)
@@ -49,5 +49,6 @@ int main (void)
 }
 
 /* need -ffast-math to vectorizer these loops.  */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
+/* ARM NEON passes -ffast-math to these tests, so expect this to fail.  */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail arm_neon_ok } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */