From patchwork Tue Jun 22 01:46:48 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sandra Loosemore X-Patchwork-Id: 56386 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 6DFA6B6F0C for ; Tue, 22 Jun 2010 11:45:37 +1000 (EST) Received: (qmail 17053 invoked by alias); 22 Jun 2010 01:45:36 -0000 Received: (qmail 17037 invoked by uid 22791); 22 Jun 2010 01:45:33 -0000 X-SWARE-Spam-Status: No, hits=-0.3 required=5.0 tests=AWL, BAYES_50, TW_MF, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 22 Jun 2010 01:45:24 +0000 Received: (qmail 15993 invoked from network); 22 Jun 2010 01:45:22 -0000 Received: from unknown (HELO ?192.168.2.3?) (sandra@127.0.0.2) by mail.codesourcery.com with ESMTPA; 22 Jun 2010 01:45:22 -0000 Message-ID: <4C201608.5060401@codesourcery.com> Date: Mon, 21 Jun 2010 21:46:48 -0400 From: Sandra Loosemore User-Agent: Thunderbird 2.0.0.23 (X11/20090817) MIME-Version: 1.0 To: gcc-patches@gcc.gnu.org CC: Julian Brown Subject: [PATCH, ARM]: don't use NEON floating-point without -ffast-math (PR43703) Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org This patch fixes PR43703. To sum up the problem, the autovectorizer currently generates NEON instructions for floating-point operations, but this is incorrect because the NEON floating-point unit doesn't fully support IEEE arithmetic. In particular, denormalized numbers are rounded to zero, leading to precision loss in some cases. So, the solution is, essentially, "don't do that". ;-) This patch adds checks for flag_unsafe_math_operations in the places where NEON instructions would currently otherwise be generated from canonical RTL. To give credit where credit was due, this is really Julian's patch -- my contribution is only in refactoring it a bit to pull out the part that affected the patch I was already preparing to provide canonical RTL for some of the affected NEON instructions. I posted it earlier this evening here: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02101.html I think the patch I'm posting here has some dependencies on the one I've linked to above, though. I tested the two patches together on an arm-none-eabi build using a simulator target, with both NEON and non-NEON compilation options. Assuming the other one is approved, is this one OK to commit too? -Sandra 2010-06-21 Julian Brown Sandra Loosemore PR target/43703 gcc/ * config/arm/vec-common.md (add3, sub3, smin3) (smax3): Disable for NEON float modes when flag_unsafe_math_optimizations is false. * config/arm/neon.md (*add3_neon, *sub3_neon) (*mul3_neon) (mul3add_neon, mul3negadd_neon) (reduc_splus_, reduc_smin_, reduc_smax_): Disable for NEON float modes when flag_unsafe_math_optimizations is false. (quad_halves_v4sf): Only enable if flag_unsafe_math_optimizations is true. * doc/invoke.texi (ARM Options): Add note about floating point vectorization requiring -funsafe-math-optimizations. gcc/testsuite/ * gcc.dg/vect/vect.exp: Add -ffast-math for NEON. * gcc.dg/vect/vect-reduc-6.c: Add XFAIL for NEON. Index: gcc/config/arm/vec-common.md =================================================================== --- gcc/config/arm/vec-common.md (revision 161038) +++ gcc/config/arm/vec-common.md (working copy) @@ -57,7 +57,8 @@ [(set (match_operand:VALL 0 "s_register_operand" "") (plus:VALL (match_operand:VALL 1 "s_register_operand" "") (match_operand:VALL 2 "s_register_operand" "")))] - "TARGET_NEON + "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode) + || flag_unsafe_math_optimizations)) || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))" { }) @@ -66,7 +67,8 @@ [(set (match_operand:VALL 0 "s_register_operand" "") (minus:VALL (match_operand:VALL 1 "s_register_operand" "") (match_operand:VALL 2 "s_register_operand" "")))] - "TARGET_NEON + "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode) + || flag_unsafe_math_optimizations)) || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))" { }) @@ -75,7 +77,9 @@ [(set (match_operand:VALLW 0 "s_register_operand" "") (mult:VALLW (match_operand:VALLW 1 "s_register_operand" "") (match_operand:VALLW 2 "s_register_operand" "")))] - "TARGET_NEON || (mode == V4HImode && TARGET_REALLY_IWMMXT)" + "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode) + || flag_unsafe_math_optimizations)) + || (mode == V4HImode && TARGET_REALLY_IWMMXT)" { }) @@ -83,7 +87,8 @@ [(set (match_operand:VALLW 0 "s_register_operand" "") (smin:VALLW (match_operand:VALLW 1 "s_register_operand" "") (match_operand:VALLW 2 "s_register_operand" "")))] - "TARGET_NEON + "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode) + || flag_unsafe_math_optimizations)) || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))" { }) @@ -101,7 +106,8 @@ [(set (match_operand:VALLW 0 "s_register_operand" "") (smax:VALLW (match_operand:VALLW 1 "s_register_operand" "") (match_operand:VALLW 2 "s_register_operand" "")))] - "TARGET_NEON + "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode) + || flag_unsafe_math_optimizations)) || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))" { }) Index: gcc/config/arm/neon.md =================================================================== --- gcc/config/arm/neon.md (revision 161038) +++ gcc/config/arm/neon.md (working copy) @@ -833,7 +833,7 @@ [(set (match_operand:VDQ 0 "s_register_operand" "=w") (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w") (match_operand:VDQ 2 "s_register_operand" "w")))] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" "vadd.\t%0, %1, %2" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) @@ -847,7 +847,7 @@ [(set (match_operand:VDQ 0 "s_register_operand" "=w") (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w") (match_operand:VDQ 2 "s_register_operand" "w")))] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" "vsub.\t%0, %1, %2" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) @@ -861,7 +861,7 @@ [(set (match_operand:VDQ 0 "s_register_operand" "=w") (mult:VDQ (match_operand:VDQ 1 "s_register_operand" "w") (match_operand:VDQ 2 "s_register_operand" "w")))] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" "vmul.\t%0, %1, %2" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) @@ -883,7 +883,7 @@ (plus:VDQ (mult:VDQ (match_operand:VDQ 2 "s_register_operand" "w") (match_operand:VDQ 3 "s_register_operand" "w")) (match_operand:VDQ 1 "s_register_operand" "0")))] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" "vmla.\t%0, %2, %3" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) @@ -905,7 +905,7 @@ (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "0") (mult:VDQ (match_operand:VDQ 2 "s_register_operand" "w") (match_operand:VDQ 3 "s_register_operand" "w"))))] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" "vmls.\t%0, %2, %3" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) @@ -1320,7 +1320,7 @@ (parallel [(const_int 0) (const_int 1)])) (vec_select:V2SF (match_dup 1) (parallel [(const_int 2) (const_int 3)]))))] - "TARGET_NEON" + "TARGET_NEON && flag_unsafe_math_optimizations" ".f32\t%P0, %e1, %f1" [(set_attr "vqh_mnem" "") (set (attr "neon_type") @@ -1455,7 +1455,7 @@ (define_expand "reduc_splus_" [(match_operand:VD 0 "s_register_operand" "") (match_operand:VD 1 "s_register_operand" "")] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" { neon_pairwise_reduce (operands[0], operands[1], mode, &gen_neon_vpadd_internal); @@ -1465,7 +1465,7 @@ (define_expand "reduc_splus_" [(match_operand:VQ 0 "s_register_operand" "") (match_operand:VQ 1 "s_register_operand" "")] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" { rtx step1 = gen_reg_rtx (mode); rtx res_d = gen_reg_rtx (mode); @@ -1500,7 +1500,7 @@ (define_expand "reduc_smin_" [(match_operand:VD 0 "s_register_operand" "") (match_operand:VD 1 "s_register_operand" "")] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" { neon_pairwise_reduce (operands[0], operands[1], mode, &gen_neon_vpsmin); @@ -1510,7 +1510,7 @@ (define_expand "reduc_smin_" [(match_operand:VQ 0 "s_register_operand" "") (match_operand:VQ 1 "s_register_operand" "")] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" { rtx step1 = gen_reg_rtx (mode); rtx res_d = gen_reg_rtx (mode); @@ -1525,7 +1525,7 @@ (define_expand "reduc_smax_" [(match_operand:VD 0 "s_register_operand" "") (match_operand:VD 1 "s_register_operand" "")] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" { neon_pairwise_reduce (operands[0], operands[1], mode, &gen_neon_vpsmax); @@ -1535,7 +1535,7 @@ (define_expand "reduc_smax_" [(match_operand:VQ 0 "s_register_operand" "") (match_operand:VQ 1 "s_register_operand" "")] - "TARGET_NEON" + "TARGET_NEON && (! || flag_unsafe_math_optimizations)" { rtx step1 = gen_reg_rtx (mode); rtx res_d = gen_reg_rtx (mode); Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi (revision 161038) +++ gcc/doc/invoke.texi (working copy) @@ -9960,6 +9960,14 @@ of GCC@. If @option{-msoft-float} is specified this specifies the format of floating point values. +If the selected floating-point hardware includes the NEON extension +(e.g. @option{-mfpu}=@samp{neon}), note that floating-point +operations will not be used by GCC's auto-vectorization pass unless +@option{-funsafe-math-optimizations} is also specified. This is +because NEON hardware does not fully implement the IEEE 754 standard for +floating-point arithmetic (in particular denormal values are treated as +zero), so the use of NEON instructions may lead to a loss of precision. + @item -mfp16-format=@var{name} @opindex mfp16-format Specify the format of the @code{__fp16} half-precision floating-point type. Index: gcc/testsuite/gcc.dg/vect/vect.exp =================================================================== --- gcc/testsuite/gcc.dg/vect/vect.exp (revision 161038) +++ gcc/testsuite/gcc.dg/vect/vect.exp (working copy) @@ -105,6 +105,10 @@ if [istarget "powerpc-*paired*"] { set dg-do-what-default run } elseif [is-effective-target arm_neon_ok] { eval lappend DEFAULT_VECTCFLAGS [add_options_for_arm_neon ""] + # NEON does not support denormals, so is not used for vectorization by + # default to avoid loss of precision. We must pass -ffast-math to test + # vectorization of float operations. + lappend DEFAULT_VECTCFLAGS "-ffast-math" if [is-effective-target arm_neon_hw] { set dg-do-what-default run } else { Index: gcc/testsuite/gcc.dg/vect/vect-reduc-6.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-reduc-6.c (revision 161038) +++ gcc/testsuite/gcc.dg/vect/vect-reduc-6.c (working copy) @@ -49,5 +49,6 @@ int main (void) } /* need -ffast-math to vectorizer these loops. */ -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */ +/* ARM NEON passes -ffast-math to these tests, so expect this to fail. */ +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail arm_neon_ok } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */