From patchwork Tue Jul 28 12:08:01 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 501185
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Message-ID: <55B770A1.6010308@mentor.com>
Date: Tue, 28 Jul 2015 14:08:01 +0200
From: Tom de Vries
To: Richard Biener
CC: "gcc-patches@gnu.org"
Subject: Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
References: <55B24E1F.1070705@mentor.com>
In-Reply-To:

On 28/07/15 09:59, Richard Biener wrote:
> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries wrote:
>> Hi,
>>
>> this patch allows parallelization and vectorization of reduction operators
>> that are guaranteed to not overflow (such as min and max operators),
>> independent of the overflow behaviour of the type.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk?
>
> Hmm, I don't like that no_overflow_tree_code function.  We have a much more
> clear understanding which codes may overflow or trap.  Thus please add
> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like
>

Done.

> bool
> operation_overflow_traps (tree type, enum tree_code code)
> {
>   if (!ANY_INTEGRAL_TYPE_P (type)

I've changed this test into a gcc_checking_assert.

>       || !TYPE_OVERFLOW_TRAPS (type))
>     return false;
>   switch (code)
>     {
>     case PLUS_EXPR:
>     case MINUS_EXPR:
>     case MULT_EXPR:
>     case LSHIFT_EXPR:
>       /* Can overflow in various ways */
>     case TRUNC_DIV_EXPR:
>     case EXACT_DIV_EXPR:
>     case FLOOR_DIV_EXPR:
>     case CEIL_DIV_EXPR:
>       /* For INT_MIN / -1 */
>     case NEGATE_EXPR:
>     case ABS_EXPR:
>       /* For -INT_MIN */
>       return true;
>     default:
>       return false;
>     }
> }
>
> and similar variants for _wraps and _undefined.  I think we decided at some
> point the compiler should not take advantage of the fact that lshift or
> *_div have undefined behavior on signed integer overflow, similar we only
> take advantage of integral-type overflow behavior, not vector or complex.
> So we could reduce the number of cases the functions return true if we
> document that it returns true only for the cases where the compiler needs
> to / may assume wrapping behavior does not take place.
> As for _traps for example we only have optabs and libfuncs for
> plus,minus,mult,negate and abs.

I've tried to capture all of this in the three new functions:
- operation_overflows_and_traps
- operation_no_overflow_or_wraps
- operation_overflows_and_undefined (unused atm)

I've also added the graphite bit.

OK for trunk, if bootstrap and reg-test succeed?
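
As an illustration of what the first two predicates decide, here is a minimal standalone sketch. It is not GCC code: the enums `op_code` and `overflow_kind` and the functions `op_can_overflow`, `model_no_overflow_or_wraps`, `model_overflows_and_traps` and `check_models` are hypothetical stand-ins for tree codes, the TYPE_OVERFLOW_* flags and the new functions. It assumes a scalar integer type and leaves out the vector/complex handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few tree codes; not GCC's real enum.  */
enum op_code { OP_PLUS, OP_MINUS, OP_MULT, OP_LSHIFT, OP_DIV,
               OP_NEGATE, OP_ABS, OP_MIN, OP_MAX };

/* Hypothetical model of the overflow behaviour of a scalar integer type.  */
enum overflow_kind { OVF_WRAPS, OVF_TRAPS, OVF_UNDEFINED };

/* Can OP produce a value outside the range of its operand type?  */
bool
op_can_overflow (enum op_code op)
{
  switch (op)
    {
    case OP_PLUS: case OP_MINUS: case OP_MULT: case OP_LSHIFT:
    case OP_DIV:                  /* INT_MIN / -1 */
    case OP_NEGATE: case OP_ABS:  /* -INT_MIN */
      return true;
    default:                      /* e.g. min and max */
      return false;
    }
}

/* Model of operation_no_overflow_or_wraps: reordering is safe if the type
   wraps, or if the operation cannot overflow in the first place.  */
bool
model_no_overflow_or_wraps (enum overflow_kind kind, enum op_code op)
{
  return kind == OVF_WRAPS || !op_can_overflow (op);
}

/* Model of operation_overflows_and_traps: division can overflow, but is
   treated as non-trapping, as in the patch.  */
bool
model_overflows_and_traps (enum overflow_kind kind, enum op_code op)
{
  return kind == OVF_TRAPS && op_can_overflow (op) && op != OP_DIV;
}

/* Usage: the cases this patch is about.  */
void
check_models (void)
{
  /* max never overflows, so the overflow behaviour of the type is moot.  */
  assert (model_no_overflow_or_wraps (OVF_UNDEFINED, OP_MAX));
  assert (!model_overflows_and_traps (OVF_TRAPS, OP_MAX));
  /* plus still blocks the reduction unless the type wraps.  */
  assert (!model_no_overflow_or_wraps (OVF_UNDEFINED, OP_PLUS));
  assert (model_no_overflow_or_wraps (OVF_WRAPS, OP_PLUS));
  assert (model_overflows_and_traps (OVF_TRAPS, OP_PLUS));
}
```

With this model, a min or max reduction passes both checks for every overflow kind, which is the behaviour the updated testcases expect.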

Thanks,
- Tom

Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-28  Tom de Vries

	* tree.c (operation_overflows_and_traps, operation_no_overflow_or_wraps)
	(operation_overflows_and_undefined): New functions.
	* tree.h (operation_overflows_and_traps, operation_no_overflow_or_wraps)
	(operation_overflows_and_undefined): Declare.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Use
	operation_overflows_and_traps and operation_no_overflow_or_wraps.
	* graphite-sese-to-poly.c (is_reduction_operation_p): Use
	operation_no_overflow_or_wraps.

	* gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute optimize
	("-ftree-parallelize-loops=0").  Add successful scans for 2 detected
	reductions.  Add xfail scans for 3 detected reductions.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-8.c (main): Mark with attribute optimize
	("-ftree-parallelize-loops=0").  Add successful scans for 2 detected
	reductions.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Update scan to match vectorized min
	and max reductions.
---
 gcc/graphite-sese-to-poly.c                    |   6 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c     |  10 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c    |  10 +-
 gcc/testsuite/gcc.dg/autopar/reduc-8.c         |   7 +-
 gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |   2 +-
 gcc/tree-vect-loop.c                           |   5 +-
 gcc/tree.c                                     | 125 +++++++++++++++++++++++++
 gcc/tree.h                                     |   3 +
 8 files changed, 153 insertions(+), 15 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c583f16..b57dc9c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2614,8 +2614,10 @@ is_reduction_operation_p (gimple stmt)
   if (FLOAT_TYPE_P (type))
     return flag_associative_math;
 
-  return (INTEGRAL_TYPE_P (type)
-          && TYPE_OVERFLOW_WRAPS (type));
+  if (ANY_INTEGRAL_TYPE_P (type))
+    return operation_no_overflow_or_wraps (type, code);
+
+  return false;
 }
 
 /* Returns true when PHI contains an argument ARG.  */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 14867f3..a2dad44 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
 {
   int i;
 
@@ -60,7 +61,10 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 7c19cc5..a50e14f 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
 {
   int i;
 
@@ -58,7 +59,8 @@ int main (void)
   return 0;
 }
 
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 1d05c48..18ba03d 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -40,7 +40,8 @@ testmin (const T *c, T init, T result)
     abort ();
 }
 
-int main (void)
+int __attribute__((optimize ("-ftree-parallelize-loops=0")))
+main (void)
 {
   static signed char A[N] = {
     0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
@@ -84,5 +85,5 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
index 2129717..86f9b90 100644
--- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
@@ -46,4 +46,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c31bfbd..38eab77 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2615,7 +2615,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
     }
   else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
-      if (TYPE_OVERFLOW_TRAPS (type))
+      if (operation_overflows_and_traps (type, code))
         {
           /* Changing the order of operations changes the semantics.  */
           if (dump_enabled_p ())
@@ -2624,7 +2624,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
                             " (overflow traps): ");
           return NULL;
         }
-      if (need_wrapping_integral_overflow && !TYPE_OVERFLOW_WRAPS (type))
+      if (need_wrapping_integral_overflow
+          && !operation_no_overflow_or_wraps (type, code))
         {
           /* Changing the order of operations changes the semantics.  */
           if (dump_enabled_p ())
diff --git a/gcc/tree.c b/gcc/tree.c
index 94263af..8708727 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7597,6 +7597,131 @@ commutative_ternary_tree_code (enum tree_code code)
   return false;
 }
 
+/* Returns true if CODE operating on operands of type TYPE can overflow, and
+   -ftrapv generates trapping insns for CODE.  */
+
+bool
+operation_overflows_and_traps (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't take advantage of integral type overflow behaviour for complex and
+     vector types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return true;
+
+  if (!TYPE_OVERFLOW_TRAPS (type))
+    return false;
+
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case LSHIFT_EXPR:
+      /* Can overflow in various ways.  */
+      return true;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* These operators can overflow, but -ftrapv only generates trapping code
+         for addition, subtraction and multiplication operations.  */
+      return false;
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* For -INT_MIN.  */
+      return true;
+    default:
+      return false;
+    }
+}
+
+/* Returns true if CODE operating on operands of type TYPE cannot overflow, or
+   wraps on overflow.  */
+
+bool
+operation_no_overflow_or_wraps (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't take advantage of integral type overflow behaviour for complex and
+     vector types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return false;
+
+  if (TYPE_OVERFLOW_WRAPS (type))
+    return true;
+
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case LSHIFT_EXPR:
+      /* Can overflow in various ways.  */
+      return false;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* For INT_MIN / -1.  */
+      return false;
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* For -INT_MIN.  */
+      return false;
+    default:
+      return true;
+    }
+}
+
+/* Returns true if CODE operating on operands of type TYPE can overflow, and
+   overflow is undefined.  */
+
+bool
+operation_overflows_and_undefined (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't take advantage of integral type overflow behaviour for complex and
+     vector types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return false;
+
+  if (!TYPE_OVERFLOW_UNDEFINED (type))
+    return false;
+
+  switch (code)
+    {
+    case LSHIFT_EXPR:
+      /* LSHIFT_EXPR can overflow, but we don't take advantage of that:
+         GCC manual, C Implementation-defined behavior, Integers implementation:
+         GCC does not use the latitude given in C99 and C11 only to treat
+         certain aspects of signed << as undefined, but this is subject to
+         change.  */
+      return false;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* These operators can overflow, but we don't take advantage of that.
+         FIXME: where has this been documented?  */
+      return false;
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+      /* Can overflow in various ways.  */
+      return true;
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* For -INT_MIN.  */
+      return true;
+    default:
+      return false;
+    }
+}
+
 namespace inchash
 {
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 6df2217..1e44f55 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4369,6 +4369,9 @@ extern int type_num_arguments (const_tree);
 extern bool associative_tree_code (enum tree_code);
 extern bool commutative_tree_code (enum tree_code);
 extern bool commutative_ternary_tree_code (enum tree_code);
+extern bool operation_overflows_and_traps (tree, enum tree_code);
+extern bool operation_no_overflow_or_wraps (tree, enum tree_code);
+extern bool operation_overflows_and_undefined (tree, enum tree_code);
 extern tree upper_bound_in_type (tree, tree);
 extern tree lower_bound_in_type (tree, tree);
 extern int operand_equal_for_phi_arg_p (const_tree, const_tree);
-- 
1.9.1
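
The autopar tests touched by this patch depend on min and max reductions being safe to reorder for narrow signed types. The sketch below illustrates why, using two hypothetical helpers (`max_seq` and `max_split`, not taken from the testsuite): the split version mimics how a parallelized loop combines partial results, and it cannot overflow because it only compares and selects existing values.

```c
#include <stddef.h>

/* Sequential max reduction, in source order.  */
signed char
max_seq (const signed char *a, size_t n, signed char init)
{
  signed char m = init;
  for (size_t i = 0; i < n; i++)
    m = a[i] > m ? a[i] : m;
  return m;
}

/* The same reduction split into two partial reductions that are combined
   afterwards, roughly what two threads would compute.  No arithmetic is
   performed on the values, so no overflow can occur in either order.  */
signed char
max_split (const signed char *a, size_t n, signed char init)
{
  size_t half = n / 2;
  signed char lo = max_seq (a, half, init);
  signed char hi = max_seq (a + half, n - half, init);
  return lo > hi ? lo : hi;
}
```

Both helpers return the same value for any input, including arrays holding the extreme values of `signed char`. The same split applied to a sum could overflow in one order and not in the other, which is why the plus reductions in these tests remain xfailed.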