From patchwork Wed Jul 29 11:22:00 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 501665 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 98727140306 for ; Wed, 29 Jul 2015 21:22:51 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=wpkcI1GN; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; q=dns; s=default; b=ei54TuN3176N59zIO spI9NmYMZFyND8uRslV6ilRoS0eIK691BcOlNw6DxveoqIDFMVcge2aLwOVjgWys F4mOAipCu4WbCruf8QbE3Ca3hbltQouH6NAdTmmdSLL022gTuyGMsHKAycZ7g9n7 cncAkL0nUDwy5X2wNd26HSMJAA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=default; bh=jjq2K2BUcW88JDqe1pyO1Pf ijT0=; b=wpkcI1GNQ9yB8hkvouYH/0XICw58h1eCRUNlsro7FJKQWoKCbt3C2KD AgRSt7bw17pSwpufOeI5UQikiDXaVZzgidO3OW4932Q2yLlzf9K3/72iATjbNCR9 WfDgTFxK6o5gFVrT90fkVPE1/q9t6cfu9aNydwqJc0j4U25J6TIQ= Received: (qmail 125953 invoked by alias); 29 Jul 2015 11:22:44 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 125375 invoked by uid 89); 29 Jul 2015 11:22:43 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 X-HELO: fencepost.gnu.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 29 Jul 2015 11:22:37 +0000 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34662) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1ZKPRR-0006ps-PA for gcc-patches@gnu.org; Wed, 29 Jul 2015 07:22:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZKPRL-0005bz-Cu for gcc-patches@gnu.org; Wed, 29 Jul 2015 07:22:31 -0400 Received: from relay1.mentorg.com ([192.94.38.131]:47044) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZKPRL-0005bJ-0g for gcc-patches@gnu.org; Wed, 29 Jul 2015 07:22:27 -0400 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-02.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1ZKPRI-0004ah-45 from Tom_deVries@mentor.com ; Wed, 29 Jul 2015 04:22:24 -0700 Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-02.mgc.mentorg.com (137.202.0.106) with Microsoft SMTP Server id 14.3.224.2; Wed, 29 Jul 2015 12:22:22 +0100 Message-ID: <55B8B758.8010902@mentor.com> Date: Wed, 29 Jul 2015 13:22:00 +0200 From: Tom de Vries User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Richard Biener CC: "gcc-patches@gnu.org" Subject: Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1 References: <55B24E1F.1070705@mentor.com> <55B770A1.6010308@mentor.com> In-Reply-To: X-detected-operating-system: by eggs.gnu.org: Windows NT kernel [generic] [fuzzy] X-Received-From: 192.94.38.131 On 29/07/15 10:09, Richard Biener wrote: > On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries wrote: >> On 28/07/15 09:59, Richard Biener wrote: >>> >>> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries >>> wrote: >>>> >>>> Hi, >>>> >>>> this patch allows parallelization and vectorization of reduction >>>> operators >>>> that are guaranteed to not overflow (such as min and max operators), >>>> independent of the overflow behaviour of the type. >>>> >>>> Bootstrapped and reg-tested on x86_64. >>>> >>>> OK for trunk? >>> >>> >>> Hmm, I don't like that no_overflow_tree_code function. We have a much >>> more >>> clear understanding which codes may overflow or trap. Thus please add >>> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like >>> >> >> Done. >> >>> bool >>> operation_overflow_traps (tree type, enum tree_code code) >>> { >>> if (!ANY_INTEGRAL_TYPE_P (type) >> >> >> I've changed this test into a gcc_checking_assert. >> >> >>> || !TYPE_OVERFLOW_TRAPS (type)) >>> return false; >>> switch (code) >>> { >>> case PLUS_EXPR: >>> case MINUS_EXPR: >>> case MULT_EXPR: >>> case LSHIFT_EXPR: >>> /* Can overflow in various ways */ >>> case TRUNC_DIV_EXPR: >>> case EXACT_DIV_EXPR: >>> case FLOOR_DIV_EXPR: >>> case CEIL_DIV_EXPR: >>> /* For INT_MIN / -1 */ >>> case NEGATE_EXPR: >>> case ABS_EXPR: >>> /* For -INT_MIN */ >>> return true; >>> default: >>> return false; >>> } >>> } >>> >>> and similar variants for _wraps and _undefined. I think we decided at >>> some point >>> the compiler should not take advantage of the fact that lshift or >>> *_div have undefined >>> behavior on signed integer overflow, similar we only take advantage of >>> integral-type >>> overflow behavior, not vector or complex. So we could reduce the >>> number of cases >>> the functions return true if we document that it returns true only for >>> the cases where >>> the compiler needs to / may assume wrapping behavior does not take place. >>> As for _traps for example we only have optabs and libfuncs for >>> plus,minus,mult,negate >>> and abs. >> >> >> I've tried to capture all of this in the three new functions: >> - operation_overflows_and_traps >> - operation_no_overflow_or_wraps >> - operation_overflows_and_undefined (unused atm) >> >> I've also added the graphite bit. >> >> OK for trunk, if bootstrap and reg-test succeeds? > > +/* Returns true if CODE operating on operands of type TYPE can overflow, and > + fwrapv generates trapping insns for CODE. */ > > ftrapv > Done. > +bool > +operation_overflows_and_traps (tree type, enum tree_code code) > +{ > > operation_overflow_traps > > is better wording. Meaning that when the operation overflows then it traps. > AFAIU, the purpose of the function is to enable optimizations when it returns false, that is: - if the operation doesn't overflow, or - if the operation overflows, but doesn't trap. The name operation_overflow_traps does not make clear what it returns when the operation doesn't overflow. If the name doesn't make it clear, you need to be conservative, that is, return true. Which defies the purpose of the function. I've changed the name to operation_no_trapping_overflow (and inverted logic in the function). But perhaps you want operation_overflow_traps with a conservative return for non-overflow operations, and use it like this: ... else if (INTEGRAL_TYPE_P (type) && check_reduction) { if (operation_overflows (type, code) && operation_overflow_traps (type, code)) { /* Changing the order of operations changes the semantics. */ ... ? > + /* We don't take advantage of integral type overflow behaviour for > complex and > + vector types. */ > > We don't generate instructions that trap on overflow for complex or vector types > Done. > + if (!INTEGRAL_TYPE_P (type)) > + return true; > > + switch (code) > + { > + case PLUS_EXPR: > + case MINUS_EXPR: > + case MULT_EXPR: > + case LSHIFT_EXPR: > + /* Can overflow in various ways. */ > > we don't have a trapping optab for lshift > > + case TRUNC_DIV_EXPR: > + case EXACT_DIV_EXPR: > + case FLOOR_DIV_EXPR: > + case CEIL_DIV_EXPR: > > nor division. See optabs.c:optab_for_tree_code. I suggest to only return true > for those. > Before the logic inversion, we return false for these (And also for operators that do not overflow). > +/* Returns true if CODE operating on operands of type TYPE cannot overflow, or > + wraps on overflow. */ > + > +bool > +operation_no_overflow_or_wraps (tree type, enum tree_code code) > +{ > + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); > > operation_overflow_wraps () > > is still my preferred name. > The name operation_overflow_wraps doesn't make clear what it returns if the operation doesn't overflow. And I didn't manage to come up with a better name sofar. Btw, I wonder about something like vector max operation. The current implementation of operation_no_overflow_or_wraps returns false. Could we do: ... /* We don't take advantage of integral type overflow behaviour for complex and vector types. */ if (!INTEGRAL_TYPE_P (type)) return !operation_overflows (type, code); ... ? > +bool > +operation_overflow_and_undefined (tree type, enum tree_code code) > +{ > + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); > + > > operation_overflow_undefined () > The name operation_overflow_undefined doesn't make clear what it returns if the operation doesn't overflow. I've changed it into operation_undefined_overflow. > If you like to keep an explicit operation_can_overflow then there is the > opportunity to split out the switch statement from operation_overflow_wraps > and operation_overflow_undefined. > Done. OK for trunk if bootstrap and reg-test succeeds? Thanks, - Tom Allow non-overflow ops in reductions 2015-07-28 Tom de Vries * tree.c (operations_overflows, operation_no_trapping_overflow) (operation_no_overflow_or_wraps, operation_undefined_overflow): New function. * tree.h (operations_overflows, operation_no_trapping_overflow) (operation_no_overflow_or_wraps, operation_undefined_overflow): Declare. * tree-vect-loop.c (vect_is_simple_reduction_1): Use operation_no_trapping_overflow and operation_no_overflow_or_wraps. * graphite-sese-to-poly.c (is_reduction_operation_p): Use operation_no_overflow_or_wraps. * gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute optimize ("-ftree-parallelize-loops=0"). Add successful scans for 2 detected reductions. Add xfail scans for 3 detected reductions. * gcc.dg/autopar/reduc-2short.c: Same. * gcc.dg/autopar/reduc-8.c (init_arrays): Mark with attribute optimize ("-ftree-parallelize-loops=0"). Add successful scans for 2 detected reductions. * gcc.dg/vect/trapv-vect-reduc-4.c: Update scan to match vectorized min and max reductions. --- gcc/graphite-sese-to-poly.c | 6 +- gcc/testsuite/gcc.dg/autopar/reduc-2char.c | 10 +- gcc/testsuite/gcc.dg/autopar/reduc-2short.c | 10 +- gcc/testsuite/gcc.dg/autopar/reduc-8.c | 7 +- gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c | 2 +- gcc/tree-vect-loop.c | 5 +- gcc/tree.c | 127 +++++++++++++++++++++++++ gcc/tree.h | 4 + 8 files changed, 156 insertions(+), 15 deletions(-) diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index c583f16..b57dc9c 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -2614,8 +2614,10 @@ is_reduction_operation_p (gimple stmt) if (FLOAT_TYPE_P (type)) return flag_associative_math; - return (INTEGRAL_TYPE_P (type) - && TYPE_OVERFLOW_WRAPS (type)); + if (ANY_INTEGRAL_TYPE_P (type)) + return operation_no_overflow_or_wraps (type, code); + + return false; } /* Returns true when PHI contains an argument ARG. */ diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c index 14867f3..a2dad44 100644 --- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c @@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result) abort (); } - __attribute__((noinline)) - void init_arrays () +void __attribute__((noinline)) + __attribute__((optimize ("-ftree-parallelize-loops=0"))) +init_arrays () { int i; @@ -60,7 +61,10 @@ int main (void) } -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */ +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */ + +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */ /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c index 7c19cc5..a50e14f 100644 --- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c @@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result) abort (); } - __attribute__((noinline)) - void init_arrays () +void __attribute__((noinline)) + __attribute__((optimize ("-ftree-parallelize-loops=0"))) +init_arrays () { int i; @@ -58,7 +59,8 @@ int main (void) return 0; } +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */ +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */ -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */ /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */ - diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c index 1d05c48..18ba03d 100644 --- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c +++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c @@ -40,7 +40,8 @@ testmin (const T *c, T init, T result) abort (); } -int main (void) +int __attribute__((optimize ("-ftree-parallelize-loops=0"))) +main (void) { static signed char A[N] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, @@ -84,5 +85,5 @@ int main (void) } -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */ -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */ +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c index 2129717..86f9b90 100644 --- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c +++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c @@ -46,4 +46,4 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index c31bfbd..b89c09b 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -2615,7 +2615,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi, } else if (INTEGRAL_TYPE_P (type) && check_reduction) { - if (TYPE_OVERFLOW_TRAPS (type)) + if (!operation_no_trapping_overflow (type, code)) { /* Changing the order of operations changes the semantics. */ if (dump_enabled_p ()) @@ -2624,7 +2624,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi, " (overflow traps): "); return NULL; } - if (need_wrapping_integral_overflow && !TYPE_OVERFLOW_WRAPS (type)) + if (need_wrapping_integral_overflow + && !operation_no_overflow_or_wraps (type, code)) { /* Changing the order of operations changes the semantics. */ if (dump_enabled_p ()) diff --git a/gcc/tree.c b/gcc/tree.c index 94263af..08fba14 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -7597,6 +7597,133 @@ commutative_ternary_tree_code (enum tree_code code) return false; } +/* Returns true if CODE operating on operands of type TYPE can overflow. */ + +bool +operation_overflows (tree type, enum tree_code code) +{ + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); + + switch (code) + { + case PLUS_EXPR: + case MINUS_EXPR: + case MULT_EXPR: + case LSHIFT_EXPR: + /* Can overflow in various ways. */ + return true; + case TRUNC_DIV_EXPR: + case EXACT_DIV_EXPR: + case FLOOR_DIV_EXPR: + case CEIL_DIV_EXPR: + /* For INT_MIN / -1. */ + return true; + case NEGATE_EXPR: + case ABS_EXPR: + /* For -INT_MIN. */ + return true; + default: + /* These operators cannot overflow. */ + return false; + } +} + +/* Returns true if CODE operating on operands of type TYPE doesn't overflow, or + ftrapv doesn't generate trapping insns for CODE. */ + +bool +operation_no_trapping_overflow (tree type, enum tree_code code) +{ + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); + + /* We don't generate instructions that trap on overflow for complex or vector + types. */ + if (!INTEGRAL_TYPE_P (type)) + return true; + + if (!TYPE_OVERFLOW_TRAPS (type)) + return true; + + switch (code) + { + case PLUS_EXPR: + case MINUS_EXPR: + case MULT_EXPR: + case NEGATE_EXPR: + case ABS_EXPR: + /* These operators can overflow, and -ftrapv generates trapping code for + these. */ + return false; + case TRUNC_DIV_EXPR: + case EXACT_DIV_EXPR: + case FLOOR_DIV_EXPR: + case CEIL_DIV_EXPR: + case LSHIFT_EXPR: + /* These operators can overflow, but -ftrapv does not generate trapping + code for these. */ + return true; + default: + /* These operators cannot overflow. */ + return true; + } +} + +/* Returns true if CODE operating on operands of type TYPE cannot overflow, or + wraps on overflow. */ + +bool +operation_no_overflow_or_wraps (tree type, enum tree_code code) +{ + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); + + /* We don't take advantage of integral type overflow behaviour for complex and + vector types. */ + if (!INTEGRAL_TYPE_P (type)) + return false; + + if (TYPE_OVERFLOW_WRAPS (type)) + return true; + + return !operation_overflows (type, code); +} + +/* Returns true if CODE operating on operands of type TYPE can overflow, and + overflow is undefined. */ + +bool +operation_undefined_overflow (tree type, enum tree_code code) +{ + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); + + /* We don't take advantage of integral type overflow behaviour for complex and + vector types. */ + if (!INTEGRAL_TYPE_P (type)) + return false; + + if (!TYPE_OVERFLOW_UNDEFINED (type)) + return false; + + switch (code) + { + case LSHIFT_EXPR: + /* LSHIFT_EXPR can overflow, but we don't take advantage of that: + GCC manual, C Implementation-defined behavior, Integers implementation: + GCC does not use the latitude given in C99 and C11 only to treat + certain aspects of signed << as undefined, but this is subject to + change. */ + return false; + case TRUNC_DIV_EXPR: + case EXACT_DIV_EXPR: + case FLOOR_DIV_EXPR: + case CEIL_DIV_EXPR: + /* These operators can overflow, but we don't take advantage of that. + FIXME: where has this been documented? */ + return false; + default: + return operation_overflows (type, code); + } +} + namespace inchash { diff --git a/gcc/tree.h b/gcc/tree.h index 6df2217..76a004e 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -4369,6 +4369,10 @@ extern int type_num_arguments (const_tree); extern bool associative_tree_code (enum tree_code); extern bool commutative_tree_code (enum tree_code); extern bool commutative_ternary_tree_code (enum tree_code); +extern bool operation_overflows (tree, enum tree_code); +extern bool operation_no_trapping_overflow (tree, enum tree_code); +extern bool operation_no_overflow_or_wraps (tree, enum tree_code); +extern bool operation_undefined_overflow (tree, enum tree_code); extern tree upper_bound_in_type (tree, tree); extern tree lower_bound_in_type (tree, tree); extern int operand_equal_for_phi_arg_p (const_tree, const_tree); -- 1.9.1