From patchwork Fri Jul 31 02:27:51 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aldy Hernandez X-Patchwork-Id: 502385 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 666891402CD for ; Fri, 31 Jul 2015 12:28:08 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=mAL9ZcHH; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:cc:from:message-id:date:mime-version :in-reply-to:content-type; q=dns; s=default; b=gHNiF/SkAYggDB4vx zTC/10pQ37uz4M1j6PQeBXrEy/gwWnWWUwyayFhFxAnUbjn6jIIZNnc59JbHTKT3 tghrAdE5bahQT45jf86EyTBltRwO7BCzZ6t4+gKoLp+r9vzaiEJ6HPXCjJqprN06 sM0OfVWsn5ITM7KBJVcSnuyXr8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:cc:from:message-id:date:mime-version :in-reply-to:content-type; s=default; bh=bhlmvcK39ilSWvO24wrduTS aqa8=; b=mAL9ZcHHa1Og9nxLMmdXWayBCRHNips0ldcGddm2sFYXH1FqR03JdJO RAwtt+3kPbJxBJKX0WVxvcw5frf/DZJO5eyf+mF+5hSaYhp3xC329ddue9nBqcyG pTIDts+2nmhYpwBsXHFVzdm3G/1z24Qe5Vg7n5bf0y1z7iBrtY9g= Received: (qmail 78590 invoked by alias); 31 Jul 2015 02:27:59 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 78577 invoked by uid 89); 31 Jul 2015 02:27:58 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-6.2 required=5.0 tests=AWL, BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Fri, 31 Jul 2015 02:27:55 +0000 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (Postfix) with ESMTPS id E94868CF6B for ; Fri, 31 Jul 2015 02:27:53 +0000 (UTC) Received: from reynosa.quesejoda.com (vpn-56-213.rdu2.redhat.com [10.10.56.213]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t6V2RqF7009299; Thu, 30 Jul 2015 22:27:52 -0400 Subject: Re: [gomp4.1] fold ordered depend(sink) clauses To: Jakub Jelinek References: <55B96647.20805@redhat.com> <20150730100114.GO1780@tucnak.redhat.com> Cc: gcc-patches From: Aldy Hernandez Message-ID: <55BADD27.1080402@redhat.com> Date: Thu, 30 Jul 2015 19:27:51 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <20150730100114.GO1780@tucnak.redhat.com> On 07/30/2015 03:01 AM, Jakub Jelinek wrote: > On Wed, Jul 29, 2015 at 04:48:23PM -0700, Aldy Hernandez wrote: >> @@ -7490,8 +7503,12 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) >> == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt))); >> gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) >> == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt))); >> - gimplify_omp_ctxp->iter_vars.create (TREE_VEC_LENGTH >> - (OMP_FOR_INIT (for_stmt))); >> + gimplify_omp_ctxp->loop_iter_var.create (TREE_VEC_LENGTH >> + (OMP_FOR_INIT (for_stmt))); >> + gimplify_omp_ctxp->loop_dir.create (TREE_VEC_LENGTH >> + (OMP_FOR_INIT (for_stmt))); >> + gimplify_omp_ctxp->loop_const_step.create (TREE_VEC_LENGTH >> + (OMP_FOR_INIT (for_stmt))); >> for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++) >> { >> t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i); > > I think the above should be guarded with > tree c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_ORDERED); > if (c && OMP_CLAUSE_ORDERED_EXPR (c)) > The most common case is that ordered(n) is not present, so we should > optimize for that. Done. > >> @@ -7501,10 +7518,10 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) >> gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl)) >> || POINTER_TYPE_P (TREE_TYPE (decl))); >> if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt)) >> - gimplify_omp_ctxp->iter_vars.quick_push >> + gimplify_omp_ctxp->loop_iter_var.quick_push >> (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i)); >> else >> - gimplify_omp_ctxp->iter_vars.quick_push (decl); >> + gimplify_omp_ctxp->loop_iter_var.quick_push (decl); >> >> /* Make sure the iteration variable is private. */ >> tree c = NULL_TREE; > > And all these etc. pushes too, simply remember is_doacross in some bool > variable. Not applicable. I've moved the code to omp-low, as suggested, where the omp_for_data is available, thus simplifying everything. > >> @@ -8387,6 +8435,228 @@ gimplify_transaction (tree *expr_p, gimple_seq *pre_p) >> return GS_ALL_DONE; >> } >> > > Function comment missing. Fixed. > >> +static gimple >> +gimplify_omp_ordered (tree expr, gimple_seq body) >> +{ > >> + The basic algorithm is to create a sink vector whose first >> + element is the GCD of all the first elements, and whose remaining >> + elements are the minimum of the subsequent columns. >> + >> + We ignore dependence vectors whose first element is zero because >> + such dependencies are known to be executed by the same thread. >> + >> + ?? ^^^^ Is this only applicable for collapse(1) loops? If so, how >> + ?? to handle collapse(N) loops where N > 1? > > For collapse(N) N > 1, you can't ignore first iter var with 0 offset, you can only > ignore if N first iter vars have 0 offset. > > Pretty much for the purpose of the algorithm, you compute "first element" > for collapse(N) N > 1 and ordered(5-N) > for (iv1 = ..; iv1 < ..; iv1 += step1) > for (iv2 = ..; iv2 < ..; iv2 += step2) > for (iv3 = ..; iv3 < ..; iv3 += step3) > for (iv4 = ..; iv4 < ..; iv4 += step4) > depend(iv1 + off1, iv2 + off2, iv3 + off3, iv4 + off4) > as: off(N) + off(N-1)*step(N) + iv(N-2)*step(N)*step(N-1)... > (to be checked the case if the directions differ). > So basically, you want to precompute if you'd add some loop counter > in between loop(N) and loop(N+1) that would be initially zero and incremented > by step(N). The GCD is then performed on this, it is compared to zero, > and then finally split again into the several offsets. > The "is this invalid iteration" check (is offN divisible by stepN) > needs to be done before the merging of course. > > Except, now that I think, it is not that easy. Because we have another > test for "is this invalid iteration", in particularly one performed at > runtime on each of the depends. Say if offN is negative and loop(N) > is forward loop (thus stepN is positive (note, for backward loop stepN > still might be huge positive value for unsigned iterators)), the check > would be if (ivN + offN >= initialN), for forward loop with positive offN > if (ivN + offN < endvalueN) etc. Now, not sure if by computing the GCD and > merging several depend clauses into one we preserve those tests or not. Ughh... as a followup? For now I'm going to bail on collapse > 1 and add a big FIXME. > > All in all, I think it might be really better to do the depend clause > merging during omp lowering in omp-low.c, where you can call > extract_omp_for_data and inspect the iteration variables, have the steps > computed for you etc. That is the spot where we probably want to emit some > GOMP_depend_sink call (and GOMP_depend_source) and prepare arguments for it. Excellent suggestion. Done. > >> +void >> +funk () >> +{ >> +#pragma omp parallel for ordered(2) >> + for (i=0; i < N; i++) >> + for (j=0; j < N; ++j) >> + { >> +/* We should keep the (sink:i,j-2) by virtue of it the i+0. The >> + remaining clauses get folded with a GCD of -2 for `i' and a minimum >> + of -1 for 'j'. */ > > I think we shouldn't keep the useless one (sink: i, j-2) (or keep invalid > ones). Done. How's this? Aldy commit 9c979c4c4cd53092affbf98c05ddd8c9a60915c7 Author: Aldy Hernandez Date: Wed Jul 29 13:39:06 2015 -0700 * wide-int.h (wi::gcd): New. * gimplify.c (struct gimplify_omp_ctx): Rename iter_vars to loop_iter_var. (delete_omp_context): Same. (gimplify_expr): Move code handling OMP_ORDERED into... (gimplify_omp_ordered): ...here. New. * omp-low.c (lower_omp_ordered_clauses): New. (lower_omp_ordered): Call lower_omp_ordered_clauses. testsuite/ * gcc.dg/gomp/sink-fold-1.c: New. * gcc.dg/gomp/sink-fold-2.c: New. * gcc.dg/gomp/sink-fold-3.c: New. * c-c++-common/gomp/sink-4.c: Look in omplower dump file instead. * g++.dg/gomp/sink-3.C: Have sink offset be in the opposite direction. Make variables more readable. diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 2331001..83eb6a1 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -153,7 +153,7 @@ struct gimplify_omp_ctx splay_tree variables; hash_set *privatized_types; /* Iteration variables in an OMP_FOR. */ - vec iter_vars; + vec loop_iter_var; location_t location; enum omp_clause_default_kind default_kind; enum omp_region_type region_type; @@ -392,7 +392,7 @@ delete_omp_context (struct gimplify_omp_ctx *c) { splay_tree_delete (c->variables); delete c->privatized_types; - c->iter_vars.release (); + c->loop_iter_var.release (); XDELETE (c); } @@ -7490,8 +7490,15 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt))); gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt))); - gimplify_omp_ctxp->iter_vars.create (TREE_VEC_LENGTH - (OMP_FOR_INIT (for_stmt))); + + tree c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_ORDERED); + bool is_doacross = false; + if (c && OMP_CLAUSE_ORDERED_EXPR (c)) + { + is_doacross = true; + gimplify_omp_ctxp->loop_iter_var.create (TREE_VEC_LENGTH + (OMP_FOR_INIT (for_stmt))); + } for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++) { t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i); @@ -7500,11 +7507,14 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) gcc_assert (DECL_P (decl)); gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl)) || POINTER_TYPE_P (TREE_TYPE (decl))); - if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt)) - gimplify_omp_ctxp->iter_vars.quick_push - (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i)); - else - gimplify_omp_ctxp->iter_vars.quick_push (decl); + if (is_doacross) + { + if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt)) + gimplify_omp_ctxp->loop_iter_var.quick_push + (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i)); + else + gimplify_omp_ctxp->loop_iter_var.quick_push (decl); + } /* Make sure the iteration variable is private. */ tree c = NULL_TREE; @@ -8387,6 +8397,53 @@ gimplify_transaction (tree *expr_p, gimple_seq *pre_p) return GS_ALL_DONE; } +/* Gimplify an OMP_ORDERED construct. EXPR is the tree version. BODY + is the OMP_BODY of the original EXPR (which has already been + gimplified so it's not present in the EXPR). + + Return the gimplified GIMPLE_OMP_ORDERED tuple. */ + +static gimple +gimplify_omp_ordered (tree expr, gimple_seq body) +{ + tree c, decls; + int failures = 0; + unsigned int i; + + if (gimplify_omp_ctxp) + for (c = OMP_ORDERED_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND + && OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK) + { + bool fail = false; + for (decls = OMP_CLAUSE_DECL (c), i = 0; + decls && TREE_CODE (decls) == TREE_LIST; + decls = TREE_CHAIN (decls), ++i) + if (i < gimplify_omp_ctxp->loop_iter_var.length () + && TREE_VALUE (decls) != gimplify_omp_ctxp->loop_iter_var[i]) + { + error_at (OMP_CLAUSE_LOCATION (c), + "variable %qE is not an iteration " + "of outermost loop %d, expected %qE", + TREE_VALUE (decls), i + 1, + gimplify_omp_ctxp->loop_iter_var[i]); + fail = true; + failures++; + } + /* Avoid being too redundant. */ + if (!fail && i != gimplify_omp_ctxp->loop_iter_var.length ()) + { + error_at (OMP_CLAUSE_LOCATION (c), + "number of variables in depend(sink) " + "clause does not match number of " + "iteration variables"); + failures++; + } + } + + return gimple_build_omp_ordered (body, OMP_ORDERED_CLAUSES (expr)); +} + /* Convert the GENERIC expression tree *EXPR_P to GIMPLE. If the expression produces a value to be used as an operand inside a GIMPLE statement, the value will be stored back in *EXPR_P. This value will @@ -9200,38 +9257,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, } break; case OMP_ORDERED: - if (gimplify_omp_ctxp) - for (tree c = OMP_ORDERED_CLAUSES (*expr_p); - c; c = OMP_CLAUSE_CHAIN (c)) - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND - && OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK) - { - unsigned int n = 0; - bool fail = false; - for (tree decls = OMP_CLAUSE_DECL (c); - decls && TREE_CODE (decls) == TREE_LIST; - decls = TREE_CHAIN (decls), ++n) - if (n < gimplify_omp_ctxp->iter_vars.length () - && TREE_VALUE (decls) - != gimplify_omp_ctxp->iter_vars[n]) - { - error_at (OMP_CLAUSE_LOCATION (c), - "variable %qE is not an iteration " - "of outermost loop %d, expected %qE", - TREE_VALUE (decls), n + 1, - gimplify_omp_ctxp->iter_vars[n]); - fail = true; - } - /* Avoid being too redundant. */ - if (!fail - && n != gimplify_omp_ctxp->iter_vars.length ()) - error_at (OMP_CLAUSE_LOCATION (c), - "number of variables in depend(sink) clause " - "does not match number of iteration variables"); - } - - g = gimple_build_omp_ordered (body, - OMP_ORDERED_CLAUSES (*expr_p)); + g = gimplify_omp_ordered (*expr_p, body); break; case OMP_CRITICAL: gimplify_scan_omp_clauses (&OMP_CRITICAL_CLAUSES (*expr_p), diff --git a/gcc/omp-low.c b/gcc/omp-low.c index e7d21ea..a63bf60 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -11969,6 +11969,230 @@ lower_omp_taskgroup (gimple_stmt_iterator *gsi_p, omp_context *ctx) } +/* Fold the OMP_ORDERED_CLAUSES for the OMP_ORDERED in STMT if possible. */ + +static void +lower_omp_ordered_clauses (gomp_ordered *ord_stmt, omp_context *ctx) +{ + struct omp_for_data fd; + if (!ctx->outer || gimple_code (ctx->outer->stmt) != GIMPLE_OMP_FOR) + return; + + /* ?? This is stupid. We need to call extract_omp_for_data just + to get the number of ordered loops... */ + extract_omp_for_data (as_a (ctx->outer->stmt), &fd, NULL); + if (!fd.ordered) + return; + struct omp_for_data_loop *loops + = (struct omp_for_data_loop *) + alloca (fd.ordered * sizeof (struct omp_for_data_loop)); + /* ?? ...and then again to get the actual loops. */ + extract_omp_for_data (as_a (ctx->outer->stmt), &fd, loops); + + /* Canonicalize sink dependence clauses into one folded clause if + possible. + + The basic algorithm is to create a sink vector whose first + element is the GCD of all the first elements, and whose remaining + elements are the minimum of the subsequent columns. + + We ignore dependence vectors whose first element is zero because + such dependencies are known to be executed by the same thread. + + We take into account the direction of the loop, so a minimum + becomes a maximum if the loop is iterating backwards. We also + ignore sink clauses where the loop direction is unknown, or where + the offsets are clearly invalid because they are not a multiple + of the loop increment. + + For example: + + for (i=0; i < N; ++i) + depend(sink:i-8,j-1) + depend(sink:i,j-2) // Completely ignored because i+0. + depend(sink:i-4,j+3) + depend(sink:i-6,j+2) + + Folded clause is: + + depend(sink:-gcd(8,4,6),min(-1,3,2)) + -or- + depend(sink:-2,-1) + */ + + /* FIXME: Computing GCD's where the first element is zero is + non-trivial in the presence of collapsed loops. Do this later. */ + if (fd.collapse > 1) + return; + + unsigned int len = fd.ordered; + vec folded_deps; + folded_deps.create (len); + folded_deps.quick_grow_cleared (len); + /* Bitmap representing dimensions in the final dependency vector that + have been set. */ + sbitmap folded_deps_used = sbitmap_alloc (len); + bitmap_clear (folded_deps_used); + /* TRUE if the first dimension's offset is negative. */ + bool neg_offset_p = false; + + /* ?? We need to save the original iteration variables stored in the + depend clauses, because those in fd.loops[].v have already been + gimplified. Perhaps we should use the gimplified versions. ?? */ + tree *iter_vars = (tree *) alloca (sizeof (tree) * len); + memset (iter_vars, 0, sizeof (tree) * len); + + tree *list_p = gimple_omp_ordered_clauses_ptr (ord_stmt); + tree c; + unsigned int i; + while ((c = *list_p) != NULL) + { + bool remove = false; + + if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND + || OMP_CLAUSE_DEPEND_KIND (c) != OMP_CLAUSE_DEPEND_SINK) + goto next_ordered_clause; + + tree decls; + for (decls = OMP_CLAUSE_DECL (c), i = 0; + decls && TREE_CODE (decls) == TREE_LIST; + decls = TREE_CHAIN (decls), ++i) + { + gcc_assert (i < len); + + enum { + DIR_UNKNOWN, + DIR_FORWARD, + DIR_BACKWARD + } loop_dir; + switch (fd.loops[i].cond_code) + { + case LT_EXPR: + case LE_EXPR: + loop_dir = DIR_FORWARD; + break; + case GT_EXPR: + case GE_EXPR: + loop_dir = DIR_BACKWARD; + break; + default: + loop_dir = DIR_UNKNOWN; + gcc_unreachable (); + } + + /* While the committee makes up its mind, bail if we have any + non-constant steps. */ + if (TREE_CODE (fd.loops[i].step) != INTEGER_CST) + goto gimplify_omp_ordered_ret; + + wide_int offset = TREE_PURPOSE (decls); + if (!iter_vars[i]) + iter_vars[i] = TREE_VALUE (decls); + + /* Ignore invalid offsets that are not multiples of the step. */ + if (!wi::multiple_of_p + (wi::abs (offset), wi::abs ((wide_int) fd.loops[i].step), + UNSIGNED)) + { + warning_at (OMP_CLAUSE_LOCATION (c), 0, + "ignoring sink clause with offset that is not " + "a multiple of the loop step"); + remove = true; + goto next_ordered_clause; + } + + /* Calculate the first dimension. The first dimension of + the folded dependency vector is the GCD of the first + elements, while ignoring any first elements whose offset + is 0. */ + if (i == 0) + { + /* Ignore dependence vectors whose first dimension is 0. */ + if (offset == 0) + { + remove = true; + goto next_ordered_clause; + } + else + { + neg_offset_p = + wi::neg_p (offset, + TYPE_SIGN (TREE_TYPE (TREE_PURPOSE (decls)))); + if ((loop_dir == DIR_FORWARD && !neg_offset_p) + || (loop_dir == DIR_BACKWARD && neg_offset_p)) + { + error_at (OMP_CLAUSE_LOCATION (c), + "first offset must be in opposite direction " + "of loop iterations"); + goto gimplify_omp_ordered_ret; + } + /* Initialize the first time around. */ + if (!bitmap_bit_p (folded_deps_used, 0)) + { + bitmap_set_bit (folded_deps_used, 0); + folded_deps[0] = wi::abs (offset); + } + else + folded_deps[i] = wi::gcd (folded_deps[0], offset, UNSIGNED); + } + } + /* Calculate minimum for the remaining dimensions. */ + else + { + if (!bitmap_bit_p (folded_deps_used, i)) + { + bitmap_set_bit (folded_deps_used, i); + folded_deps[i] = offset; + } + else if ((loop_dir == DIR_FORWARD + && wi::lts_p (offset, folded_deps[i])) + || (loop_dir == DIR_BACKWARD + && wi::gts_p (offset, folded_deps[i]))) + folded_deps[i] = offset; + } + } + + remove = true; + + next_ordered_clause: + if (remove) + *list_p = OMP_CLAUSE_CHAIN (c); + else + list_p = &OMP_CLAUSE_CHAIN (c); + } + + for (i = 0; i < len; ++i) + if (!bitmap_bit_p (folded_deps_used, i)) + break; + if (i == len) + { + if (neg_offset_p) + folded_deps[0] = -folded_deps[0]; + + tree vec = NULL; + i = len; + do + { + i--; + vec = tree_cons (wide_int_to_tree (TREE_TYPE (fd.loops[i].v), + folded_deps[i]), + iter_vars[i], vec); + } + while (i); + + c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_DEPEND); + OMP_CLAUSE_DEPEND_KIND (c) = OMP_CLAUSE_DEPEND_SINK; + OMP_CLAUSE_DECL (c) = vec; + OMP_CLAUSE_CHAIN (c) = gimple_omp_ordered_clauses (ord_stmt); + *gimple_omp_ordered_clauses_ptr (ord_stmt) = c; + } + + gimplify_omp_ordered_ret: + sbitmap_free (folded_deps_used); + folded_deps.release (); +} + + /* Expand code for an OpenMP ordered directive. */ static void @@ -11979,6 +12203,8 @@ lower_omp_ordered (gimple_stmt_iterator *gsi_p, omp_context *ctx) gcall *x; gbind *bind; + lower_omp_ordered_clauses (as_a (stmt), ctx); + push_gimplify_context (); block = make_node (BLOCK); diff --git a/gcc/testsuite/c-c++-common/gomp/sink-4.c b/gcc/testsuite/c-c++-common/gomp/sink-4.c index 7934de2..111178b 100644 --- a/gcc/testsuite/c-c++-common/gomp/sink-4.c +++ b/gcc/testsuite/c-c++-common/gomp/sink-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-fopenmp -fdump-tree-gimple" } */ +/* { dg-options "-fopenmp -fdump-tree-omplower" } */ /* Test that we adjust pointer offsets for sink variables correctly. */ @@ -22,4 +22,4 @@ funk (foo *begin, foo *end) } } -/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+400\\)" 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+400.\\)" 1 "omplower" } } */ diff --git a/gcc/testsuite/g++.dg/gomp/sink-3.C b/gcc/testsuite/g++.dg/gomp/sink-3.C index 83a742e..4271d66 100644 --- a/gcc/testsuite/g++.dg/gomp/sink-3.C +++ b/gcc/testsuite/g++.dg/gomp/sink-3.C @@ -8,7 +8,7 @@ typedef struct { char stuff[400]; } foo; -foo *p, *q, *r; +foo *end, *begin, *p; template void @@ -16,7 +16,7 @@ funk () { int i,j; #pragma omp parallel for ordered(1) - for (p=q; p < q; p--) + for (p=end; p > begin; p--) { #pragma omp ordered depend(sink:p+1) void bar (); diff --git a/gcc/testsuite/gcc.dg/gomp/sink-fold-1.c b/gcc/testsuite/gcc.dg/gomp/sink-fold-1.c new file mode 100644 index 0000000..f2961b9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/sink-fold-1.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-fopenmp -fdump-tree-omplower" } */ + +/* Test depend(sink) clause folding. */ + +int i,j, N; + +extern void bar(); + +void +funk () +{ +#pragma omp parallel for ordered(2) + for (i=0; i < N; i++) + for (j=0; j < N; ++j) + { +/* We remove the (sink:i,j-2) by virtue of it the i+0. The remaining + clauses get folded with a GCD of -2 for `i' and a minimum of -1 for + 'j'. */ +#pragma omp ordered \ + depend(sink:i-8,j-1) \ + depend(sink:i, j-2) \ + depend(sink:i-4,j+3) \ + depend(sink:i-6,j+2) + bar(); +#pragma omp ordered depend(source) + } +} + +/* { dg-final { scan-tree-dump-times "omp ordered depend\\(sink:i-2,j-1\\)" 1 "omplower" } } */ diff --git a/gcc/testsuite/gcc.dg/gomp/sink-fold-2.c b/gcc/testsuite/gcc.dg/gomp/sink-fold-2.c new file mode 100644 index 0000000..b3b4ac7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/sink-fold-2.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ + +int i,j, N; + +extern void bar(); + +void +funk () +{ +#pragma omp parallel for ordered(2) + for (i=0; i < N; i += 3) + for (j=0; j < N; ++j) + { +#pragma omp ordered depend(sink:i-8,j-1) /* { dg-warning "ignoring sink clause with offset that is not a multiple" } */ +#pragma omp ordered depend(sink:i+3,j-1) /* { dg-error "first offset must be in opposite direction" } */ + bar(); +#pragma omp ordered depend(source) + } +} diff --git a/gcc/testsuite/gcc.dg/gomp/sink-fold-3.c b/gcc/testsuite/gcc.dg/gomp/sink-fold-3.c new file mode 100644 index 0000000..4d6293c --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/sink-fold-3.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-fopenmp -fdump-tree-omplower" } */ + +/* Test that we fold sink offsets correctly while taking into account + pointer sizes. */ + +typedef struct { + char stuff[400]; +} foo; + +void +funk (foo *begin, foo *end) +{ + foo *p; +#pragma omp parallel for ordered(1) + for (p=end; p > begin; p--) + { +#pragma omp ordered depend(sink:p+2) depend(sink:p+4) + void bar (); + bar(); +#pragma omp ordered depend(source) + } +} + +/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+800B\\)" 1 "omplower" } } */ diff --git a/gcc/wide-int.h b/gcc/wide-int.h index d8f7b46..c20db61 100644 --- a/gcc/wide-int.h +++ b/gcc/wide-int.h @@ -514,6 +514,7 @@ namespace wi BINARY_FUNCTION div_round (const T1 &, const T2 &, signop, bool * = 0); BINARY_FUNCTION divmod_trunc (const T1 &, const T2 &, signop, WI_BINARY_RESULT (T1, T2) *); + BINARY_FUNCTION gcd (const T1 &, const T2 &, signop = UNSIGNED); BINARY_FUNCTION mod_trunc (const T1 &, const T2 &, signop, bool * = 0); BINARY_FUNCTION smod_trunc (const T1 &, const T2 &); BINARY_FUNCTION umod_trunc (const T1 &, const T2 &); @@ -2653,6 +2654,27 @@ wi::divmod_trunc (const T1 &x, const T2 &y, signop sgn, return quotient; } +/* Compute the greatest common divisor of two numbers A and B using + Euclid's algorithm. */ +template +inline WI_BINARY_RESULT (T1, T2) +wi::gcd (const T1 &a, const T2 &b, signop sgn) +{ + T1 x, y, z; + + x = wi::abs (a); + y = wi::abs (b); + + while (gt_p (x, 0, sgn)) + { + z = mod_trunc (y, x, sgn); + y = x; + x = z; + } + + return y; +} + /* Compute X / Y, rouding towards 0, and return the remainder. Treat X and Y as having the signedness given by SGN. Indicate in *OVERFLOW if the division overflows. */