Message ID | DB5PR0801MB2742FAA88691784D0EC1331BE75D0@DB5PR0801MB2742.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | [PR82726,1/2] Revert previous fixes for PR70754 and PR79663 | expand |
On Fri, Nov 3, 2017 at 1:40 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: > Hi, > As described in message of previous patch: > > This patch set fixes both PRs in the opposite way: Instead of find dominance > insertion position for root reference, we resort zero-distance references of > combined chain by their position information so that new root reference must > dominate others. This should be more efficient because we avoid function call > to stmt_dominates_stmt_p. > Bootstrap and test on x86_64 and AArch64 in patch set. Is it OK? +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */ you don't need avx2_runtime for -mavx2 so please instead use { target { x86_64-*-* i?86-*-* } } +#include <map> +#define INCLUDE_ALGORITHM /* std::sort */ can you please use GCCs own hash_map? Btw... + /* Setup UID for all statements in dominance order. */ + std::map<gimple *, int> stmts_map; + basic_block *bbs = get_loop_body_in_dom_order (loop); + for (i = 0; i < loop->num_nodes; i++) + { + int uid = 0; + basic_block bb = bbs[i]; + + for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); + gsi_next (&bsi)) + { + gimple *stmt = gsi_stmt (bsi); + if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt)))) + stmts_map[stmt] = uid; why don't you use gimple_[set_]uid ()? Given you do a dominance check you don't even need to do this in dominance order - usually passes just number UIDs in all relevant BBs. There is a helper for that as well, renumber_gimple_stmt_uids_in_blocks which can be used on the get_loop_body result. + /* Sort all ZERO distance references by position. */ + std::sort (&ch1->refs[0], &ch1->refs[0] + j, order_drefs_by_pos); + given ch1->refs is a vec you can use the new vec::qsort_block you added instead of including algorithm and using std::sort. Richard. > Thanks, > bin > 2017-11-02 Bin Cheng <bin.cheng@arm.com> > > PR tree-optimization/82726 > PR tree-optimization/70754 > * tree-predcom.c (<map>, INCLUDE_ALGORITHM): New headers. > (order_drefs_by_pos): New function. > (combine_chains): Move code setting has_max_use_after to... > (try_combine_chains): ...here. New parameter. Sort combined chains > according to position information. > (tree_predictive_commoning_loop): Update call to above function. > (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New. > > gcc/testsuite > 2017-11-02 Bin Cheng <bin.cheng@arm.com> > > PR tree-optimization/82726 > * gcc.dg/tree-ssa/pr82726.c: New test.
On Tue, Nov 7, 2017 at 10:53 AM, Richard Biener <richard.guenther@gmail.com> wrote: > On Fri, Nov 3, 2017 at 1:40 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >> Hi, >> As described in message of previous patch: >> >> This patch set fixes both PRs in the opposite way: Instead of find dominance >> insertion position for root reference, we resort zero-distance references of >> combined chain by their position information so that new root reference must >> dominate others. This should be more efficient because we avoid function call >> to stmt_dominates_stmt_p. >> Bootstrap and test on x86_64 and AArch64 in patch set. Is it OK? > > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */ > > you don't need avx2_runtime for -mavx2 so please instead use > { target { x86_64-*-* i?86-*-* } } > > +#include <map> > +#define INCLUDE_ALGORITHM /* std::sort */ > > can you please use GCCs own hash_map? Btw... > > + /* Setup UID for all statements in dominance order. */ > + std::map<gimple *, int> stmts_map; > + basic_block *bbs = get_loop_body_in_dom_order (loop); > + for (i = 0; i < loop->num_nodes; i++) > + { > + int uid = 0; > + basic_block bb = bbs[i]; > + > + for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); > + gsi_next (&bsi)) > + { > + gimple *stmt = gsi_stmt (bsi); > + if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt)))) > + stmts_map[stmt] = uid; > > why don't you use gimple_[set_]uid ()? Given you do a dominance check > you don't even need to do this in dominance order - usually passes just > number UIDs in all relevant BBs. There is a helper for that as well, > renumber_gimple_stmt_uids_in_blocks which can be used on > the get_loop_body result. Yea, I forgot gimple_[set_]uid interface when doing this. All fixed now. > > + /* Sort all ZERO distance references by position. */ > + std::sort (&ch1->refs[0], &ch1->refs[0] + j, order_drefs_by_pos); > + > > given ch1->refs is a vec you can use the new vec::qsort_block you added > instead of including algorithm and using std::sort. Sorry, I haven't push that patch in. In this updated patch, I fall back to generic qsort so algorithm is not included. Bootstrap and test on x86_64. Is it OK? Thanks, bin 2017-11-10 Bin Cheng <bin.cheng@arm.com> PR tree-optimization/82726 PR tree-optimization/70754 * tree-predcom.c (order_drefs_by_pos): New function. (combine_chains): Move code setting has_max_use_after to... (try_combine_chains): ...here. New parameter. Sort combined chains according to position information. (tree_predictive_commoning_loop): Update call to above function. (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New. gcc/testsuite 2017-11-10 Bin Cheng <bin.cheng@arm.com> PR tree-optimization/82726 * gcc.dg/tree-ssa/pr82726.c: New test. > > Richard. > >> Thanks, >> bin >> 2017-11-02 Bin Cheng <bin.cheng@arm.com> >> >> PR tree-optimization/82726 >> PR tree-optimization/70754 >> * tree-predcom.c (<map>, INCLUDE_ALGORITHM): New headers. >> (order_drefs_by_pos): New function. >> (combine_chains): Move code setting has_max_use_after to... >> (try_combine_chains): ...here. New parameter. Sort combined chains >> according to position information. >> (tree_predictive_commoning_loop): Update call to above function. >> (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New. >> >> gcc/testsuite >> 2017-11-02 Bin Cheng <bin.cheng@arm.com> >> >> PR tree-optimization/82726 >> * gcc.dg/tree-ssa/pr82726.c: New test.
Hmm, the patch... Thanks, bin On Fri, Nov 10, 2017 at 2:13 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Tue, Nov 7, 2017 at 10:53 AM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Fri, Nov 3, 2017 at 1:40 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >>> Hi, >>> As described in message of previous patch: >>> >>> This patch set fixes both PRs in the opposite way: Instead of find dominance >>> insertion position for root reference, we resort zero-distance references of >>> combined chain by their position information so that new root reference must >>> dominate others. This should be more efficient because we avoid function call >>> to stmt_dominates_stmt_p. >>> Bootstrap and test on x86_64 and AArch64 in patch set. Is it OK? >> >> +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */ >> >> you don't need avx2_runtime for -mavx2 so please instead use >> { target { x86_64-*-* i?86-*-* } } >> >> +#include <map> >> +#define INCLUDE_ALGORITHM /* std::sort */ >> >> can you please use GCCs own hash_map? Btw... >> >> + /* Setup UID for all statements in dominance order. */ >> + std::map<gimple *, int> stmts_map; >> + basic_block *bbs = get_loop_body_in_dom_order (loop); >> + for (i = 0; i < loop->num_nodes; i++) >> + { >> + int uid = 0; >> + basic_block bb = bbs[i]; >> + >> + for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); >> + gsi_next (&bsi)) >> + { >> + gimple *stmt = gsi_stmt (bsi); >> + if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt)))) >> + stmts_map[stmt] = uid; >> >> why don't you use gimple_[set_]uid ()? Given you do a dominance check >> you don't even need to do this in dominance order - usually passes just >> number UIDs in all relevant BBs. There is a helper for that as well, >> renumber_gimple_stmt_uids_in_blocks which can be used on >> the get_loop_body result. > Yea, I forgot gimple_[set_]uid interface when doing this. All fixed now. >> >> + /* Sort all ZERO distance references by position. */ >> + std::sort (&ch1->refs[0], &ch1->refs[0] + j, order_drefs_by_pos); >> + >> >> given ch1->refs is a vec you can use the new vec::qsort_block you added >> instead of including algorithm and using std::sort. > Sorry, I haven't push that patch in. In this updated patch, I fall > back to generic qsort so algorithm is not included. > > Bootstrap and test on x86_64. Is it OK? > Thanks, > bin > > 2017-11-10 Bin Cheng <bin.cheng@arm.com> > > PR tree-optimization/82726 > PR tree-optimization/70754 > * tree-predcom.c (order_drefs_by_pos): New function. > (combine_chains): Move code setting has_max_use_after to... > (try_combine_chains): ...here. New parameter. Sort combined chains > according to position information. > (tree_predictive_commoning_loop): Update call to above function. > (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New. > > gcc/testsuite > 2017-11-10 Bin Cheng <bin.cheng@arm.com> > > PR tree-optimization/82726 > * gcc.dg/tree-ssa/pr82726.c: New test. > > >> >> Richard. >> >>> Thanks, >>> bin >>> 2017-11-02 Bin Cheng <bin.cheng@arm.com> >>> >>> PR tree-optimization/82726 >>> PR tree-optimization/70754 >>> * tree-predcom.c (<map>, INCLUDE_ALGORITHM): New headers. >>> (order_drefs_by_pos): New function. >>> (combine_chains): Move code setting has_max_use_after to... >>> (try_combine_chains): ...here. New parameter. Sort combined chains >>> according to position information. >>> (tree_predictive_commoning_loop): Update call to above function. >>> (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New. >>> >>> gcc/testsuite >>> 2017-11-02 Bin Cheng <bin.cheng@arm.com> >>> >>> PR tree-optimization/82726 >>> * gcc.dg/tree-ssa/pr82726.c: New test. From f7b9b4ac78f33aee60ecd37ca515f2f8773f5561 Mon Sep 17 00:00:00 2001 From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com> Date: Wed, 1 Nov 2017 17:43:55 +0000 Subject: [PATCH 2/2] pr82726-20171110.txt --- gcc/testsuite/gcc.dg/tree-ssa/pr82726.c | 26 ++++++ gcc/tree-predcom.c | 158 ++++++++++++++++++++++++++++---- 2 files changed, 168 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c new file mode 100644 index 0000000..22bc59d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 --param tree-reassoc-width=4" } */ +/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */ + +#define N 40 +#define M 128 +unsigned int in[N+M]; +unsigned short out[N]; + +/* Outer-loop vectorization. */ + +void +foo (){ + int i,j; + unsigned int diff; + + for (i = 0; i < N; i++) { + diff = 0; + for (j = 0; j < M; j+=8) { + diff += in[j+i]; + } + out[i]=(unsigned short)diff; + } + + return; +} diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index 24d7c9c..3b6c2ea 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -1020,6 +1020,17 @@ order_drefs (const void *a, const void *b) return (*da)->pos - (*db)->pos; } +/* Compares two drefs A and B by their position. Callback for qsort. */ + +static int +order_drefs_by_pos (const void *a, const void *b) +{ + const dref *const da = (const dref *) a; + const dref *const db = (const dref *) b; + + return (*da)->pos - (*db)->pos; +} + /* Returns root of the CHAIN. */ static inline dref @@ -2633,7 +2644,6 @@ combine_chains (chain_p ch1, chain_p ch2) bool swap = false; chain_p new_chain; unsigned i; - gimple *root_stmt; tree rslt_type = NULL_TREE; if (ch1 == ch2) @@ -2675,31 +2685,55 @@ combine_chains (chain_p ch1, chain_p ch2) new_chain->refs.safe_push (nw); } - new_chain->has_max_use_after = false; - root_stmt = get_chain_root (new_chain)->stmt; - for (i = 1; new_chain->refs.iterate (i, &nw); i++) - { - if (nw->distance == new_chain->length - && !stmt_dominates_stmt_p (nw->stmt, root_stmt)) - { - new_chain->has_max_use_after = true; - break; - } - } - ch1->combined = true; ch2->combined = true; return new_chain; } -/* Try to combine the CHAINS. */ +/* Recursively update position information of all offspring chains to ROOT + chain's position information. */ + +static void +update_pos_for_combined_chains (chain_p root) +{ + chain_p ch1 = root->ch1, ch2 = root->ch2; + dref ref, ref1, ref2; + for (unsigned j = 0; (root->refs.iterate (j, &ref) + && ch1->refs.iterate (j, &ref1) + && ch2->refs.iterate (j, &ref2)); ++j) + ref1->pos = ref2->pos = ref->pos; + + if (ch1->type == CT_COMBINATION) + update_pos_for_combined_chains (ch1); + if (ch2->type == CT_COMBINATION) + update_pos_for_combined_chains (ch2); +} + +/* Returns true if statement S1 dominates statement S2. */ + +static bool +pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2) +{ + basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2); + + if (!bb1 || s1 == s2) + return true; + + if (bb1 == bb2) + return gimple_uid (s1) < gimple_uid (s2); + + return dominated_by_p (CDI_DOMINATORS, bb2, bb1); +} + +/* Try to combine the CHAINS in LOOP. */ static void -try_combine_chains (vec<chain_p> *chains) +try_combine_chains (struct loop *loop, vec<chain_p> *chains) { unsigned i, j; chain_p ch1, ch2, cch; auto_vec<chain_p> worklist; + bool combined_p = false; FOR_EACH_VEC_ELT (*chains, i, ch1) if (chain_can_be_combined_p (ch1)) @@ -2721,6 +2755,98 @@ try_combine_chains (vec<chain_p> *chains) { worklist.safe_push (cch); chains->safe_push (cch); + combined_p = true; + break; + } + } + } + if (!combined_p) + return; + + /* Setup UID for all statements in dominance order. */ + basic_block *bbs = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; i++) + { + unsigned uid = 0; + basic_block bb = bbs[i]; + + for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); + gsi_next (&bsi)) + { + gimple *stmt = gsi_stmt (bsi); + if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt)))) + gimple_set_uid (stmt, uid); + } + + for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); + gsi_next (&bsi)) + { + gimple *stmt = gsi_stmt (bsi); + if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt)) + gimple_set_uid (stmt, ++uid); + } + } + free (bbs); + + /* Re-association in combined chains may generate statements different to + order of references of the original chain. We need to keep references + of combined chain in dominance order so that all uses will be inserted + after definitions. Note: + A) This is necessary for all combined chains. + B) This is only necessary for ZERO distance references because other + references inherit value from loop carried PHIs. + + We first update position information for all combined chains. */ + dref ref; + for (i = 0; chains->iterate (i, &ch1); ++i) + { + if (ch1->type != CT_COMBINATION || ch1->combined) + continue; + + for (j = 0; ch1->refs.iterate (j, &ref); ++j) + ref->pos = gimple_uid (ref->stmt); + + update_pos_for_combined_chains (ch1); + } + /* Then sort references according to newly updated position information. */ + for (i = 0; chains->iterate (i, &ch1); ++i) + { + if (ch1->type != CT_COMBINATION && !ch1->combined) + continue; + + /* Find the first reference with non-ZERO distance. */ + if (ch1->length == 0) + j = ch1->refs.length(); + else + { + for (j = 0; ch1->refs.iterate (j, &ref); ++j) + if (ref->distance != 0) + break; + } + + /* Sort all ZERO distance references by position. */ + qsort (&ch1->refs[0], j, sizeof (ch1->refs[0]), order_drefs_by_pos); + + if (ch1->combined) + continue; + + /* For ZERO length chain, has_max_use_after must be true since root + combined stmt must dominates others. */ + if (ch1->length == 0) + { + ch1->has_max_use_after = true; + continue; + } + /* Check if there is use at max distance after root for combined chains + and set flag accordingly. */ + ch1->has_max_use_after = false; + gimple *root_stmt = get_chain_root (ch1)->stmt; + for (j = 1; ch1->refs.iterate (j, &ref); ++j) + { + if (ref->distance == ch1->length + && !pcom_stmt_dominates_stmt_p (ref->stmt, root_stmt)) + { + ch1->has_max_use_after = true; break; } } @@ -3058,7 +3184,7 @@ tree_predictive_commoning_loop (struct loop *loop) loop_closed_ssa = prepare_finalizers (loop, chains); /* Try to combine the chains that are always worked with together. */ - try_combine_chains (&chains); + try_combine_chains (loop, &chains); insert_init_seqs (loop, chains);
On Fri, Nov 10, 2017 at 02:14:25PM +0000, Bin.Cheng wrote:
> Hmm, the patch...
+ /* Setup UID for all statements in dominance order. */
+ basic_block *bbs = get_loop_body (loop);
+ for (i = 0; i < loop->num_nodes; i++)
+ {
+ unsigned uid = 0;
+ basic_block bb = bbs[i];
+
+ for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+ gsi_next (&bsi))
+ {
+ gimple *stmt = gsi_stmt (bsi);
+ if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
+ gimple_set_uid (stmt, uid);
+ }
+
+ for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
+ gsi_next (&bsi))
+ {
+ gimple *stmt = gsi_stmt (bsi);
+ if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
+ gimple_set_uid (stmt, ++uid);
+ }
for (gimple_stmt_iterator bsi = gsi_start_nondebug_after_labels_bb (bb);
!gsi_end_p (bsi);
gsi_next_nondebug (&bsi))
gimple_set_uid (gsi_stmt (bsi), ++uid);
thanks,
+ }
+ free (bbs);
On Sat, Nov 11, 2017 at 11:19 AM, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote: > On Fri, Nov 10, 2017 at 02:14:25PM +0000, Bin.Cheng wrote: >> Hmm, the patch... > > + /* Setup UID for all statements in dominance order. */ > + basic_block *bbs = get_loop_body (loop); > + for (i = 0; i < loop->num_nodes; i++) > + { > + unsigned uid = 0; > + basic_block bb = bbs[i]; > + > + for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); > + gsi_next (&bsi)) > + { > + gimple *stmt = gsi_stmt (bsi); > + if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt)))) > + gimple_set_uid (stmt, uid); > + } > + > + for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); > + gsi_next (&bsi)) > + { > + gimple *stmt = gsi_stmt (bsi); > + if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt)) > + gimple_set_uid (stmt, ++uid); > + } > > for (gimple_stmt_iterator bsi = gsi_start_nondebug_after_labels_bb (bb); > !gsi_end_p (bsi); > gsi_next_nondebug (&bsi)) > gimple_set_uid (gsi_stmt (bsi), ++uid); Or even better instead of the whole loop renumber_gimple_stmt_uids_in_blocks (bbs, loop->num_nodes); Ok with that change. Thanks, Richard. > thanks, > > + } > + free (bbs); >
On Mon, Nov 13, 2017 at 1:20 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Sat, Nov 11, 2017 at 11:19 AM, Bernhard Reutner-Fischer > <rep.dot.nop@gmail.com> wrote: >> On Fri, Nov 10, 2017 at 02:14:25PM +0000, Bin.Cheng wrote: >>> Hmm, the patch... >> >> + /* Setup UID for all statements in dominance order. */ >> + basic_block *bbs = get_loop_body (loop); >> + for (i = 0; i < loop->num_nodes; i++) >> + { >> + unsigned uid = 0; >> + basic_block bb = bbs[i]; >> + >> + for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); >> + gsi_next (&bsi)) >> + { >> + gimple *stmt = gsi_stmt (bsi); >> + if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt)))) >> + gimple_set_uid (stmt, uid); >> + } >> + >> + for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); >> + gsi_next (&bsi)) >> + { >> + gimple *stmt = gsi_stmt (bsi); >> + if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt)) >> + gimple_set_uid (stmt, ++uid); >> + } >> >> for (gimple_stmt_iterator bsi = gsi_start_nondebug_after_labels_bb (bb); >> !gsi_end_p (bsi); >> gsi_next_nondebug (&bsi)) >> gimple_set_uid (gsi_stmt (bsi), ++uid); > > Or even better instead of the whole loop > > renumber_gimple_stmt_uids_in_blocks (bbs, loop->num_nodes); > > Ok with that change. Right, here is the updated patch. Will commit it later. Thanks, bin > > Thanks, > Richard. > >> thanks, >> >> + } >> + free (bbs); >> From 28a21f4a86ed4e1b5a174b004c45bd4b8ede944f Mon Sep 17 00:00:00 2001 From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com> Date: Wed, 1 Nov 2017 17:43:55 +0000 Subject: [PATCH 2/2] pr82726-20171111.txt --- gcc/testsuite/gcc.dg/tree-ssa/pr82726.c | 26 ++++++ gcc/tree-predcom.c | 138 ++++++++++++++++++++++++++++---- 2 files changed, 148 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c new file mode 100644 index 0000000..22bc59d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 --param tree-reassoc-width=4" } */ +/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */ + +#define N 40 +#define M 128 +unsigned int in[N+M]; +unsigned short out[N]; + +/* Outer-loop vectorization. */ + +void +foo (){ + int i,j; + unsigned int diff; + + for (i = 0; i < N; i++) { + diff = 0; + for (j = 0; j < M; j+=8) { + diff += in[j+i]; + } + out[i]=(unsigned short)diff; + } + + return; +} diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index 24d7c9c..28dac82 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -1020,6 +1020,17 @@ order_drefs (const void *a, const void *b) return (*da)->pos - (*db)->pos; } +/* Compares two drefs A and B by their position. Callback for qsort. */ + +static int +order_drefs_by_pos (const void *a, const void *b) +{ + const dref *const da = (const dref *) a; + const dref *const db = (const dref *) b; + + return (*da)->pos - (*db)->pos; +} + /* Returns root of the CHAIN. */ static inline dref @@ -2633,7 +2644,6 @@ combine_chains (chain_p ch1, chain_p ch2) bool swap = false; chain_p new_chain; unsigned i; - gimple *root_stmt; tree rslt_type = NULL_TREE; if (ch1 == ch2) @@ -2675,31 +2685,55 @@ combine_chains (chain_p ch1, chain_p ch2) new_chain->refs.safe_push (nw); } - new_chain->has_max_use_after = false; - root_stmt = get_chain_root (new_chain)->stmt; - for (i = 1; new_chain->refs.iterate (i, &nw); i++) - { - if (nw->distance == new_chain->length - && !stmt_dominates_stmt_p (nw->stmt, root_stmt)) - { - new_chain->has_max_use_after = true; - break; - } - } - ch1->combined = true; ch2->combined = true; return new_chain; } -/* Try to combine the CHAINS. */ +/* Recursively update position information of all offspring chains to ROOT + chain's position information. */ + +static void +update_pos_for_combined_chains (chain_p root) +{ + chain_p ch1 = root->ch1, ch2 = root->ch2; + dref ref, ref1, ref2; + for (unsigned j = 0; (root->refs.iterate (j, &ref) + && ch1->refs.iterate (j, &ref1) + && ch2->refs.iterate (j, &ref2)); ++j) + ref1->pos = ref2->pos = ref->pos; + + if (ch1->type == CT_COMBINATION) + update_pos_for_combined_chains (ch1); + if (ch2->type == CT_COMBINATION) + update_pos_for_combined_chains (ch2); +} + +/* Returns true if statement S1 dominates statement S2. */ + +static bool +pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2) +{ + basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2); + + if (!bb1 || s1 == s2) + return true; + + if (bb1 == bb2) + return gimple_uid (s1) < gimple_uid (s2); + + return dominated_by_p (CDI_DOMINATORS, bb2, bb1); +} + +/* Try to combine the CHAINS in LOOP. */ static void -try_combine_chains (vec<chain_p> *chains) +try_combine_chains (struct loop *loop, vec<chain_p> *chains) { unsigned i, j; chain_p ch1, ch2, cch; auto_vec<chain_p> worklist; + bool combined_p = false; FOR_EACH_VEC_ELT (*chains, i, ch1) if (chain_can_be_combined_p (ch1)) @@ -2721,6 +2755,78 @@ try_combine_chains (vec<chain_p> *chains) { worklist.safe_push (cch); chains->safe_push (cch); + combined_p = true; + break; + } + } + } + if (!combined_p) + return; + + /* Setup UID for all statements in dominance order. */ + basic_block *bbs = get_loop_body (loop); + renumber_gimple_stmt_uids_in_blocks (bbs, loop->num_nodes); + free (bbs); + + /* Re-association in combined chains may generate statements different to + order of references of the original chain. We need to keep references + of combined chain in dominance order so that all uses will be inserted + after definitions. Note: + A) This is necessary for all combined chains. + B) This is only necessary for ZERO distance references because other + references inherit value from loop carried PHIs. + + We first update position information for all combined chains. */ + dref ref; + for (i = 0; chains->iterate (i, &ch1); ++i) + { + if (ch1->type != CT_COMBINATION || ch1->combined) + continue; + + for (j = 0; ch1->refs.iterate (j, &ref); ++j) + ref->pos = gimple_uid (ref->stmt); + + update_pos_for_combined_chains (ch1); + } + /* Then sort references according to newly updated position information. */ + for (i = 0; chains->iterate (i, &ch1); ++i) + { + if (ch1->type != CT_COMBINATION && !ch1->combined) + continue; + + /* Find the first reference with non-ZERO distance. */ + if (ch1->length == 0) + j = ch1->refs.length(); + else + { + for (j = 0; ch1->refs.iterate (j, &ref); ++j) + if (ref->distance != 0) + break; + } + + /* Sort all ZERO distance references by position. */ + qsort (&ch1->refs[0], j, sizeof (ch1->refs[0]), order_drefs_by_pos); + + if (ch1->combined) + continue; + + /* For ZERO length chain, has_max_use_after must be true since root + combined stmt must dominates others. */ + if (ch1->length == 0) + { + ch1->has_max_use_after = true; + continue; + } + /* Check if there is use at max distance after root for combined chains + and set flag accordingly. */ + ch1->has_max_use_after = false; + gimple *root_stmt = get_chain_root (ch1)->stmt; + for (j = 1; ch1->refs.iterate (j, &ref); ++j) + { + if (ref->distance == ch1->length + && !pcom_stmt_dominates_stmt_p (ref->stmt, root_stmt)) + { + ch1->has_max_use_after = true; break; } } @@ -3058,7 +3164,7 @@ tree_predictive_commoning_loop (struct loop *loop) loop_closed_ssa = prepare_finalizers (loop, chains); /* Try to combine the chains that are always worked with together. */ - try_combine_chains (&chains); + try_combine_chains (loop, &chains); insert_init_seqs (loop, chains);
From 843cef544a46236e40063416cebc8037736ad18a Mon Sep 17 00:00:00 2001 From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com> Date: Wed, 1 Nov 2017 17:43:55 +0000 Subject: [PATCH 2/2] pr82726-20171102.txt --- gcc/testsuite/gcc.dg/tree-ssa/pr82726.c | 26 ++++++ gcc/tree-predcom.c | 159 ++++++++++++++++++++++++++++---- 2 files changed, 169 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c new file mode 100644 index 0000000..179f93a --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 --param tree-reassoc-width=4" } */ +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */ + +#define N 40 +#define M 128 +unsigned int in[N+M]; +unsigned short out[N]; + +/* Outer-loop vectorization. */ + +void +foo (){ + int i,j; + unsigned int diff; + + for (i = 0; i < N; i++) { + diff = 0; + for (j = 0; j < M; j+=8) { + diff += in[j+i]; + } + out[i]=(unsigned short)diff; + } + + return; +} diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index 24d7c9c..a243bce 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -201,6 +201,8 @@ along with GCC; see the file COPYING3. If not see i * i with ii_last + 2 * i + 1), to generalize strength reduction. */ #include "config.h" +#include <map> +#define INCLUDE_ALGORITHM /* std::sort */ #include "system.h" #include "coretypes.h" #include "backend.h" @@ -1020,6 +1022,14 @@ order_drefs (const void *a, const void *b) return (*da)->pos - (*db)->pos; } +/* Compares two drefs A and B by their position. Callback for std::sort. */ + +static bool +order_drefs_by_pos (dref a, dref b) +{ + return a->pos < b->pos; +} + /* Returns root of the CHAIN. */ static inline dref @@ -2633,7 +2643,6 @@ combine_chains (chain_p ch1, chain_p ch2) bool swap = false; chain_p new_chain; unsigned i; - gimple *root_stmt; tree rslt_type = NULL_TREE; if (ch1 == ch2) @@ -2675,31 +2684,56 @@ combine_chains (chain_p ch1, chain_p ch2) new_chain->refs.safe_push (nw); } - new_chain->has_max_use_after = false; - root_stmt = get_chain_root (new_chain)->stmt; - for (i = 1; new_chain->refs.iterate (i, &nw); i++) - { - if (nw->distance == new_chain->length - && !stmt_dominates_stmt_p (nw->stmt, root_stmt)) - { - new_chain->has_max_use_after = true; - break; - } - } - ch1->combined = true; ch2->combined = true; return new_chain; } -/* Try to combine the CHAINS. */ +/* Recursively update position information of all offspring chains to ROOT + chain's position information. */ + +static void +update_pos_for_combined_chains (chain_p root) +{ + chain_p ch1 = root->ch1, ch2 = root->ch2; + dref ref, ref1, ref2; + for (unsigned j = 0; (root->refs.iterate (j, &ref) + && ch1->refs.iterate (j, &ref1) + && ch2->refs.iterate (j, &ref2)); ++j) + ref1->pos = ref2->pos = ref->pos; + + if (ch1->type == CT_COMBINATION) + update_pos_for_combined_chains (ch1); + if (ch2->type == CT_COMBINATION) + update_pos_for_combined_chains (ch2); +} + +/* Returns true if statement S1 dominates statement S2. */ + +static bool +pcom_stmt_dominates_stmt_p (std::map<gimple *, int> &stmts_map, + gimple *s1, gimple *s2) +{ + basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2); + + if (!bb1 || s1 == s2) + return true; + + if (bb1 == bb2) + return stmts_map[s1] < stmts_map[s2]; + + return dominated_by_p (CDI_DOMINATORS, bb2, bb1); +} + +/* Try to combine the CHAINS in LOOP. */ static void -try_combine_chains (vec<chain_p> *chains) +try_combine_chains (struct loop *loop, vec<chain_p> *chains) { unsigned i, j; chain_p ch1, ch2, cch; auto_vec<chain_p> worklist; + bool combined_p = false; FOR_EACH_VEC_ELT (*chains, i, ch1) if (chain_can_be_combined_p (ch1)) @@ -2721,6 +2755,99 @@ try_combine_chains (vec<chain_p> *chains) { worklist.safe_push (cch); chains->safe_push (cch); + combined_p = true; + break; + } + } + } + if (!combined_p) + return; + + /* Setup UID for all statements in dominance order. */ + std::map<gimple *, int> stmts_map; + basic_block *bbs = get_loop_body_in_dom_order (loop); + for (i = 0; i < loop->num_nodes; i++) + { + int uid = 0; + basic_block bb = bbs[i]; + + for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); + gsi_next (&bsi)) + { + gimple *stmt = gsi_stmt (bsi); + if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt)))) + stmts_map[stmt] = uid; + } + + for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); + gsi_next (&bsi)) + { + gimple *stmt = gsi_stmt (bsi); + if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt)) + stmts_map[stmt] = ++uid; + } + } + free (bbs); + + /* Re-association in combined chains may generate statements different to + order of references of the original chain. We need to keep references + of combined chain in dominance order so that all uses will be inserted + after definitions. Note: + A) This is necessary for all combined chains. + B) This is only necessary for ZERO distance references because other + references inherit value from loop carried PHIs. + + We first update position information for all combined chains. */ + dref ref; + for (i = 0; chains->iterate (i, &ch1); ++i) + { + if (ch1->type != CT_COMBINATION || ch1->combined) + continue; + + for (j = 0; ch1->refs.iterate (j, &ref); ++j) + ref->pos = stmts_map[ref->stmt]; + + update_pos_for_combined_chains (ch1); + } + /* Then sort references according to newly updated position information. */ + for (i = 0; chains->iterate (i, &ch1); ++i) + { + if (ch1->type != CT_COMBINATION && !ch1->combined) + continue; + + /* Find the first reference with non-ZERO distance. */ + if (ch1->length == 0) + j = ch1->refs.length(); + else + { + for (j = 0; ch1->refs.iterate (j, &ref); ++j) + if (ref->distance != 0) + break; + } + + /* Sort all ZERO distance references by position. */ + std::sort (&ch1->refs[0], &ch1->refs[0] + j, order_drefs_by_pos); + + if (ch1->combined) + continue; + + /* For ZERO length chain, has_max_use_after must be true since root + combined stmt must dominates others. */ + if (ch1->length == 0) + { + ch1->has_max_use_after = true; + continue; + } + /* Check if there is use at max distance after root for combined chains + and set flag accordingly. */ + ch1->has_max_use_after = false; + gimple *root_stmt = get_chain_root (ch1)->stmt; + for (j = 1; ch1->refs.iterate (j, &ref); ++j) + { + if (ref->distance == ch1->length + && !pcom_stmt_dominates_stmt_p (stmts_map, ref->stmt, root_stmt)) + { + ch1->has_max_use_after = true; break; } } @@ -3058,7 +3185,7 @@ tree_predictive_commoning_loop (struct loop *loop) loop_closed_ssa = prepare_finalizers (loop, chains); /* Try to combine the chains that are always worked with together. */ - try_combine_chains (&chains); + try_combine_chains (loop, &chains); insert_init_seqs (loop, chains); -- 1.9.1