From patchwork Thu Dec 9 23:51:54 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Pop X-Patchwork-Id: 75013 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 2F869B6F1E for ; Fri, 10 Dec 2010 10:52:21 +1100 (EST) Received: (qmail 16534 invoked by alias); 9 Dec 2010 23:52:17 -0000 Received: (qmail 16438 invoked by uid 22791); 9 Dec 2010 23:52:11 -0000 X-SWARE-Spam-Status: No, hits=-2.3 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, TW_TM, T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: sourceware.org Received: from mail-yx0-f175.google.com (HELO mail-yx0-f175.google.com) (209.85.213.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 09 Dec 2010 23:52:02 +0000 Received: by yxd5 with SMTP id 5so1874685yxd.20 for ; Thu, 09 Dec 2010 15:52:00 -0800 (PST) Received: by 10.236.102.171 with SMTP id d31mr135575yhg.59.1291938719923; Thu, 09 Dec 2010 15:51:59 -0800 (PST) Received: from napoca (adsl-76-244-77-0.dsl.austtx.sbcglobal.net [76.244.77.0]) by mx.google.com with ESMTPS id e20sm1479762yhc.19.2010.12.09.15.51.57 (version=TLSv1/SSLv3 cipher=RC4-MD5); Thu, 09 Dec 2010 15:51:59 -0800 (PST) Received: by napoca (sSMTP sendmail emulation); Thu, 09 Dec 2010 17:51:56 -0600 From: Sebastian Pop To: gcc-patches@gcc.gnu.org Cc: rguenther@suse.de, Sebastian Pop Subject: [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses. Date: Thu, 9 Dec 2010 17:51:54 -0600 Message-Id: <1291938714-8015-1-git-send-email-sebpop@gmail.com> X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, This patch replaces the heuristic that aggregates the partitions of the loop distribution. The original heuristic was acting during the construction of a partition, mixing legality with data locality concerns. This patch simplifies the logic by removing this heuristic, and postponing the aggregation of partitions: with this patch we build all the legal partitions, and then we fuse the partitions that have some memory affinity. Ok for trunk after regstrap on amd64-linux? Thanks, Sebastian 2010-12-09 Sebastian Pop PR tree-optimization/43023 * tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p): Removed. (stores_zero_from_loop): Call stmt_stores_zero. * tree-data-ref.h (stmt_stores_zero): New. * tree-loop-distribution.c (generate_memset_zero): Do not return a boolean. Call gcc_assert on stride_of_unit_type_p. (generate_builtin): Call stmt_stores_zero. (rdg_flag_all_uses): Removed. (rdg_flag_similar_memory_accesses): Removed. (build_rdg_partition_for_component): Removed parameter other_stores. Removed call to rdg_flag_similar_memory_accesses. (can_generate_builtin): New. (similar_memory_accesses): New. (fuse_partitions_with_similar_memory_accesses): New. (rdg_build_partitions): Call fuse_partitions_with_similar_memory_accesses. * gfortran.dg/ldist-1.f90: Adjust pattern. * gfortran.dg/ldist-pr43023.f90: New. --- gcc/ChangeLog | 20 +++ gcc/testsuite/ChangeLog | 6 + gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c | 8 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c | 2 +- gcc/testsuite/gfortran.dg/ldist-1.f90 | 7 +- gcc/testsuite/gfortran.dg/ldist-pr43023.f90 | 31 +++++ gcc/tree-data-ref.c | 31 +---- gcc/tree-data-ref.h | 32 +++++ gcc/tree-loop-distribution.c | 191 ++++++++++++--------------- 20 files changed, 197 insertions(+), 155 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/ldist-pr43023.f90 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 9b3e799..1d1d3f0 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,23 @@ +2010-12-09 Sebastian Pop + + PR tree-optimization/43023 + * tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p): + Removed. + (stores_zero_from_loop): Call stmt_stores_zero. + * tree-data-ref.h (stmt_stores_zero): New. + * tree-loop-distribution.c (generate_memset_zero): Do not return a + boolean. Call gcc_assert on stride_of_unit_type_p. + (generate_builtin): Call stmt_stores_zero. + (rdg_flag_all_uses): Removed. + (rdg_flag_similar_memory_accesses): Removed. + (build_rdg_partition_for_component): Removed parameter + other_stores. Removed call to rdg_flag_similar_memory_accesses. + (can_generate_builtin): New. + (similar_memory_accesses): New. + (fuse_partitions_with_similar_memory_accesses): New. + (rdg_build_partitions): Call + fuse_partitions_with_similar_memory_accesses. + 2010-12-08 Richard Guenther Sebastian Pop diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index c0af4e3..392e2cb 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2010-12-09 Sebastian Pop + + PR tree-optimization/43023 + * gfortran.dg/ldist-1.f90: Adjust pattern. + * gfortran.dg/ldist-pr43023.f90: New. + 2010-12-08 Richard Guenther Sebastian Pop diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c index 43c1046..dcff6fb 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ void foo (int * __restrict__ ia, diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c index 0790c18..2e69563 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int loop1 (int k) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c index 88651e7..4cc5edc 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ void foo (int * __restrict__ ia, diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c index 1e555fe..dcfe2de 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int foo (int * __restrict__ ia, diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c index 623aacf..2eaa140 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int foo (int * __restrict__ ia, diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c index de98ccc..d8b2545 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ void foo (int * __restrict__ a, diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c index 40adfe1..f2a96d8 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c @@ -1,11 +1,11 @@ -/* { dg-do compile { target int32plus } } */ +/* { dg-do compile { target int32plus } } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int loop1 (int k) { unsigned int i; int a[10000], b[10000], c[10000], d[10000]; - + a[0] = k; a[3] = k*2; c[1] = k+1; for (i = 2; i < (10000-1); i ++) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c index a744fea..7030515 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int loop1 (int k) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c index 9a03dc1..7381b27 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target int32plus } } */ +/* { dg-do compile { target int32plus } } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int loop1 (int k) @@ -9,8 +9,8 @@ int loop1 (int k) a[0][0] = k; for (i = 1; i < 100; i ++) - for (j = 1; j < (100-1); j++) - { + for (j = 1; j < (100-1); j++) + { a[i][j] = k * i; /* S1 */ b[i][j] = a[i][j-1] + k; /* S2 */ c[i][j] = b[i][j] + a[i][j+1]; /* S3 */ @@ -22,7 +22,7 @@ int loop1 (int k) S2->S3 (flow, level 0) S3->S4 (flow, level 0) */ - + return a[100-1][100-1] + b[100-1][100-1] + c[100-1][100-1] + d[100-1][100-1]; } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c index 7a38c86..2b21780 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int loop1 (int k) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c index 124fcde..39397be 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int loop1 (int k) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c index 4a8e066..f45f70e 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int loop1 (int k) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c index ee8d023..9c328a7 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile } */ /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */ int loop1 (int k) diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90 index dd1f02a..315e698 100644 --- a/gcc/testsuite/gfortran.dg/ldist-1.f90 +++ b/gcc/testsuite/gfortran.dg/ldist-1.f90 @@ -1,4 +1,4 @@ -! { dg-do compile } +! { dg-do compile } ! { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT) @@ -29,5 +29,8 @@ Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT) return end Subroutine PADEC -! { dg-final { scan-tree-dump-times "distributed: split to 4 loops" 1 "ldist" } } +! There are 5 legal partitions in this code. Based on the data +! locality heuristic, this loop should not be split. + +! { dg-final { scan-tree-dump-not "distributed: split to" "ldist" } } ! { dg-final { cleanup-tree-dump "ldist" } } diff --git a/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 new file mode 100644 index 0000000..3e2d04c --- /dev/null +++ b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 @@ -0,0 +1,31 @@ +! { dg-do compile } +! { dg-options "-O2 -ftree-loop-distribution" } + +MODULE NFT_mod + +implicit none +integer :: Nangle +real:: Z0 +real, dimension(:,:), allocatable :: Angle +real, dimension(:), allocatable :: exth, ezth, hxth, hyth, hyphi + +CONTAINS + +SUBROUTINE NFT_Init() + +real :: th, fi +integer :: n + +do n = 1,Nangle + th = Angle(n,1) + fi = Angle(n,2) + + exth(n) = cos(fi)*cos(th) + ezth(n) = -sin(th) + hxth(n) = -sin(fi) + hyth(n) = cos(fi) + hyphi(n) = -sin(fi) +end do +END SUBROUTINE NFT_Init + +END MODULE NFT_mod diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index 4dfcd5c..3276079 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -4509,7 +4509,7 @@ dump_rdg_vertex (FILE *file, struct graph *rdg, int i) for (e = v->succ; e; e = e->succ_next) fprintf (file, " %d", e->dest); - fprintf (file, ") \n"); + fprintf (file, ")\n"); print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS); fprintf (file, ")\n"); } @@ -4976,27 +4976,6 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts) free (bbs); } -/* Returns true when STMT is an assignment that contains a data - reference on its LHS with a stride of the same size as its unit - type. */ - -static bool -mem_write_stride_of_same_size_as_unit_type_p (gimple stmt) -{ - struct data_reference *dr = XCNEW (struct data_reference); - tree op0 = gimple_assign_lhs (stmt); - bool res; - - DR_STMT (dr) = stmt; - DR_REF (dr) = op0; - - res = dr_analyze_innermost (dr) - && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)); - - free_data_ref (dr); - return res; -} - /* Initialize STMTS with all the statements of LOOP that contain a store to memory of the form "A[i] = 0". */ @@ -5007,18 +4986,12 @@ stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts) basic_block bb; gimple_stmt_iterator si; gimple stmt; - tree op; basic_block *bbs = get_loop_body_in_dom_order (loop); for (i = 0; i < loop->num_nodes; i++) for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) if ((stmt = gsi_stmt (si)) - && gimple_vdef (stmt) - && is_gimple_assign (stmt) - && gimple_assign_rhs_code (stmt) == INTEGER_CST - && (op = gimple_assign_rhs1 (stmt)) - && (integer_zerop (op) || real_zerop (op)) - && mem_write_stride_of_same_size_as_unit_type_p (stmt)) + && stmt_stores_zero (stmt)) VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si)); free (bbs); diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index d929f31..e1006aa 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -614,6 +614,38 @@ stride_of_unit_type_p (tree stride, tree type) TYPE_SIZE_UNIT (type)); } +/* Returns true when the statement at STMT is of the form "A[i] = 0" + that contains a data reference on its LHS with a stride of the same + size as its unit type. */ + +static inline bool +stmt_stores_zero (gimple stmt) +{ + tree op0, op1; + bool res; + struct data_reference *dr; + + if (!stmt + || !gimple_vdef (stmt) + || !is_gimple_assign (stmt) + || !gimple_assign_single_p (stmt) + || !(op1 = gimple_assign_rhs1 (stmt)) + || !(integer_zerop (op1) || real_zerop (op1))) + return false; + + dr = XCNEW (struct data_reference); + op0 = gimple_assign_lhs (stmt); + + DR_STMT (dr) = stmt; + DR_REF (dr) = op0; + + res = dr_analyze_innermost (dr) + && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)); + + free_data_ref (dr); + return res; +} + /* Determines whether RDG vertices V1 and V2 access to similar memory locations, in which case they have to be in the same partition. */ diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index 357f51f..4fc246e 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -241,7 +241,7 @@ build_size_arg_loc (location_t loc, tree nb_iter, tree op, /* Generate a call to memset. Return true when the operation succeeded. */ -static bool +static void generate_memset_zero (gimple stmt, tree op0, tree nb_iter, gimple_stmt_iterator bsi) { @@ -255,11 +255,8 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter, DR_STMT (dr) = stmt; DR_REF (dr) = op0; - if (!dr_analyze_innermost (dr)) - goto end; - - if (!stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0))) - goto end; + res = dr_analyze_innermost (dr); + gcc_assert (res && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0))); nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list); addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr)); @@ -286,14 +283,11 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter, fn_call = gimple_build_call (fn, 3, mem, integer_zero_node, nb_bytes); gimple_seq_add_stmt (&stmt_list, fn_call); gsi_insert_seq_after (&bsi, stmt_list, GSI_CONTINUE_LINKING); - res = true; if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "generated memset zero\n"); - end: free_data_ref (dr); - return res; } /* Tries to generate a builtin function for the instructions of LOOP @@ -307,7 +301,6 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p) unsigned i, x = 0; basic_block *bbs; gimple write = NULL; - tree op0, op1; gimple_stmt_iterator bsi; tree nb_iter = number_of_exit_cond_executions (loop); @@ -343,26 +336,17 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p) } } - if (!write) - goto end; - - op0 = gimple_assign_lhs (write); - op1 = gimple_assign_rhs1 (write); - - if (!(TREE_CODE (op0) == ARRAY_REF - || TREE_CODE (op0) == MEM_REF)) + if (!stmt_stores_zero (write)) goto end; /* The new statements will be placed before LOOP. */ bsi = gsi_last_bb (loop_preheader_edge (loop)->src); - - if (gimple_assign_rhs_code (write) == INTEGER_CST - && (integer_zerop (op1) || real_zerop (op1))) - res = generate_memset_zero (write, op0, nb_iter, bsi); + generate_memset_zero (write, gimple_assign_lhs (write), nb_iter, bsi); + res = true; /* If this is the last partition for which we generate code, we have to destroy the loop. */ - if (res && !copy_p) + if (!copy_p) { unsigned nbbs = loop->num_nodes; edge exit = single_exit (loop); @@ -504,24 +488,6 @@ has_upstream_mem_writes (int u) static void rdg_flag_vertex_and_dependent (struct graph *, int, bitmap, bitmap, bitmap, bool *); -/* Flag all the uses of U. */ - -static void -rdg_flag_all_uses (struct graph *rdg, int u, bitmap partition, bitmap loops, - bitmap processed, bool *part_has_writes) -{ - struct graph_edge *e; - - for (e = rdg->vertices[u].succ; e; e = e->succ_next) - if (!bitmap_bit_p (processed, e->dest)) - { - rdg_flag_vertex_and_dependent (rdg, e->dest, partition, loops, - processed, part_has_writes); - rdg_flag_all_uses (rdg, e->dest, partition, loops, processed, - part_has_writes); - } -} - /* Flag the uses of U stopping following the information from upstream_mem_writes. */ @@ -689,68 +655,13 @@ rdg_flag_loop_exits (struct graph *rdg, bitmap loops, bitmap partition, } } -/* Flag all the nodes of RDG containing memory accesses that could - potentially belong to arrays already accessed in the current - PARTITION. */ - -static void -rdg_flag_similar_memory_accesses (struct graph *rdg, bitmap partition, - bitmap loops, bitmap processed, - VEC (int, heap) **other_stores) -{ - bool foo; - unsigned i, n; - int j, k, kk; - bitmap_iterator ii; - struct graph_edge *e; - - EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, ii) - if (RDG_MEM_WRITE_STMT (rdg, i) - || RDG_MEM_READS_STMT (rdg, i)) - { - for (j = 0; j < rdg->n_vertices; j++) - if (!bitmap_bit_p (processed, j) - && (RDG_MEM_WRITE_STMT (rdg, j) - || RDG_MEM_READS_STMT (rdg, j)) - && rdg_has_similar_memory_accesses (rdg, i, j)) - { - /* Flag first the node J itself, and all the nodes that - are needed to compute J. */ - rdg_flag_vertex_and_dependent (rdg, j, partition, loops, - processed, &foo); - - /* When J is a read, we want to coalesce in the same - PARTITION all the nodes that are using J: this is - needed for better cache locality. */ - rdg_flag_all_uses (rdg, j, partition, loops, processed, &foo); - - /* Remove from OTHER_STORES the vertex that we flagged. */ - if (RDG_MEM_WRITE_STMT (rdg, j)) - FOR_EACH_VEC_ELT (int, *other_stores, k, kk) - if (kk == j) - { - VEC_unordered_remove (int, *other_stores, k); - break; - } - } - - /* If the node I has two uses, then keep these together in the - same PARTITION. */ - for (n = 0, e = rdg->vertices[i].succ; e; e = e->succ_next, n++); - - if (n > 1) - rdg_flag_all_uses (rdg, i, partition, loops, processed, &foo); - } -} - /* Returns a bitmap in which all the statements needed for computing the strongly connected component C of the RDG are flagged, also including the loop exit conditions. */ static bitmap build_rdg_partition_for_component (struct graph *rdg, rdgc c, - bool *part_has_writes, - VEC (int, heap) **other_stores) + bool *part_has_writes) { int i, v; bitmap partition = BITMAP_ALLOC (NULL); @@ -762,14 +673,6 @@ build_rdg_partition_for_component (struct graph *rdg, rdgc c, rdg_flag_vertex_and_dependent (rdg, v, partition, loops, processed, part_has_writes); - /* Also iterate on the array of stores not in the starting vertices, - and determine those vertices that have some memory affinity with - the current nodes in the component: these are stores to the same - arrays, i.e. we're taking care of cache locality. */ - if (!flag_tree_loop_distribute_patterns) - rdg_flag_similar_memory_accesses (rdg, partition, loops, processed, - other_stores); - rdg_flag_loop_exits (rdg, loops, partition, processed, part_has_writes); BITMAP_FREE (processed); @@ -832,6 +735,79 @@ rdg_build_components (struct graph *rdg, VEC (int, heap) *starting_vertices, BITMAP_FREE (saved_components); } +/* Returns true when it is possible to generate a builtin pattern for + the PARTITION of RDG. For the moment we detect only the memset + zero pattern. */ + +static bool +can_generate_builtin (struct graph *rdg, bitmap partition) +{ + unsigned i; + bitmap_iterator bi; + int nb_reads = 0; + int nb_writes = 0; + int stores_zero = 0; + + EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, bi) + if (RDG_MEM_READS_STMT (rdg, i)) + nb_reads++; + else if (RDG_MEM_WRITE_STMT (rdg, i)) + { + nb_writes++; + if (stmt_stores_zero (RDG_STMT (rdg, i))) + stores_zero++; + } + + return stores_zero == 1 && nb_writes == 1 && nb_reads == 0; +} + +/* Returns true when PARTITION1 and PARTITION2 have similar memory + accesses in RDG. */ + +static bool +similar_memory_accesses (struct graph *rdg, bitmap partition1, + bitmap partition2) +{ + unsigned i, j; + bitmap_iterator bi, bj; + + EXECUTE_IF_SET_IN_BITMAP (partition1, 0, i, bi) + if (RDG_MEM_WRITE_STMT (rdg, i) + || RDG_MEM_READS_STMT (rdg, i)) + EXECUTE_IF_SET_IN_BITMAP (partition2, 0, j, bj) + if (RDG_MEM_WRITE_STMT (rdg, j) + || RDG_MEM_READS_STMT (rdg, j)) + if (rdg_has_similar_memory_accesses (rdg, i, j)) + return true; + + return false; +} + +/* Fuse all the partitions from PARTITIONS that contain similar memory + references, i.e., we're taking care of cache locality. This + function does not fuse those partitions that contain patterns that + can be code generated with builtins. */ + +static void +fuse_partitions_with_similar_memory_accesses (struct graph *rdg, + VEC (bitmap, heap) **partitions) +{ + int p1, p2; + bitmap partition1, partition2; + + FOR_EACH_VEC_ELT (bitmap, *partitions, p1, partition1) + if (!can_generate_builtin (rdg, partition1)) + FOR_EACH_VEC_ELT (bitmap, *partitions, p2, partition2) + if (p1 != p2 + && !can_generate_builtin (rdg, partition2) + && similar_memory_accesses (rdg, partition1, partition2)) + { + bitmap_ior_into (partition1, partition2); + VEC_ordered_remove (bitmap, *partitions, p2); + p2--; + } +} + /* Aggregate several components into a useful partition that is registered in the PARTITIONS vector. Partitions will be distributed in different loops. */ @@ -854,8 +830,7 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components, if (bitmap_bit_p (processed, v)) continue; - np = build_rdg_partition_for_component (rdg, x, &part_has_writes, - other_stores); + np = build_rdg_partition_for_component (rdg, x, &part_has_writes); bitmap_ior_into (partition, np); bitmap_ior_into (processed, np); BITMAP_FREE (np); @@ -901,6 +876,8 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components, VEC_safe_push (bitmap, heap, *partitions, partition); else BITMAP_FREE (partition); + + fuse_partitions_with_similar_memory_accesses (rdg, partitions); } /* Dump to FILE the PARTITIONS. */