From patchwork Tue Jun 26 10:21:59 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Matz X-Patchwork-Id: 167362 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id B6AE7B6FBA for ; Tue, 26 Jun 2012 20:22:22 +1000 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1341310944; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Date: From:To:Cc:Subject:Message-ID:User-Agent:MIME-Version: Content-Type:Mailing-List:Precedence:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:Sender:Delivered-To; bh=j6AJx3x JEaqtZf/totUBO7rkR5Y=; b=tng4v5q0mmG9/5repUzvioYSnDvuV7CieM2TAhC QOXRTJJFJocjUidEzn+kWoH2ZwreLXDKdEnduiFriGn79BJdLAsdWoSV001CD5KT 66FvD/aPr+R/sgvZmJeplZbzO8gucsWnykGaTkFhEIxQ06/E2jxFmCNgdKivmQsQ 6CYw= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Date:From:To:Cc:Subject:Message-ID:User-Agent:MIME-Version:Content-Type:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=SOwap5B5ryGfZFlvUB2ZAoDaVBlgGje7GunVxRJNePE2S/I3WkuNYzGlAB0L0E 3Bx64rAzWVfPcJ2sBiQIyNwrwKZraeDELPqmq8bc/miY3+N6Q8PV1mP6velZlYIX 0heWJ1OanS/0WtskBai/qIYga8yTDWmbt1+5ir4TTR8W0=; Received: (qmail 2744 invoked by alias); 26 Jun 2012 10:22:18 -0000 Received: (qmail 2731 invoked by uid 22791); 26 Jun 2012 10:22:15 -0000 X-SWARE-Spam-Status: No, hits=-3.9 required=5.0 tests=AWL, BAYES_50, KHOP_RCVD_UNTRUST, RCVD_IN_DNSWL_HI, RCVD_IN_HOSTKARMA_W, T_FRT_LOLITA1, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 26 Jun 2012 10:22:01 +0000 Received: from relay1.suse.de (unknown [195.135.220.254]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 5655CA674B; Tue, 26 Jun 2012 12:21:59 +0200 (CEST) Date: Tue, 26 Jun 2012 12:21:59 +0200 (CEST) From: Michael Matz To: tobias@grosser.es Cc: gcc-patches@gcc.gnu.org Subject: [graphite] RFC: Add ISL variants of remaining PPL things Message-ID: User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, so, to make progress on the graphite front we want to get rid of the ppl dependency. We start from Tobis move-to-isl-and-isl-scheduler branch at github, merged current trunk into that (see also Richis mails about cloog/isl configury), and add ISL implementations of the essential parts that still are ppl-only. These are the number of iteration analysis for potentially transformed iteration spaces (this wasn't mentioned in Tobis mail from February) and the interchange heuristic (basically stride calculation for transformed spaces). So I learned PPL and ISL and rewrote that part :) Like in the patch below. I'm not asking for any proper review for this, but rather if the isl code makes sort of sense. The testsuite works, except for pr43012.c, which exposes the fact that the current PPL nr-iter implementation is buggy (it computes the nr-iter of a 77<=i<=88 loop to be 65), so that testcase runs into the nr-iter-ppl==nr-iter-isl assert. In a private branch I have already removed all PPL code whatsoever (and cleaned up the ISL code I added) so we make good progress there. Ciao, Michael. diff --git a/gcc/graphite-interchange.c b/gcc/graphite-interchange.c index fa38f5c..96a7967 100644 --- a/gcc/graphite-interchange.c +++ b/gcc/graphite-interchange.c @@ -24,9 +24,11 @@ along with GCC; see the file COPYING3. If not see #include "config.h" #ifdef HAVE_cloog +#include #include #include #include +#include #include #include #endif @@ -64,7 +66,7 @@ along with GCC; see the file COPYING3. If not see c_0 * s_0 + c_1 * s_1 + ... c_n * s_n. */ static ppl_Linear_Expression_t -build_linearized_memory_access (ppl_dimension_type offset, poly_dr_p pdr) +ppl_build_linearized_memory_access (ppl_dimension_type offset, poly_dr_p pdr) { ppl_Linear_Expression_t res; ppl_Linear_Expression_t le; @@ -189,7 +191,7 @@ build_partial_difference (ppl_Pointset_Powerset_C_Polyhedron_t *p, the loop at DEPTH. */ static void -pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) +ppl_pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) { ppl_dimension_type time_depth; ppl_Linear_Expression_t le, lma; @@ -245,7 +247,7 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) /* Construct the 0|0|0|0|0|S|0|l1|0 part. */ { - lma = build_linearized_memory_access (offset + dim_sctr, pdr); + lma = ppl_build_linearized_memory_access (offset + dim_sctr, pdr); ppl_set_coef (lma, dim_L1, -1); ppl_new_Constraint (&new_cstr, lma, PPL_CONSTRAINT_TYPE_EQUAL); ppl_Pointset_Powerset_C_Polyhedron_add_constraint (p1, new_cstr); @@ -310,6 +312,7 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) ppl_max_for_le_pointset (p1, le, stride); } +#if 0 if (dump_file && (dump_flags & TDF_DETAILS)) { char *str; @@ -322,12 +325,159 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) mp_get_memory_functions (NULL, NULL, &gmp_free); (*gmp_free) (str, strlen (str) + 1); } +#endif ppl_delete_Pointset_Powerset_C_Polyhedron (p1); ppl_delete_Pointset_Powerset_C_Polyhedron (p2); ppl_delete_Linear_Expression (le); } +static isl_constraint * +build_linearized_memory_access (isl_map *map, poly_dr_p pdr) +{ + isl_constraint *res; + isl_local_space *ls = isl_local_space_from_space (isl_map_get_space (map)); + unsigned offset, nsubs; + int i; + isl_int size, subsize; + + res = isl_equality_alloc (ls); + isl_int_init (size); + isl_int_set_ui (size, 1); + isl_int_init (subsize); + isl_int_set_ui (subsize, 1); + + nsubs = isl_set_dim (pdr->extent, isl_dim_set); + /* -1 for the already included L dimension. */ + offset = isl_map_dim (map, isl_dim_out) - 1 - nsubs; + res = isl_constraint_set_coefficient_si (res, isl_dim_out, offset + nsubs, -1); + /* Go through all subscripts from last to first. First dimension + is the alias set, ignore it. */ + for (i = nsubs - 1; i >= 1; i--) + { + isl_space *dc; + isl_aff *aff; + enum isl_lp_result lpres; + + res = isl_constraint_set_coefficient (res, isl_dim_out, offset + i, size); + + dc = isl_set_get_space (pdr->extent); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1); + lpres = isl_set_max (pdr->extent, aff, &subsize); + isl_aff_free (aff); + isl_int_mul (size, size, subsize); + } + + isl_int_clear (subsize); + isl_int_clear (size); + + return res; +} + +static void +pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) +{ + poly_bb_p pbb = PDR_PBB (pdr); + isl_map *map; + isl_set *set; + isl_aff *aff; + isl_space *dc; + isl_constraint *lma, *c; + isl_int islstride; + enum isl_lp_result lpres; + graphite_dim_t time_depth; + unsigned offset, nt; + unsigned i; + /* pdr->accesses: [P1..nb_param,I1..nb_domain]->[a,S1..nb_subscript] + ??? [P] not used for PDRs? + pdr->extent: [a,S1..nb_subscript] + pbb->domain: [P1..nb_param,I1..nb_domain] + pbb->transformed: [P1..nb_param,I1..nb_domain]->[T1..Tnb_sctr] + [T] includes local vars (currently unused) + + First we create [P,I] -> [T,a,S]. */ + + map = isl_map_flat_range_product (isl_map_copy (pbb->transformed), + isl_map_copy (pdr->accesses)); + /* Add a dimension for L: [P,I] -> [T,a,S,L].*/ + map = isl_map_add_dims (map, isl_dim_out, 1); + /* Build a constraint for "lma[S] - L == 0", effectively calculating + L in terms of subscripts. */ + lma = build_linearized_memory_access (map, pdr); + /* And add it to the map, so we now have: + [P,I] -> [T,a,S,L] : lma([S]) == L. */ + map = isl_map_add_constraint (map, lma); + + /* Then we create [P,I,P',I'] -> [T,a,S,L,T',a',S',L']. */ + map = isl_map_flat_product (map, isl_map_copy (map)); + + /* Now add the equality T[time_depth] == T'[time_depth]+1. This will + force L' to be the linear address at T[time_depth] + 1. */ + time_depth = psct_dynamic_dim (pbb, depth); + /* Length of [a,S] plus [L] ... */ + offset = 1 + isl_map_dim (pdr->accesses, isl_dim_out); + /* ... plus [T]. */ + offset += isl_map_dim (pbb->transformed, isl_dim_out); + + c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (map))); + c = isl_constraint_set_coefficient_si (c, isl_dim_out, time_depth, 1); + c = isl_constraint_set_coefficient_si (c, isl_dim_out, + offset + time_depth, -1); + c = isl_constraint_set_constant_si (c, 1); + map = isl_map_add_constraint (map, c); + + /* Now we equate most of the T/T' elements (making PITaSL nearly + the same is (PITaSL)', except for one dimension, namely for 'depth' + (an index into [I]), after translating to index into [T]. Take care + to not produce an empty map, which indicates we wanted to equate + two dimensions that are already coupled via the above time_depth + dimension. Happens with strip mining where several scatter dimension + are interdependend. */ + /* Length of [T]. */ + nt = pbb_nb_scattering_transform (pbb) + pbb_nb_local_vars (pbb); + for (i = 0; i < nt; i++) + if (i != time_depth) + { + isl_map *temp = isl_map_equate (isl_map_copy (map), + isl_dim_out, i, + isl_dim_out, offset + i); + if (isl_map_is_empty (temp)) + isl_map_free (temp); + else + { + isl_map_free (map); + map = temp; + } + } + + /* Now maximize the expression L' - L. */ + set = isl_map_range (map); + dc = isl_set_get_space (set); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, offset - 1, -1); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, offset + offset - 1, 1); + isl_int_init (islstride); + lpres = isl_set_max (set, aff, &islstride); + isl_int_get_gmp (islstride, stride); + isl_int_clear (islstride); + isl_aff_free (aff); + isl_set_free (set); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + char *str; + void (*gmp_free) (void *, size_t); + + fprintf (dump_file, "\nStride in BB_%d, DR_%d, depth %d:", + pbb_index (pbb), PDR_ID (pdr), (int) depth); + str = mpz_get_str (0, 10, stride); + fprintf (dump_file, " %s ", str); + mp_get_memory_functions (NULL, NULL, &gmp_free); + (*gmp_free) (str, strlen (str) + 1); + } +} + /* Sets STRIDES to the sum of all the strides of the data references accessed in LOOP at DEPTH. */ @@ -339,9 +489,11 @@ memory_strides_in_loop_1 (lst_p loop, graphite_dim_t depth, mpz_t strides) lst_p l; poly_dr_p pdr; mpz_t s, n; + mpz_t sold; mpz_init (s); mpz_init (n); + mpz_init (sold); FOR_EACH_VEC_ELT (lst_p, LST_SEQ (loop), j, l) if (LST_LOOP_P (l)) @@ -350,6 +502,11 @@ memory_strides_in_loop_1 (lst_p loop, graphite_dim_t depth, mpz_t strides) FOR_EACH_VEC_ELT (poly_dr_p, PBB_DRS (LST_PBB (l)), i, pdr) { pdr_stride_in_loop (s, depth, pdr); + ppl_pdr_stride_in_loop (sold, depth, pdr); + /* Ideally this would hold, but it doesn't on e.g. + interchange-8.c. ??? Not analyzed why. */ + if (0 != mpz_cmp (s, sold)) + {} mpz_set_si (n, PDR_NB_REFS (pdr)); mpz_mul (s, s, n); mpz_add (strides, strides, s); @@ -357,6 +514,7 @@ memory_strides_in_loop_1 (lst_p loop, graphite_dim_t depth, mpz_t strides) mpz_clear (s); mpz_clear (n); + mpz_clear (sold); } /* Sets STRIDES to the sum of all the strides of the data references diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c index 4520bbb..bf1854c 100644 --- a/gcc/graphite-poly.c +++ b/gcc/graphite-poly.c @@ -26,6 +26,8 @@ along with GCC; see the file COPYING3. If not see #include #include #include +#include +#include #include #include #endif @@ -1573,6 +1575,25 @@ debug_scop_params (scop_p scop, int verbosity) print_scop_params (stderr, scop, verbosity); } +extern isl_ctx *the_isl_ctx; +void print_isl_set (FILE *f, isl_set *set); +void print_isl_map (FILE *f, isl_map *map); +void +print_isl_set (FILE *f, isl_set *set) +{ + isl_printer *p = isl_printer_to_file (the_isl_ctx, f); + p = isl_printer_print_set (p, set); + isl_printer_free (p); +} + +void +print_isl_map (FILE *f, isl_map *map) +{ + isl_printer *p = isl_printer_to_file (the_isl_ctx, f); + p = isl_printer_print_map (p, map); + isl_printer_free (p); +} + /* The dimension in the transformed scattering polyhedron of PBB containing the scattering iterator for the loop at depth LOOP_DEPTH. */ @@ -1643,6 +1664,9 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, graphite_dim_t time_depth, mpz_t res) { + /* XXX rewrite with isl, used from lst_niter_for_loop. */ + isl_map *lbmap, *ubmap; + isl_set *lbset, *ubset; ppl_Pointset_Powerset_C_Polyhedron_t domain, sctr_lb, sctr_ub; ppl_dimension_type domain_dim, sctr_dim; graphite_dim_t dim_iter_domain = pbb_dim_iter_domain (pbb); @@ -1667,6 +1691,7 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, that upper bound to the scattering. */ ppl_new_Pointset_Powerset_C_Polyhedron_from_C_Polyhedron (&sctr_ub, PBB_TRANSFORMED_SCATTERING (pbb)); + ubmap = isl_map_copy (pbb->transformed); for (i = 0; i < (int) dim_iter_domain; i++) { ppl_Linear_Expression_t eq; @@ -1679,6 +1704,30 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, ppl_set_coef (le, i, 1); ppl_min_for_le_pointset (domain, le, lb); ppl_max_for_le_pointset (domain, le, ub); + { + isl_space *dc = isl_set_get_space (pbb->domain); + isl_aff *aff; + isl_int isllb, islub, isldiff; + enum isl_lp_result res; + isl_constraint *c; + isl_int_init (isllb); + isl_int_init (islub); + isl_int_init (isldiff); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1); + res = isl_set_min (pbb->domain, aff, &isllb); + res = isl_set_max (pbb->domain, aff, &islub); + isl_int_sub (isldiff, islub, isllb); + isl_int_add_ui (isldiff, isldiff, 1); + c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (pbb->transformed))); + c = isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1); + c = isl_constraint_set_constant (c, isldiff); + ubmap = isl_map_add_constraint (ubmap, c); + isl_int_clear (isllb); + isl_int_clear (islub); + isl_int_clear (isldiff); + isl_aff_free (aff); + } mpz_sub (diff, ub, lb); mpz_add (diff, diff, one); @@ -1704,6 +1753,7 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, it to the scattering. */ ppl_new_Pointset_Powerset_C_Polyhedron_from_C_Polyhedron (&sctr_lb, PBB_TRANSFORMED_SCATTERING (pbb)); + lbmap = isl_map_copy (pbb->transformed); for (i = 0; i < (int) dim_iter_domain; i++) { ppl_Linear_Expression_t eq; @@ -1732,6 +1782,23 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, ppl_delete_Pointset_Powerset_C_Polyhedron (pph); ppl_delete_Constraint (pc); ppl_delete_Constraint_System (cs); + { + isl_space *dc = isl_set_get_space (pbb->domain); + isl_aff *aff; + isl_int isllb; + enum isl_lp_result res; + isl_constraint *c; + isl_int_init (isllb); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1); + res = isl_set_min (pbb->domain, aff, &isllb); + c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (pbb->transformed))); + c = isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1); + c = isl_constraint_set_constant (c, isllb); + lbmap = isl_map_add_constraint (lbmap, c); + isl_int_clear (isllb); + isl_aff_free (aff); + } } /* Extract the number of iterations. */ @@ -1741,6 +1808,56 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, ppl_max_for_le_pointset (sctr_ub, le, ub); mpz_sub (res, ub, lb); + { + isl_space *dc; + isl_aff *aff; + isl_int isllb, islub, islres; + enum isl_lp_result lpres; + + isl_int_init (isllb); + isl_int_init (islub); + isl_int_init (islres); + + lbset = isl_map_range (lbmap); + ubset = isl_map_range (ubmap); + + dc = isl_set_get_space (lbset); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, time_depth, 1); + lpres = isl_set_min (lbset, aff, &isllb); + lpres = isl_set_max (ubset, aff, &islub); + + isl_int_sub (islres, islub, isllb); + + isl_int_set_gmp (isllb, res); + + /* This would ideally hold, but the PPL code actually is buggy + (returns upper bound 0 for unanalyzable domains), and incomplete, + pdr_add_data_dimensions can constrain the ISL domain more than + the PPL domain. So, with ISL we get better results on pr37485.c. */ + /*gcc_assert (isl_int_eq (isllb, islres));*/ + + /* Even shorter and more correct method: map the iteration domain + through the current scatter, and work on the resulting set. */ + isl_set_free (lbset); + lbset = isl_set_apply (isl_set_copy (pbb->domain), + isl_map_copy (pbb->transformed)); + lpres = isl_set_min (lbset, aff, &isllb); + lpres = isl_set_max (lbset, aff, &islub); + + isl_int_sub (islub, islub, isllb); + isl_int_add_ui (islub, islub, 1); + gcc_assert (isl_int_eq (islub, islres)); + + isl_int_clear (isllb); + isl_int_clear (islub); + isl_int_clear (islres); + isl_aff_free (aff); + + isl_set_free (lbset); + isl_set_free (ubset); + } + mpz_clear (one); mpz_clear (diff); mpz_clear (lb); diff --git a/gcc/graphite.c b/gcc/graphite.c index dd98f5e..821efb1 100644 --- a/gcc/graphite.c +++ b/gcc/graphite.c @@ -253,6 +253,8 @@ graphite_finalize (bool need_cfg_cleanup_p) print_loops (dump_file, 3); } +isl_ctx *the_isl_ctx; + /* Perform a set of linear transforms on the loops of the current function. */ @@ -276,6 +278,7 @@ graphite_transform_loops (void) if (!graphite_initialize (ctx)) return; + the_isl_ctx = ctx; build_scops (&scops); if (dump_file && (dump_flags & TDF_DETAILS)) @@ -301,6 +304,7 @@ graphite_transform_loops (void) htab_delete (bb_pbb_mapping); free_scops (scops); graphite_finalize (need_cfg_cleanup_p); + the_isl_ctx = NULL; isl_ctx_free (ctx); }