Submitter | Michael Matz |
---|---|

Date | June 26, 2012, 10:21 a.m. |

Message ID | <alpine.LNX.2.00.1206261203020.25982@wotan.suse.de> |

Download | mbox | patch |

Permalink | /patch/167362/ |

State | New |

Headers | show |

## Comments

On 06/26/2012 12:21 PM, Michael Matz wrote: > Hi, > > so, to make progress on the graphite front we want to get rid of the ppl > dependency. We start from Tobis move-to-isl-and-isl-scheduler branch at > github, merged current trunk into that (see also Richis mails about > cloog/isl configury), and add ISL implementations of the essential parts > that still are ppl-only. These are the number of iteration analysis for > potentially transformed iteration spaces (this wasn't mentioned in Tobis > mail from February) and the interchange heuristic (basically stride > calculation for transformed spaces). > > So I learned PPL and ISL and rewrote that part :) Like in the patch > below. I'm not asking for any proper review for this, but rather if the > isl code makes sort of sense. The testsuite works, except for pr43012.c, > which exposes the fact that the current PPL nr-iter implementation is > buggy (it computes the nr-iter of a 77<=i<=88 loop to be 65), so that > testcase runs into the nr-iter-ppl==nr-iter-isl assert. > > In a private branch I have already removed all PPL code whatsoever (and > cleaned up the ISL code I added) so we make good progress there. Hi Michael, I am very impressed with how little time it took you to implement this patch. As it is working for several cases, you got a lot of stuff right. ;-) I will still add some comments to the patch. > +static isl_constraint * > +build_linearized_memory_access (isl_map *map, poly_dr_p pdr) > +{ > + isl_constraint *res; > + isl_local_space *ls = isl_local_space_from_space (isl_map_get_space (map)); > + unsigned offset, nsubs; > + int i; > + isl_int size, subsize; > + > + res = isl_equality_alloc (ls); > + isl_int_init (size); > + isl_int_set_ui (size, 1); > + isl_int_init (subsize); > + isl_int_set_ui (subsize, 1); > + > + nsubs = isl_set_dim (pdr->extent, isl_dim_set); > + /* -1 for the already included L dimension. */ > + offset = isl_map_dim (map, isl_dim_out) - 1 - nsubs; > + res = isl_constraint_set_coefficient_si (res, isl_dim_out, offset + nsubs, -1); > + /* Go through all subscripts from last to first. First dimension > + is the alias set, ignore it. */ > + for (i = nsubs - 1; i>= 1; i--) > + { > + isl_space *dc; > + isl_aff *aff; > + enum isl_lp_result lpres; > + > + res = isl_constraint_set_coefficient (res, isl_dim_out, offset + i, size); > + > + dc = isl_set_get_space (pdr->extent); > + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); > + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1); > + lpres = isl_set_max (pdr->extent, aff,&subsize); > + isl_aff_free (aff); > + isl_int_mul (size, size, subsize); > + } > + > + isl_int_clear (subsize); > + isl_int_clear (size); > + > + return res; > +} The function itself looks correct. However, I would personally have returned an isl_map instead of an isl_constraint. A map {A[i,j] -> [200 * i]} is probably a nicer way to represent a transformation from the higher level array type to an actual memory reference. > +static void > +pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) > +{ > + poly_bb_p pbb = PDR_PBB (pdr); > + isl_map *map; > + isl_set *set; > + isl_aff *aff; > + isl_space *dc; > + isl_constraint *lma, *c; > + isl_int islstride; > + enum isl_lp_result lpres; > + graphite_dim_t time_depth; > + unsigned offset, nt; > + unsigned i; > + /* pdr->accesses: [P1..nb_param,I1..nb_domain]->[a,S1..nb_subscript] > + ??? [P] not used for PDRs? > + pdr->extent: [a,S1..nb_subscript] > + pbb->domain: [P1..nb_param,I1..nb_domain] > + pbb->transformed: [P1..nb_param,I1..nb_domain]->[T1..Tnb_sctr] > + [T] includes local vars (currently unused) > + > + First we create [P,I] -> [T,a,S]. */ > + > + map = isl_map_flat_range_product (isl_map_copy (pbb->transformed), > + isl_map_copy (pdr->accesses)); > + /* Add a dimension for L: [P,I] -> [T,a,S,L].*/ > + map = isl_map_add_dims (map, isl_dim_out, 1); > + /* Build a constraint for "lma[S] - L == 0", effectively calculating > + L in terms of subscripts. */ > + lma = build_linearized_memory_access (map, pdr); > + /* And add it to the map, so we now have: > + [P,I] -> [T,a,S,L] : lma([S]) == L. */ > + map = isl_map_add_constraint (map, lma); > + > + /* Then we create [P,I,P',I'] -> [T,a,S,L,T',a',S',L']. */ > + map = isl_map_flat_product (map, isl_map_copy (map)); Your analysis is here a 1:1 mapping from PPL to isl. That works, but is probably not the nicest/conceptually cleanest way to implement this. In general, in isl we try to avoid adding/removing individual dimensions explicitly. The canonical way to express such transformations is to create a new map and to "apply" this map on the set/map you want to modify. For this algorithm, I would start with the pdr->accesses map and 'apply' pbb->transformed to map the original iteration space into the scattering space. I would than apply the isl_map calculated by build_linearized_memory_access to map the array to the actual data location accessed. Now you create a map 'scattering->scattering' that maps from one scattering time point to the next. And apply the map 'scattering->memory' on both the range and the domain. You will end up with a map memory -> memory. Here you can use the deltas function to give you the distances. ;-) You can have a look in Polly, where I implement a similar calculation: http://repo.or.cz/w/polly-mirror.git/blob/HEAD:/lib/Analysis/ScopInfo.cpp#l390 I wrote the stride detection code. You are allowed to use a copy of it for GCC under my copyright assignment. > /* Sets STRIDES to the sum of all the strides of the data references > diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c > index 4520bbb..bf1854c 100644 > --- a/gcc/graphite-poly.c > +++ b/gcc/graphite-poly.c > @@ -26,6 +26,8 @@ along with GCC; see the file COPYING3. If not see > #include<isl/map.h> > #include<isl/union_map.h> > #include<isl/constraint.h> > +#include<isl/ilp.h> > +#include<isl/aff.h> > #include<cloog/cloog.h> > #include<cloog/isl/domain.h> > #endif > @@ -1573,6 +1575,25 @@ debug_scop_params (scop_p scop, int verbosity) > print_scop_params (stderr, scop, verbosity); > } > > +extern isl_ctx *the_isl_ctx; > +void print_isl_set (FILE *f, isl_set *set); > +void print_isl_map (FILE *f, isl_map *map); > +void > +print_isl_set (FILE *f, isl_set *set) > +{ > + isl_printer *p = isl_printer_to_file (the_isl_ctx, f); > + p = isl_printer_print_set (p, set); > + isl_printer_free (p); > +} > + > +void > +print_isl_map (FILE *f, isl_map *map) > +{ > + isl_printer *p = isl_printer_to_file (the_isl_ctx, f); > + p = isl_printer_print_map (p, map); > + isl_printer_free (p); > +} Are you introducing these functions for debugging purposes. If this is the case, you can either use isl_set_dump(set), isl_map_dump(map), .. or you can use the gdb islprint command that should be made available by the gdb python script, installed by isl. Also, you can get the_isl_ctx, by calling isl_map_get_ctx(map). > @@ -1704,6 +1753,7 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, > it to the scattering. */ > ppl_new_Pointset_Powerset_C_Polyhedron_from_C_Polyhedron > (&sctr_lb, PBB_TRANSFORMED_SCATTERING (pbb)); > + lbmap = isl_map_copy (pbb->transformed); > for (i = 0; i< (int) dim_iter_domain; i++) > { > ppl_Linear_Expression_t eq; > @@ -1732,6 +1782,23 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, > ppl_delete_Pointset_Powerset_C_Polyhedron (pph); > ppl_delete_Constraint (pc); > ppl_delete_Constraint_System (cs); > + { > + isl_space *dc = isl_set_get_space (pbb->domain); > + isl_aff *aff; > + isl_int isllb; > + enum isl_lp_result res; > + isl_constraint *c; > + isl_int_init (isllb); > + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); > + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1); > + res = isl_set_min (pbb->domain, aff,&isllb); > + c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (pbb->transformed))); > + c = isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1); > + c = isl_constraint_set_constant (c, isllb); > + lbmap = isl_map_add_constraint (lbmap, c); > + isl_int_clear (isllb); > + isl_aff_free (aff); > + } > } > > /* Extract the number of iterations. */ > @@ -1741,6 +1808,56 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, > ppl_max_for_le_pointset (sctr_ub, le, ub); > mpz_sub (res, ub, lb); > > + { > + isl_space *dc; > + isl_aff *aff; > + isl_int isllb, islub, islres; > + enum isl_lp_result lpres; > + > + isl_int_init (isllb); > + isl_int_init (islub); > + isl_int_init (islres); > + > + lbset = isl_map_range (lbmap); > + ubset = isl_map_range (ubmap); > + > + dc = isl_set_get_space (lbset); > + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); > + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, time_depth, 1); > + lpres = isl_set_min (lbset, aff,&isllb); > + lpres = isl_set_max (ubset, aff,&islub); > + > + isl_int_sub (islres, islub, isllb); > + > + isl_int_set_gmp (isllb, res); > + > + /* This would ideally hold, but the PPL code actually is buggy > + (returns upper bound 0 for unanalyzable domains), and incomplete, > + pdr_add_data_dimensions can constrain the ISL domain more than > + the PPL domain. So, with ISL we get better results on pr37485.c. */ > + /*gcc_assert (isl_int_eq (isllb, islres));*/ > + > + /* Even shorter and more correct method: map the iteration domain > + through the current scatter, and work on the resulting set. */ > + isl_set_free (lbset); > + lbset = isl_set_apply (isl_set_copy (pbb->domain), > + isl_map_copy (pbb->transformed)); > + lpres = isl_set_min (lbset, aff,&isllb); > + lpres = isl_set_max (lbset, aff,&islub); > + > + isl_int_sub (islub, islub, isllb); > + isl_int_add_ui (islub, islub, 1); > + gcc_assert (isl_int_eq (islub, islres)); > + > + isl_int_clear (isllb); > + isl_int_clear (islub); > + isl_int_clear (islres); > + isl_aff_free (aff); > + > + isl_set_free (lbset); > + isl_set_free (ubset); > + } This is again modeling the PPL code. There is nothing wrong with this, however you may be able to simplify the code by using isl_*_deltas. > + > mpz_clear (one); > mpz_clear (diff); > mpz_clear (lb); > diff --git a/gcc/graphite.c b/gcc/graphite.c > index dd98f5e..821efb1 100644 > --- a/gcc/graphite.c > +++ b/gcc/graphite.c > @@ -253,6 +253,8 @@ graphite_finalize (bool need_cfg_cleanup_p) > print_loops (dump_file, 3); > } > > +isl_ctx *the_isl_ctx; Why are you creating a global ctx here? Thanks a lot Michael for working on this. Tobi

Hi Tobi, On Fri, 29 Jun 2012, Tobias Grosser wrote: > > +static isl_constraint * > > +build_linearized_memory_access (isl_map *map, poly_dr_p pdr) > > +{ > > +} > > The function itself looks correct. However, I would personally have returned > an isl_map instead of an isl_constraint. As you noticed I modelled all my isl code after the existing PPL code (which returned an expression here, turned into a conflict in the caller). I couldn't learn the polyhedral basics, PPL basics, isl basics _and_ the correct isl idioms on one weekend :) So many thanks for the hints. > A map {A[i,j] -> [200 * i]} is probably a nicer way to represent a > transformation from the higher level array type to an actual memory > reference. Hmm. This ignores the 'j' input dimension. Can I also have a map ala {A[i,j] -> [200 * i + j]}? I don't directly see how, as for that I'd need at least another dimension (the 'L' in the callers map). > > +pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) > > +{ > > + poly_bb_p pbb = PDR_PBB (pdr); > > + isl_map *map; > > + isl_set *set; > > + isl_aff *aff; > > + isl_space *dc; > > + isl_constraint *lma, *c; > > + isl_int islstride; > > + enum isl_lp_result lpres; > > + graphite_dim_t time_depth; > > + unsigned offset, nt; > > + unsigned i; > > + /* pdr->accesses: [P1..nb_param,I1..nb_domain]->[a,S1..nb_subscript] > > + ??? [P] not used for PDRs? > > + pdr->extent: [a,S1..nb_subscript] > > + pbb->domain: [P1..nb_param,I1..nb_domain] > > + pbb->transformed: [P1..nb_param,I1..nb_domain]->[T1..Tnb_sctr] > > + [T] includes local vars (currently unused) > > + > > + First we create [P,I] -> [T,a,S]. */ > > + > > + map = isl_map_flat_range_product (isl_map_copy (pbb->transformed), > > + isl_map_copy (pdr->accesses)); > > + /* Add a dimension for L: [P,I] -> [T,a,S,L].*/ > > + map = isl_map_add_dims (map, isl_dim_out, 1); > > + /* Build a constraint for "lma[S] - L == 0", effectively calculating > > + L in terms of subscripts. */ > > + lma = build_linearized_memory_access (map, pdr); > > + /* And add it to the map, so we now have: > > + [P,I] -> [T,a,S,L] : lma([S]) == L. */ > > + map = isl_map_add_constraint (map, lma); > > + > > + /* Then we create [P,I,P',I'] -> [T,a,S,L,T',a',S',L']. */ > > + map = isl_map_flat_product (map, isl_map_copy (map)); > > Your analysis is here a 1:1 mapping from PPL to isl. That works, but is > probably not the nicest/conceptually cleanest way to implement this. In > general, in isl we try to avoid adding/removing individual dimensions > explicitly. Yeah, I noticed that, but didn't know how to get around adding the L dimension (I at least used flat_product for creating the X' dimensions instead of explicit dimension insertions and moving). > The canonical way to express such transformations is to create a > new map and to "apply" this map on the set/map you want to modify. For this > algorithm, I would start with the pdr->accesses map and 'apply' > pbb->transformed to map the original iteration space into the scattering > space. Let's see if I understand: So, we have [PI]->[aS] (Params, Iteration domain, alias set, Subscript) as accesses, and we have [PI]->[T] (Transformed) as the scattering. Indeed what I ideally wanted to have is a [T]->[aS] mapping, but I couldn't see how to construct such. So you're saying that I can apply the PI->T to the domain of PI->aS, and that gives me the T->aS? Probably I was confused by the functional syntax (the ->), as in normal functions you can't simply apply a function to a functions domain and expect to get anything sensible out of that. I see that in your code you reverse the scattering first (T->PI), IIRC that's one thing I also tried, but I got stuck somewhere. > I would than apply the isl_map calculated by > build_linearized_memory_access to map the array to the actual data > location accessed. Now you create a map 'scattering->scattering' that > maps from one scattering time point to the next. And apply the map > 'scattering->memory' on both the range and the domain. You will end up > with a map memory -> memory. Yeah. If my above understanding is correct the path is clear. > Here you can use the deltas function to give you the distances. ;-) I couldn't figure out from isls user manual what exactly the delta functions do :) From the description above I think I understand now. Let me experiment a bit more. > You can have a look in Polly, where I implement a similar calculation: > > http://repo.or.cz/w/polly-mirror.git/blob/HEAD:/lib/Analysis/ScopInfo.cpp#l390 The games you play with projecting out some dimensions are related to which dimension you want to calculate the stride, right? (As you always calculate the next scatterting to have n+1 in the last dimension, not an arbitrary one). > > +print_isl_map (FILE *f, isl_map *map) > > If this is the > case, you can either use isl_set_dump(set), isl_map_dump(map), .. There functions weren't documented. Thanks. But I need a file argument anyway as the plan is to use them also for debug dumps. > Also, you can get the_isl_ctx, by calling isl_map_get_ctx(map). Indeed, that occurred to me only later. > > + /* Even shorter and more correct method: map the iteration domain > > + through the current scatter, and work on the resulting set. */ > > + isl_set_free (lbset); > > + lbset = isl_set_apply (isl_set_copy (pbb->domain), > > + isl_map_copy (pbb->transformed)); > > + lpres = isl_set_min (lbset, aff,&isllb); > > + lpres = isl_set_max (lbset, aff,&islub); > > + > > + isl_int_sub (islub, islub, isllb); > > + isl_int_add_ui (islub, islub, 1); > > This is again modeling the PPL code. Actually the PPL code translation is the one I removed from my reply here. The above is what made that code dead and what is in the final version (with renamed variables and such). Is using _deltas really better? I would have to construct a map from lexmin(transformed-domain)->lexmax(transformed-domain), then apply _delts and the still extract the number. > > +++ b/gcc/graphite.c > > @@ -253,6 +253,8 @@ graphite_finalize (bool need_cfg_cleanup_p) > > print_loops (dump_file, 3); > > } > > > > +isl_ctx *the_isl_ctx; > > Why are you creating a global ctx here? For the printer, before I was aware of the fact that of course all isl objects know about their context. Ciao, Michael.

On 07/02/2012 01:32 PM, Michael Matz wrote: > Hi Tobi, > > On Fri, 29 Jun 2012, Tobias Grosser wrote: > >>> +static isl_constraint * >>> +build_linearized_memory_access (isl_map *map, poly_dr_p pdr) >>> +{ >>> +} >> >> The function itself looks correct. However, I would personally have returned >> an isl_map instead of an isl_constraint. > > As you noticed I modelled all my isl code after the existing PPL code > (which returned an expression here, turned into a conflict in the caller). > I couldn't learn the polyhedral basics, PPL basics, isl basics _and_ the > correct isl idioms on one weekend :) So many thanks for the hints. No worries, I believe a direct translation is a good start. we can improve later on that. >> A map {A[i,j] -> [200 * i]} is probably a nicer way to represent a >> transformation from the higher level array type to an actual memory >> reference. > > Hmm. This ignores the 'j' input dimension. Can I also have a map ala > {A[i,j] -> [200 * i + j]}? I don't directly see how, as for that I'd need > at least another dimension (the 'L' in the callers map). Yes, you can have this: A[i,j] -> [200 * i + j]} It is basically a pretty printing of: {A[i,j] -> [o]: o = 200 * i + j} >> The canonical way to express such transformations is to create a >> new map and to "apply" this map on the set/map you want to modify. For this >> algorithm, I would start with the pdr->accesses map and 'apply' >> pbb->transformed to map the original iteration space into the scattering >> space. > > Let's see if I understand: So, we have [PI]->[aS] (Params, Iteration > domain, alias set, Subscript) as accesses, and we have [PI]->[T] > (Transformed) as the scattering. Indeed what I ideally wanted to have is > a [T]->[aS] mapping, but I couldn't see how to construct such. So you're > saying that I can apply the PI->T to the domain of PI->aS, and that gives > me the T->aS? You start with a [P] -> {[I] -> [aS]} as accesses and a [P] -> {[I] -> [T]}. To get [P] -> {[T] -> [aS]} you do the following: isl_map *Accesses = "[P] -> {[I] -> [aS]}" isl_map *Scattering = " [P] -> {[I] -> [T]}" isl_map *TimeToAccess = isl_map_apply_domain(Accesses, Scattering) isl_map_dump(TimeToAccess); $ [P] { [T]->[aS] } > Probably I was confused by the functional syntax (the ->), > as in normal functions you can't simply apply a function to a functions > domain and expect to get anything sensible out of that. No, all those are relations. Reverse and combine them works in general very well. > I see that in > your code you reverse the scattering first (T->PI), IIRC that's one thing > I also tried, but I got stuck somewhere. I don't think reversing is necessary in your case. As the input space of the scattering, which is "[I]", matches the domain of the access map. So you can directly use isl_map_apply_domain. >> I would than apply the isl_map calculated by >> build_linearized_memory_access to map the array to the actual data >> location accessed. Now you create a map 'scattering->scattering' that >> maps from one scattering time point to the next. And apply the map >> 'scattering->memory' on both the range and the domain. You will end up >> with a map memory -> memory. > > Yeah. If my above understanding is correct the path is clear. I believe it is. >> Here you can use the deltas function to give you the distances. ;-) > > I couldn't figure out from isls user manual what exactly the delta > functions do :) From the description above I think I understand now. Let > me experiment a bit more. The delta function calculates the following: Input: {[a1, a2, a3] -> [b1, b2, b3]: <constraints between a and b> } Output {[x1, x2, x3] exists a1, a2, a3, b1, b2, b3 : x1 = b1 - a1 and x2 = b2 - a2 and x3 = b3 - a3 and <constraints between a and b> } So basically it calculates the distances between each possible pair of input and output values. >> You can have a look in Polly, where I implement a similar calculation: >> >> http://repo.or.cz/w/polly-mirror.git/blob/HEAD:/lib/Analysis/ScopInfo.cpp#l390 > > The games you play with projecting out some dimensions are related to > which dimension you want to calculate the stride, right? (As you always > calculate the next scatterting to have n+1 in the last dimension, not an > arbitrary one). I always calculate the stride in the last dimension. This is less general as what you are trying to calculate. The projecting out of dimensions is just needed due to the ugly CLooG interface, which gives me a mixture or scattering and domain dimensions. You should need it in gcc. >>> +print_isl_map (FILE *f, isl_map *map) >> >> If this is the >> case, you can either use isl_set_dump(set), isl_map_dump(map), .. > > There functions weren't documented. Thanks. But I need a file argument > anyway as the plan is to use them also for debug dumps. OK. >>> + /* Even shorter and more correct method: map the iteration domain >>> + through the current scatter, and work on the resulting set. */ >>> + isl_set_free (lbset); >>> + lbset = isl_set_apply (isl_set_copy (pbb->domain), >>> + isl_map_copy (pbb->transformed)); >>> + lpres = isl_set_min (lbset, aff,&isllb); >>> + lpres = isl_set_max (lbset, aff,&islub); >>> + >>> + isl_int_sub (islub, islub, isllb); >>> + isl_int_add_ui (islub, islub, 1); >> >> This is again modeling the PPL code. > > Actually the PPL code translation is the one I removed from my reply here. > The above is what made that code dead and what is in the final version > (with renamed variables and such). Is using _deltas really better? I > would have to construct a map from > lexmin(transformed-domain)->lexmax(transformed-domain), then apply > _delts and the still extract the number. Yes, you are right. For this specific case, the benefits are less clear. Cheers Tobi

On 07/02/2012 01:32 PM, Michael Matz wrote: >>> +++ b/gcc/graphite.c >>> @@ -253,6 +253,8 @@ graphite_finalize (bool need_cfg_cleanup_p) >>> print_loops (dump_file, 3); >>> } >>> >>> +isl_ctx *the_isl_ctx; >> >> Why are you creating a global ctx here? > > For the printer, before I was aware of the fact that of course all isl > objects know about their context. Ah, I forgot. Can you commit a patch, that removes the_isl_ctx from graphite? Tobi

Hi, On Mon, 2 Jul 2012, Tobias Grosser wrote: > > Yeah. If my above understanding is correct the path is clear. > > I believe it is. It somewhat works. It has problems in those cases where there are dependencies between different input dimensions in the scattering, which happens for strip-mining. interchange-8.c for instance. The core is: isl_map *subs = isl_map_copy (pdr->accesses); subs = isl_map_intersect_range (subs, isl_set_copy (pdr->extent)); subs = isl_map_apply_domain (subs, isl_map_copy (pbb->transformed)); subs = isl_map_apply_range (subs, build_linearized_memory_access (pdr)); So, subs is now the [T]->address mapping. space = isl_space_range (isl_map_get_space (pbb->transformed)); t_to_t1 = isl_map_universe (isl_space_map_from_set (space)); space = isl_map_get_space (t_to_t1); c = isl_equality_alloc (isl_local_space_from_space (space)); c = isl_constraint_set_coefficient_si (c, isl_dim_in, time_depth, 1); c = isl_constraint_set_coefficient_si (c, isl_dim_out, time_depth, -1); c = isl_constraint_set_constant_si (c, 1); t_to_t1 = isl_map_add_constraint (t_to_t1, c); t_to_t1 is now the T_depth->T_depth+1 mapping, e.g.: { [i0, i1, i2, i3, i4] -> [o0, 1 + i1, o2, o3, o4] } t_to_t1 = isl_map_apply_domain (t_to_t1, isl_map_copy (subs)); t_to_t1 = isl_map_apply_range (t_to_t1, subs); Now it's mem_addr -> mem_addr' mapping. deltas = isl_map_deltas (t_to_t1); And deltas is the difference, mem_addr' - mem_addr isl_int_init (islstride); if (!isl_set_plain_is_fixed (deltas, isl_dim_set, 0, &islstride)) gcc_unreachable (); So, I want to extract the difference, and that is what doesn't work. In my initial tries I had this additional loop to construct t_to_t1: for (i = 0; i < nt; i++) if (i != time_depth) t_to_t1 = isl_map_equate (t_to_t1, isl_dim_in, i, isl_dim_out, i); That is, I enforced also equality on all dimensions except for the T_depth's one. For interchange-0.c I got these sets: T->addr: { [0, i1, 0, i3, 0] -> [i1 + 999i3] : i1 >= 0 and i3 >= 0 and i1 <= 999 and i3 <= 999 } t_to_t1: { [i0, i1, i2, i3, i4] -> [i0, 1 + i1, i2, i3, i4] } mem->mem': { [i0] -> [1 + i0] : i0 <= 998999 and i0 >= 0 } And hence deltas indeed contains a single fixed number that I could extract. Of course forcing all transformed domains the same breaks exactly where there are interdependencies, so e.g. interchange-8.c breaks. So I thought I simply don't equate those. Now the sets for interchange-0.c look like so: T->addr: as above t_to_t1: { [i0, i1, i2, i3, i4] -> [o0, 1 + i1, o2, o3, o4] } Note how only dimension 1 is constrained. mem->mem': { [i0] -> [o0] : exists (e0, e1 = [(998 - i0 + o0 + 999e0)/999] : 999e1 = 998 - i0 + o0 + 999e0 and 999e0 <= -999 + i0 and e0 >= -1 and 999e0 >= -1997 + i0 and e0 <= 998 and 999e0 <= 997003 + i0 - o0 and 999e0 >= -998 + i0 - o0) } Doing deltas on that one gives: { [i0] : exists (e0, e1 = [(998 + i0 + 999e0)/999] : 999e1 = 998 + i0 + 999e0 and 999e0 >= -998 - i0 and e0 >= -1 and 999e0 <= 997003 - i0 and e0 <= 998) } > So, no fixed number. Doing isl_set_max/min on that doesn't seem to give sensible results (max=998002, min=-998000). For this specific instance it's supposed to give '1'. Hmm. Any thoughts? :) Ciao, Michael.

## Patch

diff --git a/gcc/graphite-interchange.c b/gcc/graphite-interchange.c index fa38f5c..96a7967 100644 --- a/gcc/graphite-interchange.c +++ b/gcc/graphite-interchange.c @@ -24,9 +24,11 @@ along with GCC; see the file COPYING3. If not see #include "config.h" #ifdef HAVE_cloog +#include <isl/aff.h> #include <isl/set.h> #include <isl/map.h> #include <isl/union_map.h> +#include <isl/ilp.h> #include <cloog/cloog.h> #include <cloog/isl/domain.h> #endif @@ -64,7 +66,7 @@ along with GCC; see the file COPYING3. If not see c_0 * s_0 + c_1 * s_1 + ... c_n * s_n. */ static ppl_Linear_Expression_t -build_linearized_memory_access (ppl_dimension_type offset, poly_dr_p pdr) +ppl_build_linearized_memory_access (ppl_dimension_type offset, poly_dr_p pdr) { ppl_Linear_Expression_t res; ppl_Linear_Expression_t le; @@ -189,7 +191,7 @@ build_partial_difference (ppl_Pointset_Powerset_C_Polyhedron_t *p, the loop at DEPTH. */ static void -pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) +ppl_pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) { ppl_dimension_type time_depth; ppl_Linear_Expression_t le, lma; @@ -245,7 +247,7 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) /* Construct the 0|0|0|0|0|S|0|l1|0 part. */ { - lma = build_linearized_memory_access (offset + dim_sctr, pdr); + lma = ppl_build_linearized_memory_access (offset + dim_sctr, pdr); ppl_set_coef (lma, dim_L1, -1); ppl_new_Constraint (&new_cstr, lma, PPL_CONSTRAINT_TYPE_EQUAL); ppl_Pointset_Powerset_C_Polyhedron_add_constraint (p1, new_cstr); @@ -310,6 +312,7 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) ppl_max_for_le_pointset (p1, le, stride); } +#if 0 if (dump_file && (dump_flags & TDF_DETAILS)) { char *str; @@ -322,12 +325,159 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) mp_get_memory_functions (NULL, NULL, &gmp_free); (*gmp_free) (str, strlen (str) + 1); } +#endif ppl_delete_Pointset_Powerset_C_Polyhedron (p1); ppl_delete_Pointset_Powerset_C_Polyhedron (p2); ppl_delete_Linear_Expression (le); } +static isl_constraint * +build_linearized_memory_access (isl_map *map, poly_dr_p pdr) +{ + isl_constraint *res; + isl_local_space *ls = isl_local_space_from_space (isl_map_get_space (map)); + unsigned offset, nsubs; + int i; + isl_int size, subsize; + + res = isl_equality_alloc (ls); + isl_int_init (size); + isl_int_set_ui (size, 1); + isl_int_init (subsize); + isl_int_set_ui (subsize, 1); + + nsubs = isl_set_dim (pdr->extent, isl_dim_set); + /* -1 for the already included L dimension. */ + offset = isl_map_dim (map, isl_dim_out) - 1 - nsubs; + res = isl_constraint_set_coefficient_si (res, isl_dim_out, offset + nsubs, -1); + /* Go through all subscripts from last to first. First dimension + is the alias set, ignore it. */ + for (i = nsubs - 1; i >= 1; i--) + { + isl_space *dc; + isl_aff *aff; + enum isl_lp_result lpres; + + res = isl_constraint_set_coefficient (res, isl_dim_out, offset + i, size); + + dc = isl_set_get_space (pdr->extent); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1); + lpres = isl_set_max (pdr->extent, aff, &subsize); + isl_aff_free (aff); + isl_int_mul (size, size, subsize); + } + + isl_int_clear (subsize); + isl_int_clear (size); + + return res; +} + +static void +pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr) +{ + poly_bb_p pbb = PDR_PBB (pdr); + isl_map *map; + isl_set *set; + isl_aff *aff; + isl_space *dc; + isl_constraint *lma, *c; + isl_int islstride; + enum isl_lp_result lpres; + graphite_dim_t time_depth; + unsigned offset, nt; + unsigned i; + /* pdr->accesses: [P1..nb_param,I1..nb_domain]->[a,S1..nb_subscript] + ??? [P] not used for PDRs? + pdr->extent: [a,S1..nb_subscript] + pbb->domain: [P1..nb_param,I1..nb_domain] + pbb->transformed: [P1..nb_param,I1..nb_domain]->[T1..Tnb_sctr] + [T] includes local vars (currently unused) + + First we create [P,I] -> [T,a,S]. */ + + map = isl_map_flat_range_product (isl_map_copy (pbb->transformed), + isl_map_copy (pdr->accesses)); + /* Add a dimension for L: [P,I] -> [T,a,S,L].*/ + map = isl_map_add_dims (map, isl_dim_out, 1); + /* Build a constraint for "lma[S] - L == 0", effectively calculating + L in terms of subscripts. */ + lma = build_linearized_memory_access (map, pdr); + /* And add it to the map, so we now have: + [P,I] -> [T,a,S,L] : lma([S]) == L. */ + map = isl_map_add_constraint (map, lma); + + /* Then we create [P,I,P',I'] -> [T,a,S,L,T',a',S',L']. */ + map = isl_map_flat_product (map, isl_map_copy (map)); + + /* Now add the equality T[time_depth] == T'[time_depth]+1. This will + force L' to be the linear address at T[time_depth] + 1. */ + time_depth = psct_dynamic_dim (pbb, depth); + /* Length of [a,S] plus [L] ... */ + offset = 1 + isl_map_dim (pdr->accesses, isl_dim_out); + /* ... plus [T]. */ + offset += isl_map_dim (pbb->transformed, isl_dim_out); + + c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (map))); + c = isl_constraint_set_coefficient_si (c, isl_dim_out, time_depth, 1); + c = isl_constraint_set_coefficient_si (c, isl_dim_out, + offset + time_depth, -1); + c = isl_constraint_set_constant_si (c, 1); + map = isl_map_add_constraint (map, c); + + /* Now we equate most of the T/T' elements (making PITaSL nearly + the same is (PITaSL)', except for one dimension, namely for 'depth' + (an index into [I]), after translating to index into [T]. Take care + to not produce an empty map, which indicates we wanted to equate + two dimensions that are already coupled via the above time_depth + dimension. Happens with strip mining where several scatter dimension + are interdependend. */ + /* Length of [T]. */ + nt = pbb_nb_scattering_transform (pbb) + pbb_nb_local_vars (pbb); + for (i = 0; i < nt; i++) + if (i != time_depth) + { + isl_map *temp = isl_map_equate (isl_map_copy (map), + isl_dim_out, i, + isl_dim_out, offset + i); + if (isl_map_is_empty (temp)) + isl_map_free (temp); + else + { + isl_map_free (map); + map = temp; + } + } + + /* Now maximize the expression L' - L. */ + set = isl_map_range (map); + dc = isl_set_get_space (set); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, offset - 1, -1); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, offset + offset - 1, 1); + isl_int_init (islstride); + lpres = isl_set_max (set, aff, &islstride); + isl_int_get_gmp (islstride, stride); + isl_int_clear (islstride); + isl_aff_free (aff); + isl_set_free (set); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + char *str; + void (*gmp_free) (void *, size_t); + + fprintf (dump_file, "\nStride in BB_%d, DR_%d, depth %d:", + pbb_index (pbb), PDR_ID (pdr), (int) depth); + str = mpz_get_str (0, 10, stride); + fprintf (dump_file, " %s ", str); + mp_get_memory_functions (NULL, NULL, &gmp_free); + (*gmp_free) (str, strlen (str) + 1); + } +} + /* Sets STRIDES to the sum of all the strides of the data references accessed in LOOP at DEPTH. */ @@ -339,9 +489,11 @@ memory_strides_in_loop_1 (lst_p loop, graphite_dim_t depth, mpz_t strides) lst_p l; poly_dr_p pdr; mpz_t s, n; + mpz_t sold; mpz_init (s); mpz_init (n); + mpz_init (sold); FOR_EACH_VEC_ELT (lst_p, LST_SEQ (loop), j, l) if (LST_LOOP_P (l)) @@ -350,6 +502,11 @@ memory_strides_in_loop_1 (lst_p loop, graphite_dim_t depth, mpz_t strides) FOR_EACH_VEC_ELT (poly_dr_p, PBB_DRS (LST_PBB (l)), i, pdr) { pdr_stride_in_loop (s, depth, pdr); + ppl_pdr_stride_in_loop (sold, depth, pdr); + /* Ideally this would hold, but it doesn't on e.g. + interchange-8.c. ??? Not analyzed why. */ + if (0 != mpz_cmp (s, sold)) + {} mpz_set_si (n, PDR_NB_REFS (pdr)); mpz_mul (s, s, n); mpz_add (strides, strides, s); @@ -357,6 +514,7 @@ memory_strides_in_loop_1 (lst_p loop, graphite_dim_t depth, mpz_t strides) mpz_clear (s); mpz_clear (n); + mpz_clear (sold); } /* Sets STRIDES to the sum of all the strides of the data references diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c index 4520bbb..bf1854c 100644 --- a/gcc/graphite-poly.c +++ b/gcc/graphite-poly.c @@ -26,6 +26,8 @@ along with GCC; see the file COPYING3. If not see #include <isl/map.h> #include <isl/union_map.h> #include <isl/constraint.h> +#include <isl/ilp.h> +#include <isl/aff.h> #include <cloog/cloog.h> #include <cloog/isl/domain.h> #endif @@ -1573,6 +1575,25 @@ debug_scop_params (scop_p scop, int verbosity) print_scop_params (stderr, scop, verbosity); } +extern isl_ctx *the_isl_ctx; +void print_isl_set (FILE *f, isl_set *set); +void print_isl_map (FILE *f, isl_map *map); +void +print_isl_set (FILE *f, isl_set *set) +{ + isl_printer *p = isl_printer_to_file (the_isl_ctx, f); + p = isl_printer_print_set (p, set); + isl_printer_free (p); +} + +void +print_isl_map (FILE *f, isl_map *map) +{ + isl_printer *p = isl_printer_to_file (the_isl_ctx, f); + p = isl_printer_print_map (p, map); + isl_printer_free (p); +} + /* The dimension in the transformed scattering polyhedron of PBB containing the scattering iterator for the loop at depth LOOP_DEPTH. */ @@ -1643,6 +1664,9 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, graphite_dim_t time_depth, mpz_t res) { + /* XXX rewrite with isl, used from lst_niter_for_loop. */ + isl_map *lbmap, *ubmap; + isl_set *lbset, *ubset; ppl_Pointset_Powerset_C_Polyhedron_t domain, sctr_lb, sctr_ub; ppl_dimension_type domain_dim, sctr_dim; graphite_dim_t dim_iter_domain = pbb_dim_iter_domain (pbb); @@ -1667,6 +1691,7 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, that upper bound to the scattering. */ ppl_new_Pointset_Powerset_C_Polyhedron_from_C_Polyhedron (&sctr_ub, PBB_TRANSFORMED_SCATTERING (pbb)); + ubmap = isl_map_copy (pbb->transformed); for (i = 0; i < (int) dim_iter_domain; i++) { ppl_Linear_Expression_t eq; @@ -1679,6 +1704,30 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, ppl_set_coef (le, i, 1); ppl_min_for_le_pointset (domain, le, lb); ppl_max_for_le_pointset (domain, le, ub); + { + isl_space *dc = isl_set_get_space (pbb->domain); + isl_aff *aff; + isl_int isllb, islub, isldiff; + enum isl_lp_result res; + isl_constraint *c; + isl_int_init (isllb); + isl_int_init (islub); + isl_int_init (isldiff); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1); + res = isl_set_min (pbb->domain, aff, &isllb); + res = isl_set_max (pbb->domain, aff, &islub); + isl_int_sub (isldiff, islub, isllb); + isl_int_add_ui (isldiff, isldiff, 1); + c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (pbb->transformed))); + c = isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1); + c = isl_constraint_set_constant (c, isldiff); + ubmap = isl_map_add_constraint (ubmap, c); + isl_int_clear (isllb); + isl_int_clear (islub); + isl_int_clear (isldiff); + isl_aff_free (aff); + } mpz_sub (diff, ub, lb); mpz_add (diff, diff, one); @@ -1704,6 +1753,7 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, it to the scattering. */ ppl_new_Pointset_Powerset_C_Polyhedron_from_C_Polyhedron (&sctr_lb, PBB_TRANSFORMED_SCATTERING (pbb)); + lbmap = isl_map_copy (pbb->transformed); for (i = 0; i < (int) dim_iter_domain; i++) { ppl_Linear_Expression_t eq; @@ -1732,6 +1782,23 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, ppl_delete_Pointset_Powerset_C_Polyhedron (pph); ppl_delete_Constraint (pc); ppl_delete_Constraint_System (cs); + { + isl_space *dc = isl_set_get_space (pbb->domain); + isl_aff *aff; + isl_int isllb; + enum isl_lp_result res; + isl_constraint *c; + isl_int_init (isllb); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1); + res = isl_set_min (pbb->domain, aff, &isllb); + c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (pbb->transformed))); + c = isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1); + c = isl_constraint_set_constant (c, isllb); + lbmap = isl_map_add_constraint (lbmap, c); + isl_int_clear (isllb); + isl_aff_free (aff); + } } /* Extract the number of iterations. */ @@ -1741,6 +1808,56 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb, ppl_max_for_le_pointset (sctr_ub, le, ub); mpz_sub (res, ub, lb); + { + isl_space *dc; + isl_aff *aff; + isl_int isllb, islub, islres; + enum isl_lp_result lpres; + + isl_int_init (isllb); + isl_int_init (islub); + isl_int_init (islres); + + lbset = isl_map_range (lbmap); + ubset = isl_map_range (ubmap); + + dc = isl_set_get_space (lbset); + aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc)); + aff = isl_aff_set_coefficient_si (aff, isl_dim_in, time_depth, 1); + lpres = isl_set_min (lbset, aff, &isllb); + lpres = isl_set_max (ubset, aff, &islub); + + isl_int_sub (islres, islub, isllb); + + isl_int_set_gmp (isllb, res); + + /* This would ideally hold, but the PPL code actually is buggy + (returns upper bound 0 for unanalyzable domains), and incomplete, + pdr_add_data_dimensions can constrain the ISL domain more than + the PPL domain. So, with ISL we get better results on pr37485.c. */ + /*gcc_assert (isl_int_eq (isllb, islres));*/ + + /* Even shorter and more correct method: map the iteration domain + through the current scatter, and work on the resulting set. */ + isl_set_free (lbset); + lbset = isl_set_apply (isl_set_copy (pbb->domain), + isl_map_copy (pbb->transformed)); + lpres = isl_set_min (lbset, aff, &isllb); + lpres = isl_set_max (lbset, aff, &islub); + + isl_int_sub (islub, islub, isllb); + isl_int_add_ui (islub, islub, 1); + gcc_assert (isl_int_eq (islub, islres)); + + isl_int_clear (isllb); + isl_int_clear (islub); + isl_int_clear (islres); + isl_aff_free (aff); + + isl_set_free (lbset); + isl_set_free (ubset); + } + mpz_clear (one); mpz_clear (diff); mpz_clear (lb); diff --git a/gcc/graphite.c b/gcc/graphite.c index dd98f5e..821efb1 100644 --- a/gcc/graphite.c +++ b/gcc/graphite.c @@ -253,6 +253,8 @@ graphite_finalize (bool need_cfg_cleanup_p) print_loops (dump_file, 3); } +isl_ctx *the_isl_ctx; + /* Perform a set of linear transforms on the loops of the current function. */ @@ -276,6 +278,7 @@ graphite_transform_loops (void) if (!graphite_initialize (ctx)) return; + the_isl_ctx = ctx; build_scops (&scops); if (dump_file && (dump_flags & TDF_DETAILS)) @@ -301,6 +304,7 @@ graphite_transform_loops (void) htab_delete (bb_pbb_mapping); free_scops (scops); graphite_finalize (need_cfg_cleanup_p); + the_isl_ctx = NULL; isl_ctx_free (ctx); }