Message ID | 87wp6urs0j.fsf@linaro.org |
---|---|
State | New |
Headers | show |
Ping Richard Sandiford <richard.sandiford@linaro.org> writes: > Richard Sandiford <richard.sandiford@linaro.org> writes: >> Eric Botcazou <ebotcazou@adacore.com> writes: >>> [Sorry for missing the previous messages] >>> >>>> Thanks. Just been retesting, and I think I must have forgotten >>>> to include Ada last time. It turns out that the patch causes a dg-scan >>>> regression in gnat.dg/vect17.adb, because we now think that if the >>>> array RECORD_TYPEs *do* alias in: >>>> >>>> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is >>>> begin >>>> for I in Sarray'Range loop >>>> R(I) := X(I) + Y(I); >>>> end loop; >>>> end; >>>> >>>> then the dependence distance must be zero. Eric, does that hold true >>>> for Ada? I.e. if X and R (or Y and R) alias, must it be the case that >>>> X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? >>> >>> Yes, I'd think so (even without the artificial RECORD_TYPE around the arrays). >> >> Good! >> >>>> 2017-06-07 Richard Sandiford <richard.sandiford@linaro.org> >>>> >>>> gcc/testsuite/ >>>> * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. >>>> * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 >>>> when X = R or Y = R. >>> >>> I think that you need to modify vect15 and vect16 the same way. >> >> Ah, yeah. And doing that shows that I'd not handled safelen for >> DDR_COULD_BE_INDEPENDENT_P. I've fixed that locally. >> >> How does this look? Tested on x86_64-linux-gnu both without the >> vectoriser changes and with the fixed vectoriser patch. > > Here's a version of the patch that handles safelen. I split the > handling out into a new function (vect_analyze_possibly_independent_ddr) > since it was getting too big to do inline. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Thanks, > Richard > > > 2017-07-27 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * tree-data-ref.h (subscript): Add access_fn field. > (data_dependence_relation): Add could_be_independent_p. > (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. > (same_access_functions): Move to tree-data-ref.c. > * tree-data-ref.c (ref_contains_union_access_p): New function. > (access_fn_component_p): Likewise. > (access_fn_components_comparable_p): Likewise. > (dr_analyze_indices): Add a reference to access_fn_component_p. > (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of > DR_ACCESS_FN. > (constant_access_functions): Likewise. > (add_other_self_distances): Likewise. > (same_access_functions): Likewise. (Moved from tree-data-ref.h.) > (initialize_data_dependence_relation): Use XCNEW and remove > explicit zeroing of DDR_REVERSED_P. Look for a subsequence > of access functions that have the same type. Allow the > subsequence to end with different bases in some circumstances. > Record the chosen access functions in SUB_ACCESS_FN. > (build_classic_dist_vector_1): Replace ddr_a and ddr_b with > a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. > (subscript_dependence_tester_1): Likewise dra and drb. > (build_classic_dist_vector): Update calls accordingly. > (subscript_dependence_tester): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check > DDR_COULD_BE_INDEPENDENT_P. > * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test > comp_alias_ddrs instead of may_alias_ddrs. > * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr): > New function. > (vect_analyze_data_ref_dependence): Use it if > DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded > distance vectors if that fails. > (dependence_distance_ge_vf): New function. > (vect_prune_runtime_alias_test_list): Use it. Don't clear > LOOP_VINFO_MAY_ALIAS_DDRS. > > gcc/testsuite/ > * gcc.dg/vect/vect-alias-check-3.c: New test. > * gcc.dg/vect/vect-alias-check-4.c: Likewise. > * gcc.dg/vect/vect-alias-check-5.c: Likewise. > > Index: gcc/tree-data-ref.h > =================================================================== > --- gcc/tree-data-ref.h 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-data-ref.h 2017-07-27 13:10:33.023912613 +0100 > @@ -260,6 +260,9 @@ struct conflict_function > > struct subscript > { > + /* The access functions of the two references. */ > + tree access_fn[2]; > + > /* A description of the iterations for which the elements are > accessed twice. */ > conflict_function *conflicting_iterations_in_a; > @@ -278,6 +281,7 @@ struct subscript > > typedef struct subscript *subscript_p; > > +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] > #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a > #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b > #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict > @@ -333,6 +337,33 @@ struct data_dependence_relation > /* Set to true when the dependence relation is on the same data > access. */ > bool self_reference_p; > + > + /* True if the dependence described is conservatively correct rather > + than exact, and if it is still possible for the accesses to be > + conditionally independent. For example, the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a->f[i] += b->f[i]; > + > + conservatively have a distance vector of (0), for the case in which > + a == b, but the accesses are independent if a != b. Similarly, > + the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a[0].f[i] += b[i].f[i]; > + > + conservatively have a distance vector of (0), but they are indepenent > + when a != b + i. In contrast, the references in: > + > + struct s *a; > + for (int i = 0; i < n; ++i) > + a->f[i] += a->f[i]; > + > + have the same distance vector of (0), but the accesses can never be > + independent. */ > + bool could_be_independent_p; > }; > > typedef struct data_dependence_relation *ddr_p; > @@ -363,6 +394,7 @@ #define DDR_DIR_VECT(DDR, I) \ > #define DDR_DIST_VECT(DDR, I) \ > DDR_DIST_VECTS (DDR)[I] > #define DDR_REVERSED_P(DDR) (DDR)->reversed_p > +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p > > > bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); > @@ -457,22 +489,6 @@ same_data_refs (data_reference_p a, data > return false; > > return true; > -} > - > -/* Return true when the DDR contains two data references that have the > - same access functions. */ > - > -static inline bool > -same_access_functions (const struct data_dependence_relation *ddr) > -{ > - unsigned i; > - > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), > - DR_ACCESS_FN (DDR_B (ddr), i))) > - return false; > - > - return true; > } > > /* Returns true when all the dependences are computable. */ > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-data-ref.c 2017-07-27 13:10:33.023912613 +0100 > @@ -124,8 +124,7 @@ Software Foundation; either version 3, o > } dependence_stats; > > static bool subscript_dependence_tester_1 (struct data_dependence_relation *, > - struct data_reference *, > - struct data_reference *, > + unsigned int, unsigned int, > struct loop *); > /* Returns true iff A divides B. */ > > @@ -145,6 +144,21 @@ int_divides_p (int a, int b) > return ((b % a) == 0); > } > > +/* Return true if reference REF contains a union access. */ > + > +static bool > +ref_contains_union_access_p (tree ref) > +{ > + while (handled_component_p (ref)) > + { > + ref = TREE_OPERAND (ref, 0); > + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE > + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) > + return true; > + } > + return false; > +} > + > > > /* Dump into FILE all the data references from DATAREFS. */ > @@ -434,13 +448,14 @@ dump_data_dependence_relation (FILE *out > unsigned int i; > struct loop *loopi; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + subscript *sub; > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > fprintf (outf, " access_fn_A: "); > - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); > fprintf (outf, " access_fn_B: "); > - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); > - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); > + dump_subscript (outf, sub); > } > > fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); > @@ -920,6 +935,27 @@ dr_analyze_innermost (innermost_loop_beh > return true; > } > > +/* Return true if OP is a valid component reference for a DR access > + function. This accepts a subset of what handled_component_p accepts. */ > + > +static bool > +access_fn_component_p (tree op) > +{ > + switch (TREE_CODE (op)) > + { > + case REALPART_EXPR: > + case IMAGPART_EXPR: > + case ARRAY_REF: > + return true; > + > + case COMPONENT_REF: > + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; > + > + default: > + return false; > + } > +} > + > /* Determines the base object and the list of indices of memory reference > DR, analyzed in LOOP and instantiated in loop nest NEST. */ > > @@ -957,7 +993,9 @@ dr_analyze_indices (struct data_referenc > access_fns.safe_push (integer_one_node); > } > > - /* Analyze access functions of dimensions we know to be independent. */ > + /* Analyze access functions of dimensions we know to be independent. > + The list of component references handled here should be kept in > + sync with access_fn_component_p. */ > while (handled_component_p (ref)) > { > if (TREE_CODE (ref) == ARRAY_REF) > @@ -2148,6 +2186,38 @@ dr_may_alias_p (const struct data_refere > return refs_may_alias_p (addr_a, addr_b); > } > > +/* REF_A and REF_B both satisfy access_fn_component_p. Return true > + if it is meaningful to compare their associated access functions > + when checking for dependencies. */ > + > +static bool > +access_fn_components_comparable_p (tree ref_a, tree ref_b) > +{ > + /* Allow pairs of component refs from the following sets: > + > + { REALPART_EXPR, IMAGPART_EXPR } > + { COMPONENT_REF } > + { ARRAY_REF }. */ > + tree_code code_a = TREE_CODE (ref_a); > + tree_code code_b = TREE_CODE (ref_b); > + if (code_a == IMAGPART_EXPR) > + code_a = REALPART_EXPR; > + if (code_b == IMAGPART_EXPR) > + code_b = REALPART_EXPR; > + if (code_a != code_b) > + return false; > + > + if (TREE_CODE (ref_a) == COMPONENT_REF) > + /* ??? We cannot simply use the type of operand #0 of the refs here as > + the Fortran compiler smuggles type punning into COMPONENT_REFs. > + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ > + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) > + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); > + > + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), > + TREE_TYPE (TREE_OPERAND (ref_b, 0))); > +} > + > /* Initialize a data dependence relation between data accesses A and > B. NB_LOOPS is the number of loops surrounding the references: the > size of the classic distance/direction vectors. */ > @@ -2160,11 +2230,10 @@ initialize_data_dependence_relation (str > struct data_dependence_relation *res; > unsigned int i; > > - res = XNEW (struct data_dependence_relation); > + res = XCNEW (struct data_dependence_relation); > DDR_A (res) = a; > DDR_B (res) = b; > DDR_LOOP_NEST (res).create (0); > - DDR_REVERSED_P (res) = false; > DDR_SUBSCRIPTS (res).create (0); > DDR_DIR_VECTS (res).create (0); > DDR_DIST_VECTS (res).create (0); > @@ -2182,82 +2251,277 @@ initialize_data_dependence_relation (str > return res; > } > > - /* The case where the references are exactly the same. */ > - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) > + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); > + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); > + if (num_dimensions_a == 0 || num_dimensions_b == 0) > { > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], > - DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + /* For unconstrained bases, the root (highest-indexed) subscript > + describes a variation in the base of the original DR_REF rather > + than a component access. We have no type that accurately describes > + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* > + applying this subscript) so limit the search to the last real > + component access. > + > + E.g. for: > + > + void > + f (int a[][8], int b[][8]) > + { > + for (int i = 0; i < 8; ++i) > + a[i * 2][0] = b[i][0]; > + } > + > + the a and b accesses have a single ARRAY_REF component reference [0] > + but have two subscripts. */ > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions_a -= 1; > + if (DR_UNCONSTRAINED_BASE (b)) > + num_dimensions_b -= 1; > + > + /* These structures describe sequences of component references in > + DR_REF (A) and DR_REF (B). Each component reference is tied to a > + specific access function. */ > + struct { > + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and > + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher > + indices. In C notation, these are the indices of the rightmost > + component references; e.g. for a sequence .b.c.d, the start > + index is for .d. */ > + unsigned int start_a; > + unsigned int start_b; > + > + /* The sequence contains LENGTH consecutive access functions from > + each DR. */ > + unsigned int length; > + > + /* The enclosing objects for the A and B sequences respectively, > + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) > + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ > + tree object_a; > + tree object_b; > + } full_seq = {}, struct_seq = {}; > + > + /* Before each iteration of the loop: > + > + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and > + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ > + unsigned int index_a = 0; > + unsigned int index_b = 0; > + tree ref_a = DR_REF (a); > + tree ref_b = DR_REF (b); > + > + /* Now walk the component references from the final DR_REFs back up to > + the enclosing base objects. Each component reference corresponds > + to one access function in the DR, with access function 0 being for > + the final DR_REF and the highest-indexed access function being the > + one that is applied to the base of the DR. > + > + Look for a sequence of component references whose access functions > + are comparable (see access_fn_components_comparable_p). If more > + than one such sequence exists, pick the one nearest the base > + (which is the leftmost sequence in C notation). Store this sequence > + in FULL_SEQ. > + > + For example, if we have: > + > + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; > + > + A: a[0][i].s.c.d > + B: __real b[0][i].s.e[i].f > + > + (where d is the same type as the real component of f) then the access > + functions would be: > + > + 0 1 2 3 > + A: .d .c .s [i] > + > + 0 1 2 3 4 5 > + B: __real .f [i] .e .s [i] > + > + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF > + and [i] is an ARRAY_REF. However, the A1/B3 column contains two > + COMPONENT_REF accesses for struct bar, so is comparable. Likewise > + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, > + so is comparable. The A3/B5 column contains two ARRAY_REFs that > + index foo[10] arrays, so is again comparable. The sequence is > + therefore: > + > + A: [1, 3] (i.e. [i].s.c) > + B: [3, 5] (i.e. [i].s.e) > + > + Also look for sequences of component references whose access > + functions are comparable and whose enclosing objects have the same > + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above > + example, STRUCT_SEQ would be: > + > + A: [1, 2] (i.e. s.c) > + B: [3, 4] (i.e. s.e) */ > + while (index_a < num_dimensions_a && index_b < num_dimensions_b) > + { > + /* REF_A and REF_B must be one of the component access types > + allowed by dr_analyze_indices. */ > + gcc_checking_assert (access_fn_component_p (ref_a)); > + gcc_checking_assert (access_fn_component_p (ref_b)); > + > + /* Get the immediately-enclosing objects for REF_A and REF_B, > + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) > + and DR_ACCESS_FN (B, INDEX_B). */ > + tree object_a = TREE_OPERAND (ref_a, 0); > + tree object_b = TREE_OPERAND (ref_b, 0); > + > + tree type_a = TREE_TYPE (object_a); > + tree type_b = TREE_TYPE (object_b); > + if (access_fn_components_comparable_p (ref_a, ref_b)) > + { > + /* This pair of component accesses is comparable for dependence > + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and > + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ > + if (full_seq.start_a + full_seq.length != index_a > + || full_seq.start_b + full_seq.length != index_b) > + { > + /* The accesses don't extend the current sequence, > + so start a new one here. */ > + full_seq.start_a = index_a; > + full_seq.start_b = index_b; > + full_seq.length = 0; > + } > + > + /* Add this pair of references to the sequence. */ > + full_seq.length += 1; > + full_seq.object_a = object_a; > + full_seq.object_b = object_b; > + > + /* If the enclosing objects are structures (and thus have the > + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ > + if (TREE_CODE (type_a) == RECORD_TYPE) > + struct_seq = full_seq; > + > + /* Move to the next containing reference for both A and B. */ > + ref_a = object_a; > + ref_b = object_b; > + index_a += 1; > + index_b += 1; > + continue; > + } > + > + /* Try to approach equal type sizes. */ > + if (!COMPLETE_TYPE_P (type_a) > + || !COMPLETE_TYPE_P (type_b) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) > + break; > + > + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); > + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); > + if (size_a <= size_b) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + index_a += 1; > + ref_a = object_a; > + } > + if (size_b <= size_a) > + { > + index_b += 1; > + ref_b = object_b; > } > - DDR_AFFINE_P (res) = true; > - DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > - DDR_LOOP_NEST (res) = loop_nest; > - DDR_INNER_LOOP (res) = 0; > - DDR_SELF_REFERENCE (res) = true; > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > - { > - struct subscript *subscript; > - > - subscript = XNEW (struct subscript); > - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > - SUB_DISTANCE (subscript) = chrec_dont_know; > - DDR_SUBSCRIPTS (res).safe_push (subscript); > - } > - return res; > } > > - /* If the references do not access the same object, we do not know > - whether they alias or not. We do not care about TBAA or alignment > - info so we can use OEP_ADDRESS_OF to avoid false negatives. > - But the accesses have to use compatible types as otherwise the > - built indices would not match. */ > - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) > - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), > - TREE_TYPE (DR_BASE_OBJECT (b)))) > + /* See whether FULL_SEQ ends at the base and whether the two bases > + are equal. We do not care about TBAA or alignment info so we can > + use OEP_ADDRESS_OF to avoid false negatives. */ > + tree base_a = DR_BASE_OBJECT (a); > + tree base_b = DR_BASE_OBJECT (b); > + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a > + && full_seq.start_b + full_seq.length == num_dimensions_b > + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) > + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) > + && types_compatible_p (TREE_TYPE (base_a), > + TREE_TYPE (base_b)) > + && (!loop_nest.exists () > + || (object_address_invariant_in_loop_p > + (loop_nest[0], base_a)))); > + > + /* If the bases are the same, we can include the base variation too. > + E.g. the b accesses in: > + > + for (int i = 0; i < n; ++i) > + b[i + 4][0] = b[i][0]; > + > + have a definite dependence distance of 4, while for: > + > + for (int i = 0; i < n; ++i) > + a[i + 4][0] = b[i][0]; > + > + the dependence distance depends on the gap between a and b. > + > + If the bases are different then we can only rely on the sequence > + rooted at a structure access, since arrays are allowed to overlap > + arbitrarily and change shape arbitrarily. E.g. we treat this as > + valid code: > + > + int a[256]; > + ... > + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; > + > + where two lvalues with the same int[4][3] type overlap, and where > + both lvalues are distinct from the object's declared type. */ > + if (same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + if (DR_UNCONSTRAINED_BASE (a)) > + full_seq.length += 1; > } > + else > + full_seq = struct_seq; > > - /* If the base of the object is not invariant in the loop nest, we cannot > - analyze it. TODO -- in fact, it would suffice to record that there may > - be arbitrary dependences in the loops where the base object varies. */ > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + /* Punt if we didn't find a suitable sequence. */ > + if (full_seq.length == 0) > { > DDR_ARE_DEPENDENT (res) = chrec_dont_know; > return res; > } > > - /* If the number of dimensions of the access to not agree we can have > - a pointer access to a component of the array element type and an > - array access while the base-objects are still the same. Punt. */ > - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) > + if (!same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + /* Partial overlap is possible for different bases when strict aliasing > + is not in effect. It's also possible if either base involves a union > + access; e.g. for: > + > + struct s1 { int a[2]; }; > + struct s2 { struct s1 b; int c; }; > + struct s3 { int d; struct s1 e; }; > + union u { struct s2 f; struct s3 g; } *p, *q; > + > + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at > + "p->g.e" (base "p->g") and might partially overlap the s1 at > + "q->g.e" (base "q->g"). */ > + if (!flag_strict_aliasing > + || ref_contains_union_access_p (full_seq.object_a) > + || ref_contains_union_access_p (full_seq.object_b)) > + { > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + DDR_COULD_BE_INDEPENDENT_P (res) = true; > } > > DDR_AFFINE_P (res) = true; > DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > + DDR_SUBSCRIPTS (res).create (full_seq.length); > DDR_LOOP_NEST (res) = loop_nest; > DDR_INNER_LOOP (res) = 0; > DDR_SELF_REFERENCE (res) = false; > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > + for (i = 0; i < full_seq.length; ++i) > { > struct subscript *subscript; > > subscript = XNEW (struct subscript); > + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); > + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); > SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > @@ -3839,14 +4103,15 @@ add_outer_distances (struct data_depende > } > > /* Return false when fail to represent the data dependence as a > - distance vector. INIT_B is set to true when a component has been > + distance vector. A_INDEX is the index of the first reference > + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the > + second reference. INIT_B is set to true when a component has been > added to the distance vector DIST_V. INDEX_CARRY is then set to > the index in DIST_V that carries the dependence. */ > > static bool > build_classic_dist_vector_1 (struct data_dependence_relation *ddr, > - struct data_reference *ddr_a, > - struct data_reference *ddr_b, > + unsigned int a_index, unsigned int b_index, > lambda_vector dist_v, bool *init_b, > int *index_carry) > { > @@ -3864,8 +4129,8 @@ build_classic_dist_vector_1 (struct data > return false; > } > > - access_fn_a = DR_ACCESS_FN (ddr_a, i); > - access_fn_b = DR_ACCESS_FN (ddr_b, i); > + access_fn_a = SUB_ACCESS_FN (subscript, a_index); > + access_fn_b = SUB_ACCESS_FN (subscript, b_index); > > if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC > && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) > @@ -3925,10 +4190,11 @@ build_classic_dist_vector_1 (struct data > constant_access_functions (const struct data_dependence_relation *ddr) > { > unsigned i; > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) > - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) > + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) > return false; > > return true; > @@ -3991,10 +4257,11 @@ add_other_self_distances (struct data_de > lambda_vector dist_v; > unsigned i; > int index_carry = DDR_NB_LOOPS (ddr); > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); > + tree access_fun = SUB_ACCESS_FN (sub, 0); > > if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) > { > @@ -4006,7 +4273,7 @@ add_other_self_distances (struct data_de > return; > } > > - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); > + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); > > if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) > add_multivariate_self_dist (ddr, access_fun); > @@ -4077,6 +4344,23 @@ add_distance_for_zero_overlaps (struct d > } > } > > +/* Return true when the DDR contains two data references that have the > + same access functions. */ > + > +static inline bool > +same_access_functions (const struct data_dependence_relation *ddr) > +{ > + unsigned i; > + subscript *sub; > + > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), > + SUB_ACCESS_FN (sub, 1))) > + return false; > + > + return true; > +} > + > /* Compute the classic per loop distance vector. DDR is the data > dependence relation to build a vector from. Return false when fail > to represent the data dependence as a distance vector. */ > @@ -4108,8 +4392,7 @@ build_classic_dist_vector (struct data_d > } > > dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), > - dist_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) > return false; > > /* Save the distance vector if we initialized one. */ > @@ -4142,12 +4425,11 @@ build_classic_dist_vector (struct data_d > if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) > { > lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - save_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, > + &index_carry)) > return false; > save_dist_v (ddr, save_v); > DDR_REVERSED_P (ddr) = true; > @@ -4183,12 +4465,10 @@ build_classic_dist_vector (struct data_d > { > lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), > - DDR_A (ddr), loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - opposite_v, &init_b, > + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, > &index_carry)) > return false; > > @@ -4267,13 +4547,13 @@ build_classic_dir_vector (struct data_de > } > } > > -/* Helper function. Returns true when there is a dependence between > - data references DRA and DRB. */ > +/* Helper function. Returns true when there is a dependence between the > + data references. A_INDEX is the index of the first reference (0 for > + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ > > static bool > subscript_dependence_tester_1 (struct data_dependence_relation *ddr, > - struct data_reference *dra, > - struct data_reference *drb, > + unsigned int a_index, unsigned int b_index, > struct loop *loop_nest) > { > unsigned int i; > @@ -4285,8 +4565,8 @@ subscript_dependence_tester_1 (struct da > { > conflict_function *overlaps_a, *overlaps_b; > > - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), > - DR_ACCESS_FN (drb, i), > + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), > + SUB_ACCESS_FN (subscript, b_index), > &overlaps_a, &overlaps_b, > &last_conflicts, loop_nest); > > @@ -4335,7 +4615,7 @@ subscript_dependence_tester_1 (struct da > subscript_dependence_tester (struct data_dependence_relation *ddr, > struct loop *loop_nest) > { > - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) > + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) > dependence_stats.num_dependence_dependent++; > > compute_subscript_distance (ddr); > Index: gcc/tree-ssa-loop-prefetch.c > =================================================================== > --- gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:33.023912613 +0100 > @@ -1668,6 +1668,7 @@ determine_loop_nest_reuse (struct loop * > refb = (struct mem_ref *) DDR_B (dep)->aux; > > if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know > + || DDR_COULD_BE_INDEPENDENT_P (dep) > || DDR_NUM_DIST_VECTS (dep) == 0) > { > /* If the dependence cannot be analyzed, assume that there might be > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-vectorizer.h 2017-07-27 13:10:33.024912868 +0100 > @@ -358,7 +358,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ > ((L)->may_misalign_stmts.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ > - ((L)->may_alias_ddrs.length () > 0) > + ((L)->comp_alias_ddrs.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ > (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) > #define LOOP_REQUIRES_VERSIONING(L) \ > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-vect-data-refs.c 2017-07-27 13:10:33.024912868 +0100 > @@ -160,6 +160,60 @@ vect_mark_for_runtime_alias_test (ddr_p > } > > > +/* A subroutine of vect_analyze_data_ref_dependence. Handle > + DDR_COULD_BE_INDEPENDENT_P ddr DDR that has a known set of dependence > + distances. These distances are conservatively correct but they don't > + reflect a guaranteed dependence. > + > + Return true if this function does all the work necessary to avoid > + an alias or false if the caller should use the dependence distances > + to limit the vectorization factor in the usual way. LOOP_DEPTH is > + the depth of the loop described by LOOP_VINFO and the other arguments > + are as for vect_analyze_data_ref_dependence. */ > + > +static bool > +vect_analyze_possibly_independent_ddr (data_dependence_relation *ddr, > + loop_vec_info loop_vinfo, > + int loop_depth, int *max_vf) > +{ > + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > + lambda_vector dist_v; > + unsigned int i; > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + int dist = dist_v[loop_depth]; > + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) > + { > + /* If the user asserted safelen >= DIST consecutive iterations > + can be executed concurrently, assume independence. > + > + ??? An alternative would be to add the alias check even > + in this case, and vectorize the fallback loop with the > + maximum VF set to safelen. However, if the user has > + explicitly given a length, it's less likely that that > + would be a win. */ > + if (loop->safelen >= 2 && abs_hwi (dist) <= loop->safelen) > + { > + if (loop->safelen < *max_vf) > + *max_vf = loop->safelen; > + LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = false; > + continue; > + } > + > + /* For dependence distances of 2 or more, we have the option > + of limiting VF or checking for an alias at runtime. > + Prefer to check at runtime if we can, to avoid limiting > + the VF unnecessarily when the bases are in fact independent. > + > + Note that the alias checks will be removed if the VF ends up > + being small enough. */ > + return vect_mark_for_runtime_alias_test (ddr, loop_vinfo); > + } > + } > + return true; > +} > + > + > /* Function vect_analyze_data_ref_dependence. > > Return TRUE if there (might) exist a dependence between a memory-reference > @@ -305,6 +359,12 @@ vect_analyze_data_ref_dependence (struct > } > > loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); > + > + if (DDR_COULD_BE_INDEPENDENT_P (ddr) > + && vect_analyze_possibly_independent_ddr (ddr, loop_vinfo, > + loop_depth, max_vf)) > + return false; > + > FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > { > int dist = dist_v[loop_depth]; > @@ -2878,6 +2938,44 @@ vect_no_alias_p (struct data_reference * > return false; > } > > +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH > + in DDR is >= VF. */ > + > +static bool > +dependence_distance_ge_vf (data_dependence_relation *ddr, > + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) > +{ > + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE > + || DDR_NUM_DIST_VECTS (ddr) == 0) > + return false; > + > + /* If the dependence is exact, we should have limited the VF instead. */ > + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); > + > + unsigned int i; > + lambda_vector dist_v; > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + HOST_WIDE_INT dist = dist_v[loop_depth]; > + if (dist != 0 > + && !(dist > 0 && DDR_REVERSED_P (ddr)) > + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) > + return false; > + } > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, > + "dependence distance between "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); > + dump_printf (MSG_NOTE, " and "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); > + dump_printf (MSG_NOTE, " is >= VF\n"); > + } > + > + return true; > +} > + > /* Function vect_prune_runtime_alias_test_list. > > Prune a list of ddrs to be tested at run-time by versioning for alias. > @@ -2908,6 +3006,10 @@ vect_prune_runtime_alias_test_list (loop > > comp_alias_ddrs.create (may_alias_ddrs.length ()); > > + unsigned int loop_depth > + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, > + LOOP_VINFO_LOOP_NEST (loop_vinfo)); > + > /* First, we collect all data ref pairs for aliasing checks. */ > FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) > { > @@ -2917,6 +3019,11 @@ vect_prune_runtime_alias_test_list (loop > tree segment_length_a, segment_length_b; > gimple *stmt_a, *stmt_b; > > + /* Ignore the alias if the VF we chose ended up being no greater > + than the dependence distance. */ > + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) > + continue; > + > dr_a = DDR_A (ddr); > stmt_a = DR_STMT (DDR_A (ddr)); > dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); > @@ -2993,10 +3100,6 @@ vect_prune_runtime_alias_test_list (loop > return false; > } > > - /* All alias checks have been resolved at compilation time. */ > - if (!comp_alias_ddrs.length ()) > - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); > - > return true; > } > > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c > =================================================================== > --- /dev/null 2017-07-27 10:25:31.671280760 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-07-27 13:10:33.022912357 +0100 > @@ -0,0 +1,120 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0 -fopenmp-simd" } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N + 1]; }; > +struct t { struct s x[N + 1]; }; > +struct u { int x[N + 1]; int y; }; > +struct v { struct s s; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i]; > +} > + > +void > +f2 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i]; > +} > + > +void > +f3 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i]; > +} > + > +void > +f4 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i]; > +} > + > +void > +f5 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i + 1]; > +} > + > +void > +f6 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i + 1]; > +} > + > +void > +f7 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i + 1]; > +} > + > +void > +f8 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i + 1]; > +} > + > +void > +f9 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[1].x[i]; > +} > + > +void > +f10 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i].x[i]; > +} > + > +void > +f11 (struct u *a, struct u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i] + b[i].y; > +} > + > +void > +f12 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f13 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f14 (struct v *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->s.x[i] = b->x[i]; > +} > + > +void > +f15 (struct s *a, struct s *b) > +{ > + #pragma omp simd safelen(N) > + for (int i = 0; i < N; ++i) > + a->x[i + 1] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 15 "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c > =================================================================== > --- /dev/null 2017-07-27 10:25:31.671280760 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-07-27 13:10:33.022912357 +0100 > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +#define N 16 > + > +struct s1 { int a[N]; }; > +struct s2 { struct s1 b; int c; }; > +struct s3 { int d; struct s1 e; }; > +union u { struct s2 f; struct s3 g; }; > + > +/* We allow a and b to overlap arbitrarily. */ > + > +void > +f1 (int a[][N], int b[][N]) > +{ > + for (int i = 0; i < N; ++i) > + a[0][i] += b[0][i]; > +} > + > +void > +f2 (union u *a, union u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->f.b.a[i] += b->g.e.a[i]; > +} > + > +void > +f3 (struct s1 *a, struct s1 *b) > +{ > + for (int i = 0; i < N - 1; ++i) > + a->a[i + 1] += b->a[i]; > +} > + > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c > =================================================================== > --- /dev/null 2017-07-27 10:25:31.671280760 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-07-27 13:10:33.022912357 +0100 > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N]; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "consider run-time aliasing" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
On Thu, Jul 27, 2017 at 2:19 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Sandiford <richard.sandiford@linaro.org> writes: >> Eric Botcazou <ebotcazou@adacore.com> writes: >>> [Sorry for missing the previous messages] >>> >>>> Thanks. Just been retesting, and I think I must have forgotten >>>> to include Ada last time. It turns out that the patch causes a dg-scan >>>> regression in gnat.dg/vect17.adb, because we now think that if the >>>> array RECORD_TYPEs *do* alias in: >>>> >>>> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is >>>> begin >>>> for I in Sarray'Range loop >>>> R(I) := X(I) + Y(I); >>>> end loop; >>>> end; >>>> >>>> then the dependence distance must be zero. Eric, does that hold true >>>> for Ada? I.e. if X and R (or Y and R) alias, must it be the case that >>>> X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? >>> >>> Yes, I'd think so (even without the artificial RECORD_TYPE around the arrays). >> >> Good! >> >>>> 2017-06-07 Richard Sandiford <richard.sandiford@linaro.org> >>>> >>>> gcc/testsuite/ >>>> * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. >>>> * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 >>>> when X = R or Y = R. >>> >>> I think that you need to modify vect15 and vect16 the same way. >> >> Ah, yeah. And doing that shows that I'd not handled safelen for >> DDR_COULD_BE_INDEPENDENT_P. I've fixed that locally. >> >> How does this look? Tested on x86_64-linux-gnu both without the >> vectoriser changes and with the fixed vectoriser patch. > > Here's a version of the patch that handles safelen. I split the > handling out into a new function (vect_analyze_possibly_independent_ddr) > since it was getting too big to do inline. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Ok. Did you check whether BB vectorization is affected? See vect_slp_analyze_instance_dependence and friends. It's quite conservative but given the prefetching change I wonder if we need to rule out DDR_COULD_BE_INDEPENDENT_P? Thanks, Richard. > Thanks, > Richard > > > 2017-07-27 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * tree-data-ref.h (subscript): Add access_fn field. > (data_dependence_relation): Add could_be_independent_p. > (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. > (same_access_functions): Move to tree-data-ref.c. > * tree-data-ref.c (ref_contains_union_access_p): New function. > (access_fn_component_p): Likewise. > (access_fn_components_comparable_p): Likewise. > (dr_analyze_indices): Add a reference to access_fn_component_p. > (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of > DR_ACCESS_FN. > (constant_access_functions): Likewise. > (add_other_self_distances): Likewise. > (same_access_functions): Likewise. (Moved from tree-data-ref.h.) > (initialize_data_dependence_relation): Use XCNEW and remove > explicit zeroing of DDR_REVERSED_P. Look for a subsequence > of access functions that have the same type. Allow the > subsequence to end with different bases in some circumstances. > Record the chosen access functions in SUB_ACCESS_FN. > (build_classic_dist_vector_1): Replace ddr_a and ddr_b with > a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. > (subscript_dependence_tester_1): Likewise dra and drb. > (build_classic_dist_vector): Update calls accordingly. > (subscript_dependence_tester): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check > DDR_COULD_BE_INDEPENDENT_P. > * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test > comp_alias_ddrs instead of may_alias_ddrs. > * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr): > New function. > (vect_analyze_data_ref_dependence): Use it if > DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded > distance vectors if that fails. > (dependence_distance_ge_vf): New function. > (vect_prune_runtime_alias_test_list): Use it. Don't clear > LOOP_VINFO_MAY_ALIAS_DDRS. > > gcc/testsuite/ > * gcc.dg/vect/vect-alias-check-3.c: New test. > * gcc.dg/vect/vect-alias-check-4.c: Likewise. > * gcc.dg/vect/vect-alias-check-5.c: Likewise. > > Index: gcc/tree-data-ref.h > =================================================================== > --- gcc/tree-data-ref.h 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-data-ref.h 2017-07-27 13:10:33.023912613 +0100 > @@ -260,6 +260,9 @@ struct conflict_function > > struct subscript > { > + /* The access functions of the two references. */ > + tree access_fn[2]; > + > /* A description of the iterations for which the elements are > accessed twice. */ > conflict_function *conflicting_iterations_in_a; > @@ -278,6 +281,7 @@ struct subscript > > typedef struct subscript *subscript_p; > > +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] > #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a > #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b > #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict > @@ -333,6 +337,33 @@ struct data_dependence_relation > /* Set to true when the dependence relation is on the same data > access. */ > bool self_reference_p; > + > + /* True if the dependence described is conservatively correct rather > + than exact, and if it is still possible for the accesses to be > + conditionally independent. For example, the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a->f[i] += b->f[i]; > + > + conservatively have a distance vector of (0), for the case in which > + a == b, but the accesses are independent if a != b. Similarly, > + the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a[0].f[i] += b[i].f[i]; > + > + conservatively have a distance vector of (0), but they are indepenent > + when a != b + i. In contrast, the references in: > + > + struct s *a; > + for (int i = 0; i < n; ++i) > + a->f[i] += a->f[i]; > + > + have the same distance vector of (0), but the accesses can never be > + independent. */ > + bool could_be_independent_p; > }; > > typedef struct data_dependence_relation *ddr_p; > @@ -363,6 +394,7 @@ #define DDR_DIR_VECT(DDR, I) \ > #define DDR_DIST_VECT(DDR, I) \ > DDR_DIST_VECTS (DDR)[I] > #define DDR_REVERSED_P(DDR) (DDR)->reversed_p > +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p > > > bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); > @@ -457,22 +489,6 @@ same_data_refs (data_reference_p a, data > return false; > > return true; > -} > - > -/* Return true when the DDR contains two data references that have the > - same access functions. */ > - > -static inline bool > -same_access_functions (const struct data_dependence_relation *ddr) > -{ > - unsigned i; > - > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), > - DR_ACCESS_FN (DDR_B (ddr), i))) > - return false; > - > - return true; > } > > /* Returns true when all the dependences are computable. */ > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-data-ref.c 2017-07-27 13:10:33.023912613 +0100 > @@ -124,8 +124,7 @@ Software Foundation; either version 3, o > } dependence_stats; > > static bool subscript_dependence_tester_1 (struct data_dependence_relation *, > - struct data_reference *, > - struct data_reference *, > + unsigned int, unsigned int, > struct loop *); > /* Returns true iff A divides B. */ > > @@ -145,6 +144,21 @@ int_divides_p (int a, int b) > return ((b % a) == 0); > } > > +/* Return true if reference REF contains a union access. */ > + > +static bool > +ref_contains_union_access_p (tree ref) > +{ > + while (handled_component_p (ref)) > + { > + ref = TREE_OPERAND (ref, 0); > + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE > + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) > + return true; > + } > + return false; > +} > + > > > /* Dump into FILE all the data references from DATAREFS. */ > @@ -434,13 +448,14 @@ dump_data_dependence_relation (FILE *out > unsigned int i; > struct loop *loopi; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + subscript *sub; > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > fprintf (outf, " access_fn_A: "); > - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); > fprintf (outf, " access_fn_B: "); > - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); > - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); > + dump_subscript (outf, sub); > } > > fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); > @@ -920,6 +935,27 @@ dr_analyze_innermost (innermost_loop_beh > return true; > } > > +/* Return true if OP is a valid component reference for a DR access > + function. This accepts a subset of what handled_component_p accepts. */ > + > +static bool > +access_fn_component_p (tree op) > +{ > + switch (TREE_CODE (op)) > + { > + case REALPART_EXPR: > + case IMAGPART_EXPR: > + case ARRAY_REF: > + return true; > + > + case COMPONENT_REF: > + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; > + > + default: > + return false; > + } > +} > + > /* Determines the base object and the list of indices of memory reference > DR, analyzed in LOOP and instantiated in loop nest NEST. */ > > @@ -957,7 +993,9 @@ dr_analyze_indices (struct data_referenc > access_fns.safe_push (integer_one_node); > } > > - /* Analyze access functions of dimensions we know to be independent. */ > + /* Analyze access functions of dimensions we know to be independent. > + The list of component references handled here should be kept in > + sync with access_fn_component_p. */ > while (handled_component_p (ref)) > { > if (TREE_CODE (ref) == ARRAY_REF) > @@ -2148,6 +2186,38 @@ dr_may_alias_p (const struct data_refere > return refs_may_alias_p (addr_a, addr_b); > } > > +/* REF_A and REF_B both satisfy access_fn_component_p. Return true > + if it is meaningful to compare their associated access functions > + when checking for dependencies. */ > + > +static bool > +access_fn_components_comparable_p (tree ref_a, tree ref_b) > +{ > + /* Allow pairs of component refs from the following sets: > + > + { REALPART_EXPR, IMAGPART_EXPR } > + { COMPONENT_REF } > + { ARRAY_REF }. */ > + tree_code code_a = TREE_CODE (ref_a); > + tree_code code_b = TREE_CODE (ref_b); > + if (code_a == IMAGPART_EXPR) > + code_a = REALPART_EXPR; > + if (code_b == IMAGPART_EXPR) > + code_b = REALPART_EXPR; > + if (code_a != code_b) > + return false; > + > + if (TREE_CODE (ref_a) == COMPONENT_REF) > + /* ??? We cannot simply use the type of operand #0 of the refs here as > + the Fortran compiler smuggles type punning into COMPONENT_REFs. > + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ > + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) > + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); > + > + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), > + TREE_TYPE (TREE_OPERAND (ref_b, 0))); > +} > + > /* Initialize a data dependence relation between data accesses A and > B. NB_LOOPS is the number of loops surrounding the references: the > size of the classic distance/direction vectors. */ > @@ -2160,11 +2230,10 @@ initialize_data_dependence_relation (str > struct data_dependence_relation *res; > unsigned int i; > > - res = XNEW (struct data_dependence_relation); > + res = XCNEW (struct data_dependence_relation); > DDR_A (res) = a; > DDR_B (res) = b; > DDR_LOOP_NEST (res).create (0); > - DDR_REVERSED_P (res) = false; > DDR_SUBSCRIPTS (res).create (0); > DDR_DIR_VECTS (res).create (0); > DDR_DIST_VECTS (res).create (0); > @@ -2182,82 +2251,277 @@ initialize_data_dependence_relation (str > return res; > } > > - /* The case where the references are exactly the same. */ > - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) > + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); > + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); > + if (num_dimensions_a == 0 || num_dimensions_b == 0) > { > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], > - DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + /* For unconstrained bases, the root (highest-indexed) subscript > + describes a variation in the base of the original DR_REF rather > + than a component access. We have no type that accurately describes > + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* > + applying this subscript) so limit the search to the last real > + component access. > + > + E.g. for: > + > + void > + f (int a[][8], int b[][8]) > + { > + for (int i = 0; i < 8; ++i) > + a[i * 2][0] = b[i][0]; > + } > + > + the a and b accesses have a single ARRAY_REF component reference [0] > + but have two subscripts. */ > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions_a -= 1; > + if (DR_UNCONSTRAINED_BASE (b)) > + num_dimensions_b -= 1; > + > + /* These structures describe sequences of component references in > + DR_REF (A) and DR_REF (B). Each component reference is tied to a > + specific access function. */ > + struct { > + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and > + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher > + indices. In C notation, these are the indices of the rightmost > + component references; e.g. for a sequence .b.c.d, the start > + index is for .d. */ > + unsigned int start_a; > + unsigned int start_b; > + > + /* The sequence contains LENGTH consecutive access functions from > + each DR. */ > + unsigned int length; > + > + /* The enclosing objects for the A and B sequences respectively, > + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) > + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ > + tree object_a; > + tree object_b; > + } full_seq = {}, struct_seq = {}; > + > + /* Before each iteration of the loop: > + > + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and > + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ > + unsigned int index_a = 0; > + unsigned int index_b = 0; > + tree ref_a = DR_REF (a); > + tree ref_b = DR_REF (b); > + > + /* Now walk the component references from the final DR_REFs back up to > + the enclosing base objects. Each component reference corresponds > + to one access function in the DR, with access function 0 being for > + the final DR_REF and the highest-indexed access function being the > + one that is applied to the base of the DR. > + > + Look for a sequence of component references whose access functions > + are comparable (see access_fn_components_comparable_p). If more > + than one such sequence exists, pick the one nearest the base > + (which is the leftmost sequence in C notation). Store this sequence > + in FULL_SEQ. > + > + For example, if we have: > + > + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; > + > + A: a[0][i].s.c.d > + B: __real b[0][i].s.e[i].f > + > + (where d is the same type as the real component of f) then the access > + functions would be: > + > + 0 1 2 3 > + A: .d .c .s [i] > + > + 0 1 2 3 4 5 > + B: __real .f [i] .e .s [i] > + > + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF > + and [i] is an ARRAY_REF. However, the A1/B3 column contains two > + COMPONENT_REF accesses for struct bar, so is comparable. Likewise > + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, > + so is comparable. The A3/B5 column contains two ARRAY_REFs that > + index foo[10] arrays, so is again comparable. The sequence is > + therefore: > + > + A: [1, 3] (i.e. [i].s.c) > + B: [3, 5] (i.e. [i].s.e) > + > + Also look for sequences of component references whose access > + functions are comparable and whose enclosing objects have the same > + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above > + example, STRUCT_SEQ would be: > + > + A: [1, 2] (i.e. s.c) > + B: [3, 4] (i.e. s.e) */ > + while (index_a < num_dimensions_a && index_b < num_dimensions_b) > + { > + /* REF_A and REF_B must be one of the component access types > + allowed by dr_analyze_indices. */ > + gcc_checking_assert (access_fn_component_p (ref_a)); > + gcc_checking_assert (access_fn_component_p (ref_b)); > + > + /* Get the immediately-enclosing objects for REF_A and REF_B, > + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) > + and DR_ACCESS_FN (B, INDEX_B). */ > + tree object_a = TREE_OPERAND (ref_a, 0); > + tree object_b = TREE_OPERAND (ref_b, 0); > + > + tree type_a = TREE_TYPE (object_a); > + tree type_b = TREE_TYPE (object_b); > + if (access_fn_components_comparable_p (ref_a, ref_b)) > + { > + /* This pair of component accesses is comparable for dependence > + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and > + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ > + if (full_seq.start_a + full_seq.length != index_a > + || full_seq.start_b + full_seq.length != index_b) > + { > + /* The accesses don't extend the current sequence, > + so start a new one here. */ > + full_seq.start_a = index_a; > + full_seq.start_b = index_b; > + full_seq.length = 0; > + } > + > + /* Add this pair of references to the sequence. */ > + full_seq.length += 1; > + full_seq.object_a = object_a; > + full_seq.object_b = object_b; > + > + /* If the enclosing objects are structures (and thus have the > + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ > + if (TREE_CODE (type_a) == RECORD_TYPE) > + struct_seq = full_seq; > + > + /* Move to the next containing reference for both A and B. */ > + ref_a = object_a; > + ref_b = object_b; > + index_a += 1; > + index_b += 1; > + continue; > + } > + > + /* Try to approach equal type sizes. */ > + if (!COMPLETE_TYPE_P (type_a) > + || !COMPLETE_TYPE_P (type_b) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) > + break; > + > + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); > + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); > + if (size_a <= size_b) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + index_a += 1; > + ref_a = object_a; > + } > + if (size_b <= size_a) > + { > + index_b += 1; > + ref_b = object_b; > } > - DDR_AFFINE_P (res) = true; > - DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > - DDR_LOOP_NEST (res) = loop_nest; > - DDR_INNER_LOOP (res) = 0; > - DDR_SELF_REFERENCE (res) = true; > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > - { > - struct subscript *subscript; > - > - subscript = XNEW (struct subscript); > - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > - SUB_DISTANCE (subscript) = chrec_dont_know; > - DDR_SUBSCRIPTS (res).safe_push (subscript); > - } > - return res; > } > > - /* If the references do not access the same object, we do not know > - whether they alias or not. We do not care about TBAA or alignment > - info so we can use OEP_ADDRESS_OF to avoid false negatives. > - But the accesses have to use compatible types as otherwise the > - built indices would not match. */ > - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) > - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), > - TREE_TYPE (DR_BASE_OBJECT (b)))) > + /* See whether FULL_SEQ ends at the base and whether the two bases > + are equal. We do not care about TBAA or alignment info so we can > + use OEP_ADDRESS_OF to avoid false negatives. */ > + tree base_a = DR_BASE_OBJECT (a); > + tree base_b = DR_BASE_OBJECT (b); > + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a > + && full_seq.start_b + full_seq.length == num_dimensions_b > + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) > + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) > + && types_compatible_p (TREE_TYPE (base_a), > + TREE_TYPE (base_b)) > + && (!loop_nest.exists () > + || (object_address_invariant_in_loop_p > + (loop_nest[0], base_a)))); > + > + /* If the bases are the same, we can include the base variation too. > + E.g. the b accesses in: > + > + for (int i = 0; i < n; ++i) > + b[i + 4][0] = b[i][0]; > + > + have a definite dependence distance of 4, while for: > + > + for (int i = 0; i < n; ++i) > + a[i + 4][0] = b[i][0]; > + > + the dependence distance depends on the gap between a and b. > + > + If the bases are different then we can only rely on the sequence > + rooted at a structure access, since arrays are allowed to overlap > + arbitrarily and change shape arbitrarily. E.g. we treat this as > + valid code: > + > + int a[256]; > + ... > + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; > + > + where two lvalues with the same int[4][3] type overlap, and where > + both lvalues are distinct from the object's declared type. */ > + if (same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + if (DR_UNCONSTRAINED_BASE (a)) > + full_seq.length += 1; > } > + else > + full_seq = struct_seq; > > - /* If the base of the object is not invariant in the loop nest, we cannot > - analyze it. TODO -- in fact, it would suffice to record that there may > - be arbitrary dependences in the loops where the base object varies. */ > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + /* Punt if we didn't find a suitable sequence. */ > + if (full_seq.length == 0) > { > DDR_ARE_DEPENDENT (res) = chrec_dont_know; > return res; > } > > - /* If the number of dimensions of the access to not agree we can have > - a pointer access to a component of the array element type and an > - array access while the base-objects are still the same. Punt. */ > - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) > + if (!same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + /* Partial overlap is possible for different bases when strict aliasing > + is not in effect. It's also possible if either base involves a union > + access; e.g. for: > + > + struct s1 { int a[2]; }; > + struct s2 { struct s1 b; int c; }; > + struct s3 { int d; struct s1 e; }; > + union u { struct s2 f; struct s3 g; } *p, *q; > + > + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at > + "p->g.e" (base "p->g") and might partially overlap the s1 at > + "q->g.e" (base "q->g"). */ > + if (!flag_strict_aliasing > + || ref_contains_union_access_p (full_seq.object_a) > + || ref_contains_union_access_p (full_seq.object_b)) > + { > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + DDR_COULD_BE_INDEPENDENT_P (res) = true; > } > > DDR_AFFINE_P (res) = true; > DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > + DDR_SUBSCRIPTS (res).create (full_seq.length); > DDR_LOOP_NEST (res) = loop_nest; > DDR_INNER_LOOP (res) = 0; > DDR_SELF_REFERENCE (res) = false; > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > + for (i = 0; i < full_seq.length; ++i) > { > struct subscript *subscript; > > subscript = XNEW (struct subscript); > + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); > + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); > SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > @@ -3839,14 +4103,15 @@ add_outer_distances (struct data_depende > } > > /* Return false when fail to represent the data dependence as a > - distance vector. INIT_B is set to true when a component has been > + distance vector. A_INDEX is the index of the first reference > + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the > + second reference. INIT_B is set to true when a component has been > added to the distance vector DIST_V. INDEX_CARRY is then set to > the index in DIST_V that carries the dependence. */ > > static bool > build_classic_dist_vector_1 (struct data_dependence_relation *ddr, > - struct data_reference *ddr_a, > - struct data_reference *ddr_b, > + unsigned int a_index, unsigned int b_index, > lambda_vector dist_v, bool *init_b, > int *index_carry) > { > @@ -3864,8 +4129,8 @@ build_classic_dist_vector_1 (struct data > return false; > } > > - access_fn_a = DR_ACCESS_FN (ddr_a, i); > - access_fn_b = DR_ACCESS_FN (ddr_b, i); > + access_fn_a = SUB_ACCESS_FN (subscript, a_index); > + access_fn_b = SUB_ACCESS_FN (subscript, b_index); > > if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC > && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) > @@ -3925,10 +4190,11 @@ build_classic_dist_vector_1 (struct data > constant_access_functions (const struct data_dependence_relation *ddr) > { > unsigned i; > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) > - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) > + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) > return false; > > return true; > @@ -3991,10 +4257,11 @@ add_other_self_distances (struct data_de > lambda_vector dist_v; > unsigned i; > int index_carry = DDR_NB_LOOPS (ddr); > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); > + tree access_fun = SUB_ACCESS_FN (sub, 0); > > if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) > { > @@ -4006,7 +4273,7 @@ add_other_self_distances (struct data_de > return; > } > > - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); > + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); > > if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) > add_multivariate_self_dist (ddr, access_fun); > @@ -4077,6 +4344,23 @@ add_distance_for_zero_overlaps (struct d > } > } > > +/* Return true when the DDR contains two data references that have the > + same access functions. */ > + > +static inline bool > +same_access_functions (const struct data_dependence_relation *ddr) > +{ > + unsigned i; > + subscript *sub; > + > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), > + SUB_ACCESS_FN (sub, 1))) > + return false; > + > + return true; > +} > + > /* Compute the classic per loop distance vector. DDR is the data > dependence relation to build a vector from. Return false when fail > to represent the data dependence as a distance vector. */ > @@ -4108,8 +4392,7 @@ build_classic_dist_vector (struct data_d > } > > dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), > - dist_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) > return false; > > /* Save the distance vector if we initialized one. */ > @@ -4142,12 +4425,11 @@ build_classic_dist_vector (struct data_d > if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) > { > lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - save_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, > + &index_carry)) > return false; > save_dist_v (ddr, save_v); > DDR_REVERSED_P (ddr) = true; > @@ -4183,12 +4465,10 @@ build_classic_dist_vector (struct data_d > { > lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), > - DDR_A (ddr), loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - opposite_v, &init_b, > + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, > &index_carry)) > return false; > > @@ -4267,13 +4547,13 @@ build_classic_dir_vector (struct data_de > } > } > > -/* Helper function. Returns true when there is a dependence between > - data references DRA and DRB. */ > +/* Helper function. Returns true when there is a dependence between the > + data references. A_INDEX is the index of the first reference (0 for > + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ > > static bool > subscript_dependence_tester_1 (struct data_dependence_relation *ddr, > - struct data_reference *dra, > - struct data_reference *drb, > + unsigned int a_index, unsigned int b_index, > struct loop *loop_nest) > { > unsigned int i; > @@ -4285,8 +4565,8 @@ subscript_dependence_tester_1 (struct da > { > conflict_function *overlaps_a, *overlaps_b; > > - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), > - DR_ACCESS_FN (drb, i), > + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), > + SUB_ACCESS_FN (subscript, b_index), > &overlaps_a, &overlaps_b, > &last_conflicts, loop_nest); > > @@ -4335,7 +4615,7 @@ subscript_dependence_tester_1 (struct da > subscript_dependence_tester (struct data_dependence_relation *ddr, > struct loop *loop_nest) > { > - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) > + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) > dependence_stats.num_dependence_dependent++; > > compute_subscript_distance (ddr); > Index: gcc/tree-ssa-loop-prefetch.c > =================================================================== > --- gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:33.023912613 +0100 > @@ -1668,6 +1668,7 @@ determine_loop_nest_reuse (struct loop * > refb = (struct mem_ref *) DDR_B (dep)->aux; > > if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know > + || DDR_COULD_BE_INDEPENDENT_P (dep) > || DDR_NUM_DIST_VECTS (dep) == 0) > { > /* If the dependence cannot be analyzed, assume that there might be > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-vectorizer.h 2017-07-27 13:10:33.024912868 +0100 > @@ -358,7 +358,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ > ((L)->may_misalign_stmts.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ > - ((L)->may_alias_ddrs.length () > 0) > + ((L)->comp_alias_ddrs.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ > (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) > #define LOOP_REQUIRES_VERSIONING(L) \ > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-vect-data-refs.c 2017-07-27 13:10:33.024912868 +0100 > @@ -160,6 +160,60 @@ vect_mark_for_runtime_alias_test (ddr_p > } > > > +/* A subroutine of vect_analyze_data_ref_dependence. Handle > + DDR_COULD_BE_INDEPENDENT_P ddr DDR that has a known set of dependence > + distances. These distances are conservatively correct but they don't > + reflect a guaranteed dependence. > + > + Return true if this function does all the work necessary to avoid > + an alias or false if the caller should use the dependence distances > + to limit the vectorization factor in the usual way. LOOP_DEPTH is > + the depth of the loop described by LOOP_VINFO and the other arguments > + are as for vect_analyze_data_ref_dependence. */ > + > +static bool > +vect_analyze_possibly_independent_ddr (data_dependence_relation *ddr, > + loop_vec_info loop_vinfo, > + int loop_depth, int *max_vf) > +{ > + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > + lambda_vector dist_v; > + unsigned int i; > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + int dist = dist_v[loop_depth]; > + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) > + { > + /* If the user asserted safelen >= DIST consecutive iterations > + can be executed concurrently, assume independence. > + > + ??? An alternative would be to add the alias check even > + in this case, and vectorize the fallback loop with the > + maximum VF set to safelen. However, if the user has > + explicitly given a length, it's less likely that that > + would be a win. */ > + if (loop->safelen >= 2 && abs_hwi (dist) <= loop->safelen) > + { > + if (loop->safelen < *max_vf) > + *max_vf = loop->safelen; > + LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = false; > + continue; > + } > + > + /* For dependence distances of 2 or more, we have the option > + of limiting VF or checking for an alias at runtime. > + Prefer to check at runtime if we can, to avoid limiting > + the VF unnecessarily when the bases are in fact independent. > + > + Note that the alias checks will be removed if the VF ends up > + being small enough. */ > + return vect_mark_for_runtime_alias_test (ddr, loop_vinfo); > + } > + } > + return true; > +} > + > + > /* Function vect_analyze_data_ref_dependence. > > Return TRUE if there (might) exist a dependence between a memory-reference > @@ -305,6 +359,12 @@ vect_analyze_data_ref_dependence (struct > } > > loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); > + > + if (DDR_COULD_BE_INDEPENDENT_P (ddr) > + && vect_analyze_possibly_independent_ddr (ddr, loop_vinfo, > + loop_depth, max_vf)) > + return false; > + > FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > { > int dist = dist_v[loop_depth]; > @@ -2878,6 +2938,44 @@ vect_no_alias_p (struct data_reference * > return false; > } > > +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH > + in DDR is >= VF. */ > + > +static bool > +dependence_distance_ge_vf (data_dependence_relation *ddr, > + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) > +{ > + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE > + || DDR_NUM_DIST_VECTS (ddr) == 0) > + return false; > + > + /* If the dependence is exact, we should have limited the VF instead. */ > + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); > + > + unsigned int i; > + lambda_vector dist_v; > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + HOST_WIDE_INT dist = dist_v[loop_depth]; > + if (dist != 0 > + && !(dist > 0 && DDR_REVERSED_P (ddr)) > + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) > + return false; > + } > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, > + "dependence distance between "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); > + dump_printf (MSG_NOTE, " and "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); > + dump_printf (MSG_NOTE, " is >= VF\n"); > + } > + > + return true; > +} > + > /* Function vect_prune_runtime_alias_test_list. > > Prune a list of ddrs to be tested at run-time by versioning for alias. > @@ -2908,6 +3006,10 @@ vect_prune_runtime_alias_test_list (loop > > comp_alias_ddrs.create (may_alias_ddrs.length ()); > > + unsigned int loop_depth > + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, > + LOOP_VINFO_LOOP_NEST (loop_vinfo)); > + > /* First, we collect all data ref pairs for aliasing checks. */ > FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) > { > @@ -2917,6 +3019,11 @@ vect_prune_runtime_alias_test_list (loop > tree segment_length_a, segment_length_b; > gimple *stmt_a, *stmt_b; > > + /* Ignore the alias if the VF we chose ended up being no greater > + than the dependence distance. */ > + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) > + continue; > + > dr_a = DDR_A (ddr); > stmt_a = DR_STMT (DDR_A (ddr)); > dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); > @@ -2993,10 +3100,6 @@ vect_prune_runtime_alias_test_list (loop > return false; > } > > - /* All alias checks have been resolved at compilation time. */ > - if (!comp_alias_ddrs.length ()) > - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); > - > return true; > } > > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c > =================================================================== > --- /dev/null 2017-07-27 10:25:31.671280760 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-07-27 13:10:33.022912357 +0100 > @@ -0,0 +1,120 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0 -fopenmp-simd" } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N + 1]; }; > +struct t { struct s x[N + 1]; }; > +struct u { int x[N + 1]; int y; }; > +struct v { struct s s; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i]; > +} > + > +void > +f2 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i]; > +} > + > +void > +f3 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i]; > +} > + > +void > +f4 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i]; > +} > + > +void > +f5 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i + 1]; > +} > + > +void > +f6 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i + 1]; > +} > + > +void > +f7 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i + 1]; > +} > + > +void > +f8 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i + 1]; > +} > + > +void > +f9 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[1].x[i]; > +} > + > +void > +f10 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i].x[i]; > +} > + > +void > +f11 (struct u *a, struct u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i] + b[i].y; > +} > + > +void > +f12 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f13 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f14 (struct v *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->s.x[i] = b->x[i]; > +} > + > +void > +f15 (struct s *a, struct s *b) > +{ > + #pragma omp simd safelen(N) > + for (int i = 0; i < N; ++i) > + a->x[i + 1] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 15 "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c > =================================================================== > --- /dev/null 2017-07-27 10:25:31.671280760 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-07-27 13:10:33.022912357 +0100 > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +#define N 16 > + > +struct s1 { int a[N]; }; > +struct s2 { struct s1 b; int c; }; > +struct s3 { int d; struct s1 e; }; > +union u { struct s2 f; struct s3 g; }; > + > +/* We allow a and b to overlap arbitrarily. */ > + > +void > +f1 (int a[][N], int b[][N]) > +{ > + for (int i = 0; i < N; ++i) > + a[0][i] += b[0][i]; > +} > + > +void > +f2 (union u *a, union u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->f.b.a[i] += b->g.e.a[i]; > +} > + > +void > +f3 (struct s1 *a, struct s1 *b) > +{ > + for (int i = 0; i < N - 1; ++i) > + a->a[i + 1] += b->a[i]; > +} > + > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c > =================================================================== > --- /dev/null 2017-07-27 10:25:31.671280760 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-07-27 13:10:33.022912357 +0100 > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N]; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "consider run-time aliasing" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
Richard Biener <richard.guenther@gmail.com> writes: > On Thu, Jul 27, 2017 at 2:19 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Sandiford <richard.sandiford@linaro.org> writes: >>> Eric Botcazou <ebotcazou@adacore.com> writes: >>>> [Sorry for missing the previous messages] >>>> >>>>> Thanks. Just been retesting, and I think I must have forgotten >>>>> to include Ada last time. It turns out that the patch causes a dg-scan >>>>> regression in gnat.dg/vect17.adb, because we now think that if the >>>>> array RECORD_TYPEs *do* alias in: >>>>> >>>>> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is >>>>> begin >>>>> for I in Sarray'Range loop >>>>> R(I) := X(I) + Y(I); >>>>> end loop; >>>>> end; >>>>> >>>>> then the dependence distance must be zero. Eric, does that hold true >>>>> for Ada? I.e. if X and R (or Y and R) alias, must it be the case that >>>>> X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? >>>> >>>> Yes, I'd think so (even without the artificial RECORD_TYPE around > the arrays). >>> >>> Good! >>> >>>>> 2017-06-07 Richard Sandiford <richard.sandiford@linaro.org> >>>>> >>>>> gcc/testsuite/ >>>>> * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. >>>>> * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 >>>>> when X = R or Y = R. >>>> >>>> I think that you need to modify vect15 and vect16 the same way. >>> >>> Ah, yeah. And doing that shows that I'd not handled safelen for >>> DDR_COULD_BE_INDEPENDENT_P. I've fixed that locally. >>> >>> How does this look? Tested on x86_64-linux-gnu both without the >>> vectoriser changes and with the fixed vectoriser patch. >> >> Here's a version of the patch that handles safelen. I split the >> handling out into a new function (vect_analyze_possibly_independent_ddr) >> since it was getting too big to do inline. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Ok. Thanks! > Did you check whether BB vectorization is affected? See > vect_slp_analyze_instance_dependence > and friends. It's quite conservative but given the prefetching change > I wonder if we need > to rule out DDR_COULD_BE_INDEPENDENT_P? I think it should be OK. When DDR_COULD_BE_INDEPENDENT_P is set, we've effectively changed from DDR_ARE_DEPENDENT == chrec_dont_know to a conservatively-correct distance vector. It looks like vect_slp_analyze_data_ref_dependence handles both cases in the same way (by returning true). Thanks, Richard > > Thanks, > Richard. > >> Thanks, >> Richard >> >> >> 2017-07-27 Richard Sandiford <richard.sandiford@linaro.org> >> >> gcc/ >> * tree-data-ref.h (subscript): Add access_fn field. >> (data_dependence_relation): Add could_be_independent_p. >> (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. >> (same_access_functions): Move to tree-data-ref.c. >> * tree-data-ref.c (ref_contains_union_access_p): New function. >> (access_fn_component_p): Likewise. >> (access_fn_components_comparable_p): Likewise. >> (dr_analyze_indices): Add a reference to access_fn_component_p. >> (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of >> DR_ACCESS_FN. >> (constant_access_functions): Likewise. >> (add_other_self_distances): Likewise. >> (same_access_functions): Likewise. (Moved from tree-data-ref.h.) >> (initialize_data_dependence_relation): Use XCNEW and remove >> explicit zeroing of DDR_REVERSED_P. Look for a subsequence >> of access functions that have the same type. Allow the >> subsequence to end with different bases in some circumstances. >> Record the chosen access functions in SUB_ACCESS_FN. >> (build_classic_dist_vector_1): Replace ddr_a and ddr_b with >> a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. >> (subscript_dependence_tester_1): Likewise dra and drb. >> (build_classic_dist_vector): Update calls accordingly. >> (subscript_dependence_tester): Likewise. >> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check >> DDR_COULD_BE_INDEPENDENT_P. >> * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test >> comp_alias_ddrs instead of may_alias_ddrs. >> * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr): >> New function. >> (vect_analyze_data_ref_dependence): Use it if >> DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded >> distance vectors if that fails. >> (dependence_distance_ge_vf): New function. >> (vect_prune_runtime_alias_test_list): Use it. Don't clear >> LOOP_VINFO_MAY_ALIAS_DDRS. >> >> gcc/testsuite/ >> * gcc.dg/vect/vect-alias-check-3.c: New test. >> * gcc.dg/vect/vect-alias-check-4.c: Likewise. >> * gcc.dg/vect/vect-alias-check-5.c: Likewise. >> >> Index: gcc/tree-data-ref.h >> =================================================================== >> --- gcc/tree-data-ref.h 2017-07-27 13:10:29.620045506 +0100 >> +++ gcc/tree-data-ref.h 2017-07-27 13:10:33.023912613 +0100 >> @@ -260,6 +260,9 @@ struct conflict_function >> >> struct subscript >> { >> + /* The access functions of the two references. */ >> + tree access_fn[2]; >> + >> /* A description of the iterations for which the elements are >> accessed twice. */ >> conflict_function *conflicting_iterations_in_a; >> @@ -278,6 +281,7 @@ struct subscript >> >> typedef struct subscript *subscript_p; >> >> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] >> #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a >> #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b >> #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict >> @@ -333,6 +337,33 @@ struct data_dependence_relation >> /* Set to true when the dependence relation is on the same data >> access. */ >> bool self_reference_p; >> + >> + /* True if the dependence described is conservatively correct rather >> + than exact, and if it is still possible for the accesses to be >> + conditionally independent. For example, the a and b references in: >> + >> + struct s *a, *b; >> + for (int i = 0; i < n; ++i) >> + a->f[i] += b->f[i]; >> + >> + conservatively have a distance vector of (0), for the case in which >> + a == b, but the accesses are independent if a != b. Similarly, >> + the a and b references in: >> + >> + struct s *a, *b; >> + for (int i = 0; i < n; ++i) >> + a[0].f[i] += b[i].f[i]; >> + >> + conservatively have a distance vector of (0), but they are indepenent >> + when a != b + i. In contrast, the references in: >> + >> + struct s *a; >> + for (int i = 0; i < n; ++i) >> + a->f[i] += a->f[i]; >> + >> + have the same distance vector of (0), but the accesses can never be >> + independent. */ >> + bool could_be_independent_p; >> }; >> >> typedef struct data_dependence_relation *ddr_p; >> @@ -363,6 +394,7 @@ #define DDR_DIR_VECT(DDR, I) \ >> #define DDR_DIST_VECT(DDR, I) \ >> DDR_DIST_VECTS (DDR)[I] >> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p >> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p >> >> >> bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); >> @@ -457,22 +489,6 @@ same_data_refs (data_reference_p a, data >> return false; >> >> return true; >> -} >> - >> -/* Return true when the DDR contains two data references that have the >> - same access functions. */ >> - >> -static inline bool >> -same_access_functions (const struct data_dependence_relation *ddr) >> -{ >> - unsigned i; >> - >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), >> - DR_ACCESS_FN (DDR_B (ddr), i))) >> - return false; >> - >> - return true; >> } >> >> /* Returns true when all the dependences are computable. */ >> Index: gcc/tree-data-ref.c >> =================================================================== >> --- gcc/tree-data-ref.c 2017-07-27 13:10:29.620045506 +0100 >> +++ gcc/tree-data-ref.c 2017-07-27 13:10:33.023912613 +0100 >> @@ -124,8 +124,7 @@ Software Foundation; either version 3, o >> } dependence_stats; >> >> static bool subscript_dependence_tester_1 (struct data_dependence_relation *, >> - struct data_reference *, >> - struct data_reference *, >> + unsigned int, unsigned int, >> struct loop *); >> /* Returns true iff A divides B. */ >> >> @@ -145,6 +144,21 @@ int_divides_p (int a, int b) >> return ((b % a) == 0); >> } >> >> +/* Return true if reference REF contains a union access. */ >> + >> +static bool >> +ref_contains_union_access_p (tree ref) >> +{ >> + while (handled_component_p (ref)) >> + { >> + ref = TREE_OPERAND (ref, 0); >> + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE >> + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) >> + return true; >> + } >> + return false; >> +} >> + >> >> >> /* Dump into FILE all the data references from DATAREFS. */ >> @@ -434,13 +448,14 @@ dump_data_dependence_relation (FILE *out >> unsigned int i; >> struct loop *loopi; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> + subscript *sub; >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> { >> fprintf (outf, " access_fn_A: "); >> - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); >> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); >> fprintf (outf, " access_fn_B: "); >> - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); >> - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); >> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); >> + dump_subscript (outf, sub); >> } >> >> fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); >> @@ -920,6 +935,27 @@ dr_analyze_innermost (innermost_loop_beh >> return true; >> } >> >> +/* Return true if OP is a valid component reference for a DR access >> + function. This accepts a subset of what handled_component_p accepts. */ >> + >> +static bool >> +access_fn_component_p (tree op) >> +{ >> + switch (TREE_CODE (op)) >> + { >> + case REALPART_EXPR: >> + case IMAGPART_EXPR: >> + case ARRAY_REF: >> + return true; >> + >> + case COMPONENT_REF: >> + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; >> + >> + default: >> + return false; >> + } >> +} >> + >> /* Determines the base object and the list of indices of memory reference >> DR, analyzed in LOOP and instantiated in loop nest NEST. */ >> >> @@ -957,7 +993,9 @@ dr_analyze_indices (struct data_referenc >> access_fns.safe_push (integer_one_node); >> } >> >> - /* Analyze access functions of dimensions we know to be independent. */ >> + /* Analyze access functions of dimensions we know to be independent. >> + The list of component references handled here should be kept in >> + sync with access_fn_component_p. */ >> while (handled_component_p (ref)) >> { >> if (TREE_CODE (ref) == ARRAY_REF) >> @@ -2148,6 +2186,38 @@ dr_may_alias_p (const struct data_refere >> return refs_may_alias_p (addr_a, addr_b); >> } >> >> +/* REF_A and REF_B both satisfy access_fn_component_p. Return true >> + if it is meaningful to compare their associated access functions >> + when checking for dependencies. */ >> + >> +static bool >> +access_fn_components_comparable_p (tree ref_a, tree ref_b) >> +{ >> + /* Allow pairs of component refs from the following sets: >> + >> + { REALPART_EXPR, IMAGPART_EXPR } >> + { COMPONENT_REF } >> + { ARRAY_REF }. */ >> + tree_code code_a = TREE_CODE (ref_a); >> + tree_code code_b = TREE_CODE (ref_b); >> + if (code_a == IMAGPART_EXPR) >> + code_a = REALPART_EXPR; >> + if (code_b == IMAGPART_EXPR) >> + code_b = REALPART_EXPR; >> + if (code_a != code_b) >> + return false; >> + >> + if (TREE_CODE (ref_a) == COMPONENT_REF) >> + /* ??? We cannot simply use the type of operand #0 of the refs here as >> + the Fortran compiler smuggles type punning into COMPONENT_REFs. >> + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ >> + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) >> + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); >> + >> + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), >> + TREE_TYPE (TREE_OPERAND (ref_b, 0))); >> +} >> + >> /* Initialize a data dependence relation between data accesses A and >> B. NB_LOOPS is the number of loops surrounding the references: the >> size of the classic distance/direction vectors. */ >> @@ -2160,11 +2230,10 @@ initialize_data_dependence_relation (str >> struct data_dependence_relation *res; >> unsigned int i; >> >> - res = XNEW (struct data_dependence_relation); >> + res = XCNEW (struct data_dependence_relation); >> DDR_A (res) = a; >> DDR_B (res) = b; >> DDR_LOOP_NEST (res).create (0); >> - DDR_REVERSED_P (res) = false; >> DDR_SUBSCRIPTS (res).create (0); >> DDR_DIR_VECTS (res).create (0); >> DDR_DIST_VECTS (res).create (0); >> @@ -2182,82 +2251,277 @@ initialize_data_dependence_relation (str >> return res; >> } >> >> - /* The case where the references are exactly the same. */ >> - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) >> + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); >> + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); >> + if (num_dimensions_a == 0 || num_dimensions_b == 0) >> { >> - if ((loop_nest.exists () >> - && !object_address_invariant_in_loop_p (loop_nest[0], >> - DR_BASE_OBJECT (a))) >> - || DR_NUM_DIMENSIONS (a) == 0) >> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> + return res; >> + } >> + >> + /* For unconstrained bases, the root (highest-indexed) subscript >> + describes a variation in the base of the original DR_REF rather >> + than a component access. We have no type that accurately describes >> + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* >> + applying this subscript) so limit the search to the last real >> + component access. >> + >> + E.g. for: >> + >> + void >> + f (int a[][8], int b[][8]) >> + { >> + for (int i = 0; i < 8; ++i) >> + a[i * 2][0] = b[i][0]; >> + } >> + >> + the a and b accesses have a single ARRAY_REF component reference [0] >> + but have two subscripts. */ >> + if (DR_UNCONSTRAINED_BASE (a)) >> + num_dimensions_a -= 1; >> + if (DR_UNCONSTRAINED_BASE (b)) >> + num_dimensions_b -= 1; >> + >> + /* These structures describe sequences of component references in >> + DR_REF (A) and DR_REF (B). Each component reference is tied to a >> + specific access function. */ >> + struct { >> + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and >> + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher >> + indices. In C notation, these are the indices of the rightmost >> + component references; e.g. for a sequence .b.c.d, the start >> + index is for .d. */ >> + unsigned int start_a; >> + unsigned int start_b; >> + >> + /* The sequence contains LENGTH consecutive access functions from >> + each DR. */ >> + unsigned int length; >> + >> + /* The enclosing objects for the A and B sequences respectively, >> + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) >> + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ >> + tree object_a; >> + tree object_b; >> + } full_seq = {}, struct_seq = {}; >> + >> + /* Before each iteration of the loop: >> + >> + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and >> + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ >> + unsigned int index_a = 0; >> + unsigned int index_b = 0; >> + tree ref_a = DR_REF (a); >> + tree ref_b = DR_REF (b); >> + >> + /* Now walk the component references from the final DR_REFs back up to >> + the enclosing base objects. Each component reference corresponds >> + to one access function in the DR, with access function 0 being for >> + the final DR_REF and the highest-indexed access function being the >> + one that is applied to the base of the DR. >> + >> + Look for a sequence of component references whose access functions >> + are comparable (see access_fn_components_comparable_p). If more >> + than one such sequence exists, pick the one nearest the base >> + (which is the leftmost sequence in C notation). Store this sequence >> + in FULL_SEQ. >> + >> + For example, if we have: >> + >> + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; >> + >> + A: a[0][i].s.c.d >> + B: __real b[0][i].s.e[i].f >> + >> + (where d is the same type as the real component of f) then the access >> + functions would be: >> + >> + 0 1 2 3 >> + A: .d .c .s [i] >> + >> + 0 1 2 3 4 5 >> + B: __real .f [i] .e .s [i] >> + >> + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF >> + and [i] is an ARRAY_REF. However, the A1/B3 column contains two >> + COMPONENT_REF accesses for struct bar, so is comparable. Likewise >> + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, >> + so is comparable. The A3/B5 column contains two ARRAY_REFs that >> + index foo[10] arrays, so is again comparable. The sequence is >> + therefore: >> + >> + A: [1, 3] (i.e. [i].s.c) >> + B: [3, 5] (i.e. [i].s.e) >> + >> + Also look for sequences of component references whose access >> + functions are comparable and whose enclosing objects have the same >> + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above >> + example, STRUCT_SEQ would be: >> + >> + A: [1, 2] (i.e. s.c) >> + B: [3, 4] (i.e. s.e) */ >> + while (index_a < num_dimensions_a && index_b < num_dimensions_b) >> + { >> + /* REF_A and REF_B must be one of the component access types >> + allowed by dr_analyze_indices. */ >> + gcc_checking_assert (access_fn_component_p (ref_a)); >> + gcc_checking_assert (access_fn_component_p (ref_b)); >> + >> + /* Get the immediately-enclosing objects for REF_A and REF_B, >> + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) >> + and DR_ACCESS_FN (B, INDEX_B). */ >> + tree object_a = TREE_OPERAND (ref_a, 0); >> + tree object_b = TREE_OPERAND (ref_b, 0); >> + >> + tree type_a = TREE_TYPE (object_a); >> + tree type_b = TREE_TYPE (object_b); >> + if (access_fn_components_comparable_p (ref_a, ref_b)) >> + { >> + /* This pair of component accesses is comparable for dependence >> + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and >> + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ >> + if (full_seq.start_a + full_seq.length != index_a >> + || full_seq.start_b + full_seq.length != index_b) >> + { >> + /* The accesses don't extend the current sequence, >> + so start a new one here. */ >> + full_seq.start_a = index_a; >> + full_seq.start_b = index_b; >> + full_seq.length = 0; >> + } >> + >> + /* Add this pair of references to the sequence. */ >> + full_seq.length += 1; >> + full_seq.object_a = object_a; >> + full_seq.object_b = object_b; >> + >> + /* If the enclosing objects are structures (and thus have the >> + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ >> + if (TREE_CODE (type_a) == RECORD_TYPE) >> + struct_seq = full_seq; >> + >> + /* Move to the next containing reference for both A and B. */ >> + ref_a = object_a; >> + ref_b = object_b; >> + index_a += 1; >> + index_b += 1; >> + continue; >> + } >> + >> + /* Try to approach equal type sizes. */ >> + if (!COMPLETE_TYPE_P (type_a) >> + || !COMPLETE_TYPE_P (type_b) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >> + break; >> + >> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); >> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); >> + if (size_a <= size_b) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + index_a += 1; >> + ref_a = object_a; >> + } >> + if (size_b <= size_a) >> + { >> + index_b += 1; >> + ref_b = object_b; >> } >> - DDR_AFFINE_P (res) = true; >> - DDR_ARE_DEPENDENT (res) = NULL_TREE; >> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >> - DDR_LOOP_NEST (res) = loop_nest; >> - DDR_INNER_LOOP (res) = 0; >> - DDR_SELF_REFERENCE (res) = true; >> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >> - { >> - struct subscript *subscript; >> - >> - subscript = XNEW (struct subscript); >> - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >> - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >> - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >> - SUB_DISTANCE (subscript) = chrec_dont_know; >> - DDR_SUBSCRIPTS (res).safe_push (subscript); >> - } >> - return res; >> } >> >> - /* If the references do not access the same object, we do not know >> - whether they alias or not. We do not care about TBAA or alignment >> - info so we can use OEP_ADDRESS_OF to avoid false negatives. >> - But the accesses have to use compatible types as otherwise the >> - built indices would not match. */ >> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), > OEP_ADDRESS_OF) >> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), >> - TREE_TYPE (DR_BASE_OBJECT (b)))) >> + /* See whether FULL_SEQ ends at the base and whether the two bases >> + are equal. We do not care about TBAA or alignment info so we can >> + use OEP_ADDRESS_OF to avoid false negatives. */ >> + tree base_a = DR_BASE_OBJECT (a); >> + tree base_b = DR_BASE_OBJECT (b); >> + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a >> + && full_seq.start_b + full_seq.length == num_dimensions_b >> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) >> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >> + && types_compatible_p (TREE_TYPE (base_a), >> + TREE_TYPE (base_b)) >> + && (!loop_nest.exists () >> + || (object_address_invariant_in_loop_p >> + (loop_nest[0], base_a)))); >> + >> + /* If the bases are the same, we can include the base variation too. >> + E.g. the b accesses in: >> + >> + for (int i = 0; i < n; ++i) >> + b[i + 4][0] = b[i][0]; >> + >> + have a definite dependence distance of 4, while for: >> + >> + for (int i = 0; i < n; ++i) >> + a[i + 4][0] = b[i][0]; >> + >> + the dependence distance depends on the gap between a and b. >> + >> + If the bases are different then we can only rely on the sequence >> + rooted at a structure access, since arrays are allowed to overlap >> + arbitrarily and change shape arbitrarily. E.g. we treat this as >> + valid code: >> + >> + int a[256]; >> + ... >> + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; >> + >> + where two lvalues with the same int[4][3] type overlap, and where >> + both lvalues are distinct from the object's declared type. */ >> + if (same_base_p) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + if (DR_UNCONSTRAINED_BASE (a)) >> + full_seq.length += 1; >> } >> + else >> + full_seq = struct_seq; >> >> - /* If the base of the object is not invariant in the loop nest, we cannot >> - analyze it. TODO -- in fact, it would suffice to record that there may >> - be arbitrary dependences in the loops where the base object varies. */ >> - if ((loop_nest.exists () >> - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT > (a))) >> - || DR_NUM_DIMENSIONS (a) == 0) >> + /* Punt if we didn't find a suitable sequence. */ >> + if (full_seq.length == 0) >> { >> DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> return res; >> } >> >> - /* If the number of dimensions of the access to not agree we can have >> - a pointer access to a component of the array element type and an >> - array access while the base-objects are still the same. Punt. */ >> - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) >> + if (!same_base_p) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + /* Partial overlap is possible for different bases when strict aliasing >> + is not in effect. It's also possible if either base involves a union >> + access; e.g. for: >> + >> + struct s1 { int a[2]; }; >> + struct s2 { struct s1 b; int c; }; >> + struct s3 { int d; struct s1 e; }; >> + union u { struct s2 f; struct s3 g; } *p, *q; >> + >> + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at >> + "p->g.e" (base "p->g") and might partially overlap the s1 at >> + "q->g.e" (base "q->g"). */ >> + if (!flag_strict_aliasing >> + || ref_contains_union_access_p (full_seq.object_a) >> + || ref_contains_union_access_p (full_seq.object_b)) >> + { >> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> + return res; >> + } >> + >> + DDR_COULD_BE_INDEPENDENT_P (res) = true; >> } >> >> DDR_AFFINE_P (res) = true; >> DDR_ARE_DEPENDENT (res) = NULL_TREE; >> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >> + DDR_SUBSCRIPTS (res).create (full_seq.length); >> DDR_LOOP_NEST (res) = loop_nest; >> DDR_INNER_LOOP (res) = 0; >> DDR_SELF_REFERENCE (res) = false; >> >> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >> + for (i = 0; i < full_seq.length; ++i) >> { >> struct subscript *subscript; >> >> subscript = XNEW (struct subscript); >> + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); >> + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); >> SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >> SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >> SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >> @@ -3839,14 +4103,15 @@ add_outer_distances (struct data_depende >> } >> >> /* Return false when fail to represent the data dependence as a >> - distance vector. INIT_B is set to true when a component has been >> + distance vector. A_INDEX is the index of the first reference >> + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the >> + second reference. INIT_B is set to true when a component has been >> added to the distance vector DIST_V. INDEX_CARRY is then set to >> the index in DIST_V that carries the dependence. */ >> >> static bool >> build_classic_dist_vector_1 (struct data_dependence_relation *ddr, >> - struct data_reference *ddr_a, >> - struct data_reference *ddr_b, >> + unsigned int a_index, unsigned int b_index, >> lambda_vector dist_v, bool *init_b, >> int *index_carry) >> { >> @@ -3864,8 +4129,8 @@ build_classic_dist_vector_1 (struct data >> return false; >> } >> >> - access_fn_a = DR_ACCESS_FN (ddr_a, i); >> - access_fn_b = DR_ACCESS_FN (ddr_b, i); >> + access_fn_a = SUB_ACCESS_FN (subscript, a_index); >> + access_fn_b = SUB_ACCESS_FN (subscript, b_index); >> >> if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC >> && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) >> @@ -3925,10 +4190,11 @@ build_classic_dist_vector_1 (struct data >> constant_access_functions (const struct data_dependence_relation *ddr) >> { >> unsigned i; >> + subscript *sub; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) >> - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) >> + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) >> return false; >> >> return true; >> @@ -3991,10 +4257,11 @@ add_other_self_distances (struct data_de >> lambda_vector dist_v; >> unsigned i; >> int index_carry = DDR_NB_LOOPS (ddr); >> + subscript *sub; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> { >> - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); >> + tree access_fun = SUB_ACCESS_FN (sub, 0); >> >> if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) >> { >> @@ -4006,7 +4273,7 @@ add_other_self_distances (struct data_de >> return; >> } >> >> - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); >> + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); >> >> if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) >> add_multivariate_self_dist (ddr, access_fun); >> @@ -4077,6 +4344,23 @@ add_distance_for_zero_overlaps (struct d >> } >> } >> >> +/* Return true when the DDR contains two data references that have the >> + same access functions. */ >> + >> +static inline bool >> +same_access_functions (const struct data_dependence_relation *ddr) >> +{ >> + unsigned i; >> + subscript *sub; >> + >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), >> + SUB_ACCESS_FN (sub, 1))) >> + return false; >> + >> + return true; >> +} >> + >> /* Compute the classic per loop distance vector. DDR is the data >> dependence relation to build a vector from. Return false when fail >> to represent the data dependence as a distance vector. */ >> @@ -4108,8 +4392,7 @@ build_classic_dist_vector (struct data_d >> } >> >> dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), >> - dist_v, &init_b, &index_carry)) >> + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, > &index_carry)) >> return false; >> >> /* Save the distance vector if we initialized one. */ >> @@ -4142,12 +4425,11 @@ build_classic_dist_vector (struct data_d >> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) >> { >> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - loop_nest)) >> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >> return false; >> compute_subscript_distance (ddr); >> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - save_v, &init_b, &index_carry)) >> + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, >> + &index_carry)) >> return false; >> save_dist_v (ddr, save_v); >> DDR_REVERSED_P (ddr) = true; >> @@ -4183,12 +4465,10 @@ build_classic_dist_vector (struct data_d >> { >> lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> >> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), >> - DDR_A (ddr), loop_nest)) >> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >> return false; >> compute_subscript_distance (ddr); >> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - opposite_v, &init_b, >> + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, >> &index_carry)) >> return false; >> >> @@ -4267,13 +4547,13 @@ build_classic_dir_vector (struct data_de >> } >> } >> >> -/* Helper function. Returns true when there is a dependence between >> - data references DRA and DRB. */ >> +/* Helper function. Returns true when there is a dependence between the >> + data references. A_INDEX is the index of the first reference (0 for >> + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ >> >> static bool >> subscript_dependence_tester_1 (struct data_dependence_relation *ddr, >> - struct data_reference *dra, >> - struct data_reference *drb, >> + unsigned int a_index, unsigned int b_index, >> struct loop *loop_nest) >> { >> unsigned int i; >> @@ -4285,8 +4565,8 @@ subscript_dependence_tester_1 (struct da >> { >> conflict_function *overlaps_a, *overlaps_b; >> >> - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), >> - DR_ACCESS_FN (drb, i), >> + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), >> + SUB_ACCESS_FN (subscript, b_index), >> &overlaps_a, &overlaps_b, >> &last_conflicts, loop_nest); >> >> @@ -4335,7 +4615,7 @@ subscript_dependence_tester_1 (struct da >> subscript_dependence_tester (struct data_dependence_relation *ddr, >> struct loop *loop_nest) >> { >> - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), > loop_nest)) >> + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) >> dependence_stats.num_dependence_dependent++; >> >> compute_subscript_distance (ddr); >> Index: gcc/tree-ssa-loop-prefetch.c >> =================================================================== >> --- gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:29.620045506 +0100 >> +++ gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:33.023912613 +0100 >> @@ -1668,6 +1668,7 @@ determine_loop_nest_reuse (struct loop * >> refb = (struct mem_ref *) DDR_B (dep)->aux; >> >> if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know >> + || DDR_COULD_BE_INDEPENDENT_P (dep) >> || DDR_NUM_DIST_VECTS (dep) == 0) >> { >> /* If the dependence cannot be analyzed, assume that there might be >> Index: gcc/tree-vectorizer.h >> =================================================================== >> --- gcc/tree-vectorizer.h 2017-07-27 13:10:29.620045506 +0100 >> +++ gcc/tree-vectorizer.h 2017-07-27 13:10:33.024912868 +0100 >> @@ -358,7 +358,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) >> #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ >> ((L)->may_misalign_stmts.length () > 0) >> #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ >> - ((L)->may_alias_ddrs.length () > 0) >> + ((L)->comp_alias_ddrs.length () > 0) >> #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ >> (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) >> #define LOOP_REQUIRES_VERSIONING(L) \ >> Index: gcc/tree-vect-data-refs.c >> =================================================================== >> --- gcc/tree-vect-data-refs.c 2017-07-27 13:10:29.620045506 +0100 >> +++ gcc/tree-vect-data-refs.c 2017-07-27 13:10:33.024912868 +0100 >> @@ -160,6 +160,60 @@ vect_mark_for_runtime_alias_test (ddr_p >> } >> >> >> +/* A subroutine of vect_analyze_data_ref_dependence. Handle >> + DDR_COULD_BE_INDEPENDENT_P ddr DDR that has a known set of dependence >> + distances. These distances are conservatively correct but they don't >> + reflect a guaranteed dependence. >> + >> + Return true if this function does all the work necessary to avoid >> + an alias or false if the caller should use the dependence distances >> + to limit the vectorization factor in the usual way. LOOP_DEPTH is >> + the depth of the loop described by LOOP_VINFO and the other arguments >> + are as for vect_analyze_data_ref_dependence. */ >> + >> +static bool >> +vect_analyze_possibly_independent_ddr (data_dependence_relation *ddr, >> + loop_vec_info loop_vinfo, >> + int loop_depth, int *max_vf) >> +{ >> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); >> + lambda_vector dist_v; >> + unsigned int i; >> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> + { >> + int dist = dist_v[loop_depth]; >> + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) >> + { >> + /* If the user asserted safelen >= DIST consecutive iterations >> + can be executed concurrently, assume independence. >> + >> + ??? An alternative would be to add the alias check even >> + in this case, and vectorize the fallback loop with the >> + maximum VF set to safelen. However, if the user has >> + explicitly given a length, it's less likely that that >> + would be a win. */ >> + if (loop->safelen >= 2 && abs_hwi (dist) <= loop->safelen) >> + { >> + if (loop->safelen < *max_vf) >> + *max_vf = loop->safelen; >> + LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = false; >> + continue; >> + } >> + >> + /* For dependence distances of 2 or more, we have the option >> + of limiting VF or checking for an alias at runtime. >> + Prefer to check at runtime if we can, to avoid limiting >> + the VF unnecessarily when the bases are in fact independent. >> + >> + Note that the alias checks will be removed if the VF ends up >> + being small enough. */ >> + return vect_mark_for_runtime_alias_test (ddr, loop_vinfo); >> + } >> + } >> + return true; >> +} >> + >> + >> /* Function vect_analyze_data_ref_dependence. >> >> Return TRUE if there (might) exist a dependence between a memory-reference >> @@ -305,6 +359,12 @@ vect_analyze_data_ref_dependence (struct >> } >> >> loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); >> + >> + if (DDR_COULD_BE_INDEPENDENT_P (ddr) >> + && vect_analyze_possibly_independent_ddr (ddr, loop_vinfo, >> + loop_depth, max_vf)) >> + return false; >> + >> FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> { >> int dist = dist_v[loop_depth]; >> @@ -2878,6 +2938,44 @@ vect_no_alias_p (struct data_reference * >> return false; >> } >> >> +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH >> + in DDR is >= VF. */ >> + >> +static bool >> +dependence_distance_ge_vf (data_dependence_relation *ddr, >> + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) >> +{ >> + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE >> + || DDR_NUM_DIST_VECTS (ddr) == 0) >> + return false; >> + >> + /* If the dependence is exact, we should have limited the VF instead. */ >> + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); >> + >> + unsigned int i; >> + lambda_vector dist_v; >> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> + { >> + HOST_WIDE_INT dist = dist_v[loop_depth]; >> + if (dist != 0 >> + && !(dist > 0 && DDR_REVERSED_P (ddr)) >> + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) >> + return false; >> + } >> + >> + if (dump_enabled_p ()) >> + { >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "dependence distance between "); >> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); >> + dump_printf (MSG_NOTE, " and "); >> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); >> + dump_printf (MSG_NOTE, " is >= VF\n"); >> + } >> + >> + return true; >> +} >> + >> /* Function vect_prune_runtime_alias_test_list. >> >> Prune a list of ddrs to be tested at run-time by versioning for alias. >> @@ -2908,6 +3006,10 @@ vect_prune_runtime_alias_test_list (loop >> >> comp_alias_ddrs.create (may_alias_ddrs.length ()); >> >> + unsigned int loop_depth >> + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, >> + LOOP_VINFO_LOOP_NEST (loop_vinfo)); >> + >> /* First, we collect all data ref pairs for aliasing checks. */ >> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) >> { >> @@ -2917,6 +3019,11 @@ vect_prune_runtime_alias_test_list (loop >> tree segment_length_a, segment_length_b; >> gimple *stmt_a, *stmt_b; >> >> + /* Ignore the alias if the VF we chose ended up being no greater >> + than the dependence distance. */ >> + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) >> + continue; >> + >> dr_a = DDR_A (ddr); >> stmt_a = DR_STMT (DDR_A (ddr)); >> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); >> @@ -2993,10 +3100,6 @@ vect_prune_runtime_alias_test_list (loop >> return false; >> } >> >> - /* All alias checks have been resolved at compilation time. */ >> - if (!comp_alias_ddrs.length ()) >> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); >> - >> return true; >> } >> >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c >> =================================================================== >> --- /dev/null 2017-07-27 10:25:31.671280760 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-07-27 > 13:10:33.022912357 +0100 >> @@ -0,0 +1,120 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-additional-options "--param > vect-max-version-for-alias-checks=0 -fopenmp-simd" } */ >> + >> +/* Intended to be larger than any VF. */ >> +#define GAP 128 >> +#define N (GAP * 3) >> + >> +struct s { int x[N + 1]; }; >> +struct t { struct s x[N + 1]; }; >> +struct u { int x[N + 1]; int y; }; >> +struct v { struct s s; }; >> + >> +void >> +f1 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i]; >> +} >> + >> +void >> +f2 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[2].x[i]; >> +} >> + >> +void >> +f3 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[i].x[i]; >> +} >> + >> +void >> +f4 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[i].x[i] += b[i].x[i]; >> +} >> + >> +void >> +f5 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i + 1]; >> +} >> + >> +void >> +f6 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[2].x[i + 1]; >> +} >> + >> +void >> +f7 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[i].x[i + 1]; >> +} >> + >> +void >> +f8 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[i].x[i] += b[i].x[i + 1]; >> +} >> + >> +void >> +f9 (struct s *a, struct t *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[1].x[i]; >> +} >> + >> +void >> +f10 (struct s *a, struct t *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i].x[i]; >> +} >> + >> +void >> +f11 (struct u *a, struct u *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i] + b[i].y; >> +} >> + >> +void >> +f12 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +void >> +f13 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP * 2; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +void >> +f14 (struct v *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->s.x[i] = b->x[i]; >> +} >> + >> +void >> +f15 (struct s *a, struct s *b) >> +{ >> + #pragma omp simd safelen(N) >> + for (int i = 0; i < N; ++i) >> + a->x[i + 1] += b->x[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 15 "vect" } } */ >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c >> =================================================================== >> --- /dev/null 2017-07-27 10:25:31.671280760 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-07-27 > 13:10:33.022912357 +0100 >> @@ -0,0 +1,35 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ >> + >> +#define N 16 >> + >> +struct s1 { int a[N]; }; >> +struct s2 { struct s1 b; int c; }; >> +struct s3 { int d; struct s1 e; }; >> +union u { struct s2 f; struct s3 g; }; >> + >> +/* We allow a and b to overlap arbitrarily. */ >> + >> +void >> +f1 (int a[][N], int b[][N]) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[0][i] += b[0][i]; >> +} >> + >> +void >> +f2 (union u *a, union u *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->f.b.a[i] += b->g.e.a[i]; >> +} >> + >> +void >> +f3 (struct s1 *a, struct s1 *b) >> +{ >> + for (int i = 0; i < N - 1; ++i) >> + a->a[i + 1] += b->a[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c >> =================================================================== >> --- /dev/null 2017-07-27 10:25:31.671280760 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-07-27 > 13:10:33.022912357 +0100 >> @@ -0,0 +1,19 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> + >> +/* Intended to be larger than any VF. */ >> +#define GAP 128 >> +#define N (GAP * 3) >> + >> +struct s { int x[N]; }; >> + >> +void >> +f1 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP * 2; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-times "consider run-time aliasing" 1 > "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "improved number of alias checks > from 1 to 0" 1 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
On Fri, Aug 4, 2017 at 11:28 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Thu, Jul 27, 2017 at 2:19 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Richard Sandiford <richard.sandiford@linaro.org> writes: >>>> Eric Botcazou <ebotcazou@adacore.com> writes: >>>>> [Sorry for missing the previous messages] >>>>> >>>>>> Thanks. Just been retesting, and I think I must have forgotten >>>>>> to include Ada last time. It turns out that the patch causes a dg-scan >>>>>> regression in gnat.dg/vect17.adb, because we now think that if the >>>>>> array RECORD_TYPEs *do* alias in: >>>>>> >>>>>> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is >>>>>> begin >>>>>> for I in Sarray'Range loop >>>>>> R(I) := X(I) + Y(I); >>>>>> end loop; >>>>>> end; >>>>>> >>>>>> then the dependence distance must be zero. Eric, does that hold true >>>>>> for Ada? I.e. if X and R (or Y and R) alias, must it be the case that >>>>>> X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? >>>>> >>>>> Yes, I'd think so (even without the artificial RECORD_TYPE around >> the arrays). >>>> >>>> Good! >>>> >>>>>> 2017-06-07 Richard Sandiford <richard.sandiford@linaro.org> >>>>>> >>>>>> gcc/testsuite/ >>>>>> * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. >>>>>> * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 >>>>>> when X = R or Y = R. >>>>> >>>>> I think that you need to modify vect15 and vect16 the same way. >>>> >>>> Ah, yeah. And doing that shows that I'd not handled safelen for >>>> DDR_COULD_BE_INDEPENDENT_P. I've fixed that locally. >>>> >>>> How does this look? Tested on x86_64-linux-gnu both without the >>>> vectoriser changes and with the fixed vectoriser patch. >>> >>> Here's a version of the patch that handles safelen. I split the >>> handling out into a new function (vect_analyze_possibly_independent_ddr) >>> since it was getting too big to do inline. >>> >>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Ok. > > Thanks! > >> Did you check whether BB vectorization is affected? See >> vect_slp_analyze_instance_dependence >> and friends. It's quite conservative but given the prefetching change >> I wonder if we need >> to rule out DDR_COULD_BE_INDEPENDENT_P? > > I think it should be OK. When DDR_COULD_BE_INDEPENDENT_P is set, > we've effectively changed from DDR_ARE_DEPENDENT == chrec_dont_know > to a conservatively-correct distance vector. It looks like > vect_slp_analyze_data_ref_dependence handles both cases in the > same way (by returning true). Yes. Could be improved of course. Thanks for double-checking. Richard. > Thanks, > Richard > >> >> Thanks, >> Richard. >> >>> Thanks, >>> Richard >>> >>> >>> 2017-07-27 Richard Sandiford <richard.sandiford@linaro.org> >>> >>> gcc/ >>> * tree-data-ref.h (subscript): Add access_fn field. >>> (data_dependence_relation): Add could_be_independent_p. >>> (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. >>> (same_access_functions): Move to tree-data-ref.c. >>> * tree-data-ref.c (ref_contains_union_access_p): New function. >>> (access_fn_component_p): Likewise. >>> (access_fn_components_comparable_p): Likewise. >>> (dr_analyze_indices): Add a reference to access_fn_component_p. >>> (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of >>> DR_ACCESS_FN. >>> (constant_access_functions): Likewise. >>> (add_other_self_distances): Likewise. >>> (same_access_functions): Likewise. (Moved from tree-data-ref.h.) >>> (initialize_data_dependence_relation): Use XCNEW and remove >>> explicit zeroing of DDR_REVERSED_P. Look for a subsequence >>> of access functions that have the same type. Allow the >>> subsequence to end with different bases in some circumstances. >>> Record the chosen access functions in SUB_ACCESS_FN. >>> (build_classic_dist_vector_1): Replace ddr_a and ddr_b with >>> a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. >>> (subscript_dependence_tester_1): Likewise dra and drb. >>> (build_classic_dist_vector): Update calls accordingly. >>> (subscript_dependence_tester): Likewise. >>> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check >>> DDR_COULD_BE_INDEPENDENT_P. >>> * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test >>> comp_alias_ddrs instead of may_alias_ddrs. >>> * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr): >>> New function. >>> (vect_analyze_data_ref_dependence): Use it if >>> DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded >>> distance vectors if that fails. >>> (dependence_distance_ge_vf): New function. >>> (vect_prune_runtime_alias_test_list): Use it. Don't clear >>> LOOP_VINFO_MAY_ALIAS_DDRS. >>> >>> gcc/testsuite/ >>> * gcc.dg/vect/vect-alias-check-3.c: New test. >>> * gcc.dg/vect/vect-alias-check-4.c: Likewise. >>> * gcc.dg/vect/vect-alias-check-5.c: Likewise. >>> >>> Index: gcc/tree-data-ref.h >>> =================================================================== >>> --- gcc/tree-data-ref.h 2017-07-27 13:10:29.620045506 +0100 >>> +++ gcc/tree-data-ref.h 2017-07-27 13:10:33.023912613 +0100 >>> @@ -260,6 +260,9 @@ struct conflict_function >>> >>> struct subscript >>> { >>> + /* The access functions of the two references. */ >>> + tree access_fn[2]; >>> + >>> /* A description of the iterations for which the elements are >>> accessed twice. */ >>> conflict_function *conflicting_iterations_in_a; >>> @@ -278,6 +281,7 @@ struct subscript >>> >>> typedef struct subscript *subscript_p; >>> >>> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] >>> #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a >>> #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b >>> #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict >>> @@ -333,6 +337,33 @@ struct data_dependence_relation >>> /* Set to true when the dependence relation is on the same data >>> access. */ >>> bool self_reference_p; >>> + >>> + /* True if the dependence described is conservatively correct rather >>> + than exact, and if it is still possible for the accesses to be >>> + conditionally independent. For example, the a and b references in: >>> + >>> + struct s *a, *b; >>> + for (int i = 0; i < n; ++i) >>> + a->f[i] += b->f[i]; >>> + >>> + conservatively have a distance vector of (0), for the case in which >>> + a == b, but the accesses are independent if a != b. Similarly, >>> + the a and b references in: >>> + >>> + struct s *a, *b; >>> + for (int i = 0; i < n; ++i) >>> + a[0].f[i] += b[i].f[i]; >>> + >>> + conservatively have a distance vector of (0), but they are indepenent >>> + when a != b + i. In contrast, the references in: >>> + >>> + struct s *a; >>> + for (int i = 0; i < n; ++i) >>> + a->f[i] += a->f[i]; >>> + >>> + have the same distance vector of (0), but the accesses can never be >>> + independent. */ >>> + bool could_be_independent_p; >>> }; >>> >>> typedef struct data_dependence_relation *ddr_p; >>> @@ -363,6 +394,7 @@ #define DDR_DIR_VECT(DDR, I) \ >>> #define DDR_DIST_VECT(DDR, I) \ >>> DDR_DIST_VECTS (DDR)[I] >>> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p >>> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p >>> >>> >>> bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); >>> @@ -457,22 +489,6 @@ same_data_refs (data_reference_p a, data >>> return false; >>> >>> return true; >>> -} >>> - >>> -/* Return true when the DDR contains two data references that have the >>> - same access functions. */ >>> - >>> -static inline bool >>> -same_access_functions (const struct data_dependence_relation *ddr) >>> -{ >>> - unsigned i; >>> - >>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >>> - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), >>> - DR_ACCESS_FN (DDR_B (ddr), i))) >>> - return false; >>> - >>> - return true; >>> } >>> >>> /* Returns true when all the dependences are computable. */ >>> Index: gcc/tree-data-ref.c >>> =================================================================== >>> --- gcc/tree-data-ref.c 2017-07-27 13:10:29.620045506 +0100 >>> +++ gcc/tree-data-ref.c 2017-07-27 13:10:33.023912613 +0100 >>> @@ -124,8 +124,7 @@ Software Foundation; either version 3, o >>> } dependence_stats; >>> >>> static bool subscript_dependence_tester_1 (struct data_dependence_relation *, >>> - struct data_reference *, >>> - struct data_reference *, >>> + unsigned int, unsigned int, >>> struct loop *); >>> /* Returns true iff A divides B. */ >>> >>> @@ -145,6 +144,21 @@ int_divides_p (int a, int b) >>> return ((b % a) == 0); >>> } >>> >>> +/* Return true if reference REF contains a union access. */ >>> + >>> +static bool >>> +ref_contains_union_access_p (tree ref) >>> +{ >>> + while (handled_component_p (ref)) >>> + { >>> + ref = TREE_OPERAND (ref, 0); >>> + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE >>> + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) >>> + return true; >>> + } >>> + return false; >>> +} >>> + >>> >>> >>> /* Dump into FILE all the data references from DATAREFS. */ >>> @@ -434,13 +448,14 @@ dump_data_dependence_relation (FILE *out >>> unsigned int i; >>> struct loop *loopi; >>> >>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >>> + subscript *sub; >>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >>> { >>> fprintf (outf, " access_fn_A: "); >>> - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); >>> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); >>> fprintf (outf, " access_fn_B: "); >>> - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); >>> - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); >>> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); >>> + dump_subscript (outf, sub); >>> } >>> >>> fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); >>> @@ -920,6 +935,27 @@ dr_analyze_innermost (innermost_loop_beh >>> return true; >>> } >>> >>> +/* Return true if OP is a valid component reference for a DR access >>> + function. This accepts a subset of what handled_component_p accepts. */ >>> + >>> +static bool >>> +access_fn_component_p (tree op) >>> +{ >>> + switch (TREE_CODE (op)) >>> + { >>> + case REALPART_EXPR: >>> + case IMAGPART_EXPR: >>> + case ARRAY_REF: >>> + return true; >>> + >>> + case COMPONENT_REF: >>> + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; >>> + >>> + default: >>> + return false; >>> + } >>> +} >>> + >>> /* Determines the base object and the list of indices of memory reference >>> DR, analyzed in LOOP and instantiated in loop nest NEST. */ >>> >>> @@ -957,7 +993,9 @@ dr_analyze_indices (struct data_referenc >>> access_fns.safe_push (integer_one_node); >>> } >>> >>> - /* Analyze access functions of dimensions we know to be independent. */ >>> + /* Analyze access functions of dimensions we know to be independent. >>> + The list of component references handled here should be kept in >>> + sync with access_fn_component_p. */ >>> while (handled_component_p (ref)) >>> { >>> if (TREE_CODE (ref) == ARRAY_REF) >>> @@ -2148,6 +2186,38 @@ dr_may_alias_p (const struct data_refere >>> return refs_may_alias_p (addr_a, addr_b); >>> } >>> >>> +/* REF_A and REF_B both satisfy access_fn_component_p. Return true >>> + if it is meaningful to compare their associated access functions >>> + when checking for dependencies. */ >>> + >>> +static bool >>> +access_fn_components_comparable_p (tree ref_a, tree ref_b) >>> +{ >>> + /* Allow pairs of component refs from the following sets: >>> + >>> + { REALPART_EXPR, IMAGPART_EXPR } >>> + { COMPONENT_REF } >>> + { ARRAY_REF }. */ >>> + tree_code code_a = TREE_CODE (ref_a); >>> + tree_code code_b = TREE_CODE (ref_b); >>> + if (code_a == IMAGPART_EXPR) >>> + code_a = REALPART_EXPR; >>> + if (code_b == IMAGPART_EXPR) >>> + code_b = REALPART_EXPR; >>> + if (code_a != code_b) >>> + return false; >>> + >>> + if (TREE_CODE (ref_a) == COMPONENT_REF) >>> + /* ??? We cannot simply use the type of operand #0 of the refs here as >>> + the Fortran compiler smuggles type punning into COMPONENT_REFs. >>> + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ >>> + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) >>> + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); >>> + >>> + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), >>> + TREE_TYPE (TREE_OPERAND (ref_b, 0))); >>> +} >>> + >>> /* Initialize a data dependence relation between data accesses A and >>> B. NB_LOOPS is the number of loops surrounding the references: the >>> size of the classic distance/direction vectors. */ >>> @@ -2160,11 +2230,10 @@ initialize_data_dependence_relation (str >>> struct data_dependence_relation *res; >>> unsigned int i; >>> >>> - res = XNEW (struct data_dependence_relation); >>> + res = XCNEW (struct data_dependence_relation); >>> DDR_A (res) = a; >>> DDR_B (res) = b; >>> DDR_LOOP_NEST (res).create (0); >>> - DDR_REVERSED_P (res) = false; >>> DDR_SUBSCRIPTS (res).create (0); >>> DDR_DIR_VECTS (res).create (0); >>> DDR_DIST_VECTS (res).create (0); >>> @@ -2182,82 +2251,277 @@ initialize_data_dependence_relation (str >>> return res; >>> } >>> >>> - /* The case where the references are exactly the same. */ >>> - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) >>> + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); >>> + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); >>> + if (num_dimensions_a == 0 || num_dimensions_b == 0) >>> { >>> - if ((loop_nest.exists () >>> - && !object_address_invariant_in_loop_p (loop_nest[0], >>> - DR_BASE_OBJECT (a))) >>> - || DR_NUM_DIMENSIONS (a) == 0) >>> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> + return res; >>> + } >>> + >>> + /* For unconstrained bases, the root (highest-indexed) subscript >>> + describes a variation in the base of the original DR_REF rather >>> + than a component access. We have no type that accurately describes >>> + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* >>> + applying this subscript) so limit the search to the last real >>> + component access. >>> + >>> + E.g. for: >>> + >>> + void >>> + f (int a[][8], int b[][8]) >>> + { >>> + for (int i = 0; i < 8; ++i) >>> + a[i * 2][0] = b[i][0]; >>> + } >>> + >>> + the a and b accesses have a single ARRAY_REF component reference [0] >>> + but have two subscripts. */ >>> + if (DR_UNCONSTRAINED_BASE (a)) >>> + num_dimensions_a -= 1; >>> + if (DR_UNCONSTRAINED_BASE (b)) >>> + num_dimensions_b -= 1; >>> + >>> + /* These structures describe sequences of component references in >>> + DR_REF (A) and DR_REF (B). Each component reference is tied to a >>> + specific access function. */ >>> + struct { >>> + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and >>> + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher >>> + indices. In C notation, these are the indices of the rightmost >>> + component references; e.g. for a sequence .b.c.d, the start >>> + index is for .d. */ >>> + unsigned int start_a; >>> + unsigned int start_b; >>> + >>> + /* The sequence contains LENGTH consecutive access functions from >>> + each DR. */ >>> + unsigned int length; >>> + >>> + /* The enclosing objects for the A and B sequences respectively, >>> + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) >>> + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ >>> + tree object_a; >>> + tree object_b; >>> + } full_seq = {}, struct_seq = {}; >>> + >>> + /* Before each iteration of the loop: >>> + >>> + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and >>> + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ >>> + unsigned int index_a = 0; >>> + unsigned int index_b = 0; >>> + tree ref_a = DR_REF (a); >>> + tree ref_b = DR_REF (b); >>> + >>> + /* Now walk the component references from the final DR_REFs back up to >>> + the enclosing base objects. Each component reference corresponds >>> + to one access function in the DR, with access function 0 being for >>> + the final DR_REF and the highest-indexed access function being the >>> + one that is applied to the base of the DR. >>> + >>> + Look for a sequence of component references whose access functions >>> + are comparable (see access_fn_components_comparable_p). If more >>> + than one such sequence exists, pick the one nearest the base >>> + (which is the leftmost sequence in C notation). Store this sequence >>> + in FULL_SEQ. >>> + >>> + For example, if we have: >>> + >>> + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; >>> + >>> + A: a[0][i].s.c.d >>> + B: __real b[0][i].s.e[i].f >>> + >>> + (where d is the same type as the real component of f) then the access >>> + functions would be: >>> + >>> + 0 1 2 3 >>> + A: .d .c .s [i] >>> + >>> + 0 1 2 3 4 5 >>> + B: __real .f [i] .e .s [i] >>> + >>> + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF >>> + and [i] is an ARRAY_REF. However, the A1/B3 column contains two >>> + COMPONENT_REF accesses for struct bar, so is comparable. Likewise >>> + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, >>> + so is comparable. The A3/B5 column contains two ARRAY_REFs that >>> + index foo[10] arrays, so is again comparable. The sequence is >>> + therefore: >>> + >>> + A: [1, 3] (i.e. [i].s.c) >>> + B: [3, 5] (i.e. [i].s.e) >>> + >>> + Also look for sequences of component references whose access >>> + functions are comparable and whose enclosing objects have the same >>> + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above >>> + example, STRUCT_SEQ would be: >>> + >>> + A: [1, 2] (i.e. s.c) >>> + B: [3, 4] (i.e. s.e) */ >>> + while (index_a < num_dimensions_a && index_b < num_dimensions_b) >>> + { >>> + /* REF_A and REF_B must be one of the component access types >>> + allowed by dr_analyze_indices. */ >>> + gcc_checking_assert (access_fn_component_p (ref_a)); >>> + gcc_checking_assert (access_fn_component_p (ref_b)); >>> + >>> + /* Get the immediately-enclosing objects for REF_A and REF_B, >>> + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) >>> + and DR_ACCESS_FN (B, INDEX_B). */ >>> + tree object_a = TREE_OPERAND (ref_a, 0); >>> + tree object_b = TREE_OPERAND (ref_b, 0); >>> + >>> + tree type_a = TREE_TYPE (object_a); >>> + tree type_b = TREE_TYPE (object_b); >>> + if (access_fn_components_comparable_p (ref_a, ref_b)) >>> + { >>> + /* This pair of component accesses is comparable for dependence >>> + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and >>> + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ >>> + if (full_seq.start_a + full_seq.length != index_a >>> + || full_seq.start_b + full_seq.length != index_b) >>> + { >>> + /* The accesses don't extend the current sequence, >>> + so start a new one here. */ >>> + full_seq.start_a = index_a; >>> + full_seq.start_b = index_b; >>> + full_seq.length = 0; >>> + } >>> + >>> + /* Add this pair of references to the sequence. */ >>> + full_seq.length += 1; >>> + full_seq.object_a = object_a; >>> + full_seq.object_b = object_b; >>> + >>> + /* If the enclosing objects are structures (and thus have the >>> + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ >>> + if (TREE_CODE (type_a) == RECORD_TYPE) >>> + struct_seq = full_seq; >>> + >>> + /* Move to the next containing reference for both A and B. */ >>> + ref_a = object_a; >>> + ref_b = object_b; >>> + index_a += 1; >>> + index_b += 1; >>> + continue; >>> + } >>> + >>> + /* Try to approach equal type sizes. */ >>> + if (!COMPLETE_TYPE_P (type_a) >>> + || !COMPLETE_TYPE_P (type_b) >>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>> + break; >>> + >>> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); >>> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); >>> + if (size_a <= size_b) >>> { >>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> - return res; >>> + index_a += 1; >>> + ref_a = object_a; >>> + } >>> + if (size_b <= size_a) >>> + { >>> + index_b += 1; >>> + ref_b = object_b; >>> } >>> - DDR_AFFINE_P (res) = true; >>> - DDR_ARE_DEPENDENT (res) = NULL_TREE; >>> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >>> - DDR_LOOP_NEST (res) = loop_nest; >>> - DDR_INNER_LOOP (res) = 0; >>> - DDR_SELF_REFERENCE (res) = true; >>> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >>> - { >>> - struct subscript *subscript; >>> - >>> - subscript = XNEW (struct subscript); >>> - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >>> - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >>> - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >>> - SUB_DISTANCE (subscript) = chrec_dont_know; >>> - DDR_SUBSCRIPTS (res).safe_push (subscript); >>> - } >>> - return res; >>> } >>> >>> - /* If the references do not access the same object, we do not know >>> - whether they alias or not. We do not care about TBAA or alignment >>> - info so we can use OEP_ADDRESS_OF to avoid false negatives. >>> - But the accesses have to use compatible types as otherwise the >>> - built indices would not match. */ >>> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), >> OEP_ADDRESS_OF) >>> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), >>> - TREE_TYPE (DR_BASE_OBJECT (b)))) >>> + /* See whether FULL_SEQ ends at the base and whether the two bases >>> + are equal. We do not care about TBAA or alignment info so we can >>> + use OEP_ADDRESS_OF to avoid false negatives. */ >>> + tree base_a = DR_BASE_OBJECT (a); >>> + tree base_b = DR_BASE_OBJECT (b); >>> + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a >>> + && full_seq.start_b + full_seq.length == num_dimensions_b >>> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) >>> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >>> + && types_compatible_p (TREE_TYPE (base_a), >>> + TREE_TYPE (base_b)) >>> + && (!loop_nest.exists () >>> + || (object_address_invariant_in_loop_p >>> + (loop_nest[0], base_a)))); >>> + >>> + /* If the bases are the same, we can include the base variation too. >>> + E.g. the b accesses in: >>> + >>> + for (int i = 0; i < n; ++i) >>> + b[i + 4][0] = b[i][0]; >>> + >>> + have a definite dependence distance of 4, while for: >>> + >>> + for (int i = 0; i < n; ++i) >>> + a[i + 4][0] = b[i][0]; >>> + >>> + the dependence distance depends on the gap between a and b. >>> + >>> + If the bases are different then we can only rely on the sequence >>> + rooted at a structure access, since arrays are allowed to overlap >>> + arbitrarily and change shape arbitrarily. E.g. we treat this as >>> + valid code: >>> + >>> + int a[256]; >>> + ... >>> + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; >>> + >>> + where two lvalues with the same int[4][3] type overlap, and where >>> + both lvalues are distinct from the object's declared type. */ >>> + if (same_base_p) >>> { >>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> - return res; >>> + if (DR_UNCONSTRAINED_BASE (a)) >>> + full_seq.length += 1; >>> } >>> + else >>> + full_seq = struct_seq; >>> >>> - /* If the base of the object is not invariant in the loop nest, we cannot >>> - analyze it. TODO -- in fact, it would suffice to record that there may >>> - be arbitrary dependences in the loops where the base object varies. */ >>> - if ((loop_nest.exists () >>> - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT >> (a))) >>> - || DR_NUM_DIMENSIONS (a) == 0) >>> + /* Punt if we didn't find a suitable sequence. */ >>> + if (full_seq.length == 0) >>> { >>> DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> return res; >>> } >>> >>> - /* If the number of dimensions of the access to not agree we can have >>> - a pointer access to a component of the array element type and an >>> - array access while the base-objects are still the same. Punt. */ >>> - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) >>> + if (!same_base_p) >>> { >>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> - return res; >>> + /* Partial overlap is possible for different bases when strict aliasing >>> + is not in effect. It's also possible if either base involves a union >>> + access; e.g. for: >>> + >>> + struct s1 { int a[2]; }; >>> + struct s2 { struct s1 b; int c; }; >>> + struct s3 { int d; struct s1 e; }; >>> + union u { struct s2 f; struct s3 g; } *p, *q; >>> + >>> + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at >>> + "p->g.e" (base "p->g") and might partially overlap the s1 at >>> + "q->g.e" (base "q->g"). */ >>> + if (!flag_strict_aliasing >>> + || ref_contains_union_access_p (full_seq.object_a) >>> + || ref_contains_union_access_p (full_seq.object_b)) >>> + { >>> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> + return res; >>> + } >>> + >>> + DDR_COULD_BE_INDEPENDENT_P (res) = true; >>> } >>> >>> DDR_AFFINE_P (res) = true; >>> DDR_ARE_DEPENDENT (res) = NULL_TREE; >>> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >>> + DDR_SUBSCRIPTS (res).create (full_seq.length); >>> DDR_LOOP_NEST (res) = loop_nest; >>> DDR_INNER_LOOP (res) = 0; >>> DDR_SELF_REFERENCE (res) = false; >>> >>> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >>> + for (i = 0; i < full_seq.length; ++i) >>> { >>> struct subscript *subscript; >>> >>> subscript = XNEW (struct subscript); >>> + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); >>> + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); >>> SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >>> SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >>> SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >>> @@ -3839,14 +4103,15 @@ add_outer_distances (struct data_depende >>> } >>> >>> /* Return false when fail to represent the data dependence as a >>> - distance vector. INIT_B is set to true when a component has been >>> + distance vector. A_INDEX is the index of the first reference >>> + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the >>> + second reference. INIT_B is set to true when a component has been >>> added to the distance vector DIST_V. INDEX_CARRY is then set to >>> the index in DIST_V that carries the dependence. */ >>> >>> static bool >>> build_classic_dist_vector_1 (struct data_dependence_relation *ddr, >>> - struct data_reference *ddr_a, >>> - struct data_reference *ddr_b, >>> + unsigned int a_index, unsigned int b_index, >>> lambda_vector dist_v, bool *init_b, >>> int *index_carry) >>> { >>> @@ -3864,8 +4129,8 @@ build_classic_dist_vector_1 (struct data >>> return false; >>> } >>> >>> - access_fn_a = DR_ACCESS_FN (ddr_a, i); >>> - access_fn_b = DR_ACCESS_FN (ddr_b, i); >>> + access_fn_a = SUB_ACCESS_FN (subscript, a_index); >>> + access_fn_b = SUB_ACCESS_FN (subscript, b_index); >>> >>> if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC >>> && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) >>> @@ -3925,10 +4190,11 @@ build_classic_dist_vector_1 (struct data >>> constant_access_functions (const struct data_dependence_relation *ddr) >>> { >>> unsigned i; >>> + subscript *sub; >>> >>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >>> - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) >>> - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) >>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >>> + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) >>> + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) >>> return false; >>> >>> return true; >>> @@ -3991,10 +4257,11 @@ add_other_self_distances (struct data_de >>> lambda_vector dist_v; >>> unsigned i; >>> int index_carry = DDR_NB_LOOPS (ddr); >>> + subscript *sub; >>> >>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >>> { >>> - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); >>> + tree access_fun = SUB_ACCESS_FN (sub, 0); >>> >>> if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) >>> { >>> @@ -4006,7 +4273,7 @@ add_other_self_distances (struct data_de >>> return; >>> } >>> >>> - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); >>> + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); >>> >>> if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) >>> add_multivariate_self_dist (ddr, access_fun); >>> @@ -4077,6 +4344,23 @@ add_distance_for_zero_overlaps (struct d >>> } >>> } >>> >>> +/* Return true when the DDR contains two data references that have the >>> + same access functions. */ >>> + >>> +static inline bool >>> +same_access_functions (const struct data_dependence_relation *ddr) >>> +{ >>> + unsigned i; >>> + subscript *sub; >>> + >>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >>> + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), >>> + SUB_ACCESS_FN (sub, 1))) >>> + return false; >>> + >>> + return true; >>> +} >>> + >>> /* Compute the classic per loop distance vector. DDR is the data >>> dependence relation to build a vector from. Return false when fail >>> to represent the data dependence as a distance vector. */ >>> @@ -4108,8 +4392,7 @@ build_classic_dist_vector (struct data_d >>> } >>> >>> dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >>> - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), >>> - dist_v, &init_b, &index_carry)) >>> + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, >> &index_carry)) >>> return false; >>> >>> /* Save the distance vector if we initialized one. */ >>> @@ -4142,12 +4425,11 @@ build_classic_dist_vector (struct data_d >>> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) >>> { >>> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >>> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), >>> - loop_nest)) >>> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >>> return false; >>> compute_subscript_distance (ddr); >>> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >>> - save_v, &init_b, &index_carry)) >>> + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, >>> + &index_carry)) >>> return false; >>> save_dist_v (ddr, save_v); >>> DDR_REVERSED_P (ddr) = true; >>> @@ -4183,12 +4465,10 @@ build_classic_dist_vector (struct data_d >>> { >>> lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >>> >>> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), >>> - DDR_A (ddr), loop_nest)) >>> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >>> return false; >>> compute_subscript_distance (ddr); >>> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >>> - opposite_v, &init_b, >>> + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, >>> &index_carry)) >>> return false; >>> >>> @@ -4267,13 +4547,13 @@ build_classic_dir_vector (struct data_de >>> } >>> } >>> >>> -/* Helper function. Returns true when there is a dependence between >>> - data references DRA and DRB. */ >>> +/* Helper function. Returns true when there is a dependence between the >>> + data references. A_INDEX is the index of the first reference (0 for >>> + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ >>> >>> static bool >>> subscript_dependence_tester_1 (struct data_dependence_relation *ddr, >>> - struct data_reference *dra, >>> - struct data_reference *drb, >>> + unsigned int a_index, unsigned int b_index, >>> struct loop *loop_nest) >>> { >>> unsigned int i; >>> @@ -4285,8 +4565,8 @@ subscript_dependence_tester_1 (struct da >>> { >>> conflict_function *overlaps_a, *overlaps_b; >>> >>> - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), >>> - DR_ACCESS_FN (drb, i), >>> + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), >>> + SUB_ACCESS_FN (subscript, b_index), >>> &overlaps_a, &overlaps_b, >>> &last_conflicts, loop_nest); >>> >>> @@ -4335,7 +4615,7 @@ subscript_dependence_tester_1 (struct da >>> subscript_dependence_tester (struct data_dependence_relation *ddr, >>> struct loop *loop_nest) >>> { >>> - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), >> loop_nest)) >>> + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) >>> dependence_stats.num_dependence_dependent++; >>> >>> compute_subscript_distance (ddr); >>> Index: gcc/tree-ssa-loop-prefetch.c >>> =================================================================== >>> --- gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:29.620045506 +0100 >>> +++ gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:33.023912613 +0100 >>> @@ -1668,6 +1668,7 @@ determine_loop_nest_reuse (struct loop * >>> refb = (struct mem_ref *) DDR_B (dep)->aux; >>> >>> if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know >>> + || DDR_COULD_BE_INDEPENDENT_P (dep) >>> || DDR_NUM_DIST_VECTS (dep) == 0) >>> { >>> /* If the dependence cannot be analyzed, assume that there might be >>> Index: gcc/tree-vectorizer.h >>> =================================================================== >>> --- gcc/tree-vectorizer.h 2017-07-27 13:10:29.620045506 +0100 >>> +++ gcc/tree-vectorizer.h 2017-07-27 13:10:33.024912868 +0100 >>> @@ -358,7 +358,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) >>> #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ >>> ((L)->may_misalign_stmts.length () > 0) >>> #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ >>> - ((L)->may_alias_ddrs.length () > 0) >>> + ((L)->comp_alias_ddrs.length () > 0) >>> #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ >>> (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) >>> #define LOOP_REQUIRES_VERSIONING(L) \ >>> Index: gcc/tree-vect-data-refs.c >>> =================================================================== >>> --- gcc/tree-vect-data-refs.c 2017-07-27 13:10:29.620045506 +0100 >>> +++ gcc/tree-vect-data-refs.c 2017-07-27 13:10:33.024912868 +0100 >>> @@ -160,6 +160,60 @@ vect_mark_for_runtime_alias_test (ddr_p >>> } >>> >>> >>> +/* A subroutine of vect_analyze_data_ref_dependence. Handle >>> + DDR_COULD_BE_INDEPENDENT_P ddr DDR that has a known set of dependence >>> + distances. These distances are conservatively correct but they don't >>> + reflect a guaranteed dependence. >>> + >>> + Return true if this function does all the work necessary to avoid >>> + an alias or false if the caller should use the dependence distances >>> + to limit the vectorization factor in the usual way. LOOP_DEPTH is >>> + the depth of the loop described by LOOP_VINFO and the other arguments >>> + are as for vect_analyze_data_ref_dependence. */ >>> + >>> +static bool >>> +vect_analyze_possibly_independent_ddr (data_dependence_relation *ddr, >>> + loop_vec_info loop_vinfo, >>> + int loop_depth, int *max_vf) >>> +{ >>> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); >>> + lambda_vector dist_v; >>> + unsigned int i; >>> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >>> + { >>> + int dist = dist_v[loop_depth]; >>> + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) >>> + { >>> + /* If the user asserted safelen >= DIST consecutive iterations >>> + can be executed concurrently, assume independence. >>> + >>> + ??? An alternative would be to add the alias check even >>> + in this case, and vectorize the fallback loop with the >>> + maximum VF set to safelen. However, if the user has >>> + explicitly given a length, it's less likely that that >>> + would be a win. */ >>> + if (loop->safelen >= 2 && abs_hwi (dist) <= loop->safelen) >>> + { >>> + if (loop->safelen < *max_vf) >>> + *max_vf = loop->safelen; >>> + LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = false; >>> + continue; >>> + } >>> + >>> + /* For dependence distances of 2 or more, we have the option >>> + of limiting VF or checking for an alias at runtime. >>> + Prefer to check at runtime if we can, to avoid limiting >>> + the VF unnecessarily when the bases are in fact independent. >>> + >>> + Note that the alias checks will be removed if the VF ends up >>> + being small enough. */ >>> + return vect_mark_for_runtime_alias_test (ddr, loop_vinfo); >>> + } >>> + } >>> + return true; >>> +} >>> + >>> + >>> /* Function vect_analyze_data_ref_dependence. >>> >>> Return TRUE if there (might) exist a dependence between a memory-reference >>> @@ -305,6 +359,12 @@ vect_analyze_data_ref_dependence (struct >>> } >>> >>> loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); >>> + >>> + if (DDR_COULD_BE_INDEPENDENT_P (ddr) >>> + && vect_analyze_possibly_independent_ddr (ddr, loop_vinfo, >>> + loop_depth, max_vf)) >>> + return false; >>> + >>> FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >>> { >>> int dist = dist_v[loop_depth]; >>> @@ -2878,6 +2938,44 @@ vect_no_alias_p (struct data_reference * >>> return false; >>> } >>> >>> +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH >>> + in DDR is >= VF. */ >>> + >>> +static bool >>> +dependence_distance_ge_vf (data_dependence_relation *ddr, >>> + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) >>> +{ >>> + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE >>> + || DDR_NUM_DIST_VECTS (ddr) == 0) >>> + return false; >>> + >>> + /* If the dependence is exact, we should have limited the VF instead. */ >>> + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); >>> + >>> + unsigned int i; >>> + lambda_vector dist_v; >>> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >>> + { >>> + HOST_WIDE_INT dist = dist_v[loop_depth]; >>> + if (dist != 0 >>> + && !(dist > 0 && DDR_REVERSED_P (ddr)) >>> + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) >>> + return false; >>> + } >>> + >>> + if (dump_enabled_p ()) >>> + { >>> + dump_printf_loc (MSG_NOTE, vect_location, >>> + "dependence distance between "); >>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); >>> + dump_printf (MSG_NOTE, " and "); >>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); >>> + dump_printf (MSG_NOTE, " is >= VF\n"); >>> + } >>> + >>> + return true; >>> +} >>> + >>> /* Function vect_prune_runtime_alias_test_list. >>> >>> Prune a list of ddrs to be tested at run-time by versioning for alias. >>> @@ -2908,6 +3006,10 @@ vect_prune_runtime_alias_test_list (loop >>> >>> comp_alias_ddrs.create (may_alias_ddrs.length ()); >>> >>> + unsigned int loop_depth >>> + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, >>> + LOOP_VINFO_LOOP_NEST (loop_vinfo)); >>> + >>> /* First, we collect all data ref pairs for aliasing checks. */ >>> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) >>> { >>> @@ -2917,6 +3019,11 @@ vect_prune_runtime_alias_test_list (loop >>> tree segment_length_a, segment_length_b; >>> gimple *stmt_a, *stmt_b; >>> >>> + /* Ignore the alias if the VF we chose ended up being no greater >>> + than the dependence distance. */ >>> + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) >>> + continue; >>> + >>> dr_a = DDR_A (ddr); >>> stmt_a = DR_STMT (DDR_A (ddr)); >>> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); >>> @@ -2993,10 +3100,6 @@ vect_prune_runtime_alias_test_list (loop >>> return false; >>> } >>> >>> - /* All alias checks have been resolved at compilation time. */ >>> - if (!comp_alias_ddrs.length ()) >>> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); >>> - >>> return true; >>> } >>> >>> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c >>> =================================================================== >>> --- /dev/null 2017-07-27 10:25:31.671280760 +0100 >>> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-07-27 >> 13:10:33.022912357 +0100 >>> @@ -0,0 +1,120 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-require-effective-target vect_int } */ >>> +/* { dg-additional-options "--param >> vect-max-version-for-alias-checks=0 -fopenmp-simd" } */ >>> + >>> +/* Intended to be larger than any VF. */ >>> +#define GAP 128 >>> +#define N (GAP * 3) >>> + >>> +struct s { int x[N + 1]; }; >>> +struct t { struct s x[N + 1]; }; >>> +struct u { int x[N + 1]; int y; }; >>> +struct v { struct s s; }; >>> + >>> +void >>> +f1 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a->x[i] += b->x[i]; >>> +} >>> + >>> +void >>> +f2 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a[1].x[i] += b[2].x[i]; >>> +} >>> + >>> +void >>> +f3 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a[1].x[i] += b[i].x[i]; >>> +} >>> + >>> +void >>> +f4 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a[i].x[i] += b[i].x[i]; >>> +} >>> + >>> +void >>> +f5 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a->x[i] += b->x[i + 1]; >>> +} >>> + >>> +void >>> +f6 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a[1].x[i] += b[2].x[i + 1]; >>> +} >>> + >>> +void >>> +f7 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a[1].x[i] += b[i].x[i + 1]; >>> +} >>> + >>> +void >>> +f8 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a[i].x[i] += b[i].x[i + 1]; >>> +} >>> + >>> +void >>> +f9 (struct s *a, struct t *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a->x[i] += b->x[1].x[i]; >>> +} >>> + >>> +void >>> +f10 (struct s *a, struct t *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a->x[i] += b->x[i].x[i]; >>> +} >>> + >>> +void >>> +f11 (struct u *a, struct u *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a->x[i] += b->x[i] + b[i].y; >>> +} >>> + >>> +void >>> +f12 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < GAP; ++i) >>> + a->x[i + GAP] += b->x[i]; >>> +} >>> + >>> +void >>> +f13 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < GAP * 2; ++i) >>> + a->x[i + GAP] += b->x[i]; >>> +} >>> + >>> +void >>> +f14 (struct v *a, struct s *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a->s.x[i] = b->x[i]; >>> +} >>> + >>> +void >>> +f15 (struct s *a, struct s *b) >>> +{ >>> + #pragma omp simd safelen(N) >>> + for (int i = 0; i < N; ++i) >>> + a->x[i + 1] += b->x[i]; >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 15 "vect" } } */ >>> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c >>> =================================================================== >>> --- /dev/null 2017-07-27 10:25:31.671280760 +0100 >>> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-07-27 >> 13:10:33.022912357 +0100 >>> @@ -0,0 +1,35 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-require-effective-target vect_int } */ >>> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ >>> + >>> +#define N 16 >>> + >>> +struct s1 { int a[N]; }; >>> +struct s2 { struct s1 b; int c; }; >>> +struct s3 { int d; struct s1 e; }; >>> +union u { struct s2 f; struct s3 g; }; >>> + >>> +/* We allow a and b to overlap arbitrarily. */ >>> + >>> +void >>> +f1 (int a[][N], int b[][N]) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a[0][i] += b[0][i]; >>> +} >>> + >>> +void >>> +f2 (union u *a, union u *b) >>> +{ >>> + for (int i = 0; i < N; ++i) >>> + a->f.b.a[i] += b->g.e.a[i]; >>> +} >>> + >>> +void >>> +f3 (struct s1 *a, struct s1 *b) >>> +{ >>> + for (int i = 0; i < N - 1; ++i) >>> + a->a[i + 1] += b->a[i]; >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ >>> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c >>> =================================================================== >>> --- /dev/null 2017-07-27 10:25:31.671280760 +0100 >>> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-07-27 >> 13:10:33.022912357 +0100 >>> @@ -0,0 +1,19 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-require-effective-target vect_int } */ >>> + >>> +/* Intended to be larger than any VF. */ >>> +#define GAP 128 >>> +#define N (GAP * 3) >>> + >>> +struct s { int x[N]; }; >>> + >>> +void >>> +f1 (struct s *a, struct s *b) >>> +{ >>> + for (int i = 0; i < GAP * 2; ++i) >>> + a->x[i + GAP] += b->x[i]; >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-times "consider run-time aliasing" 1 >> "vect" } } */ >>> +/* { dg-final { scan-tree-dump-times "improved number of alias checks >> from 1 to 0" 1 "vect" } } */ >>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2017-07-27 13:10:29.620045506 +0100 +++ gcc/tree-data-ref.h 2017-07-27 13:10:33.023912613 +0100 @@ -260,6 +260,9 @@ struct conflict_function struct subscript { + /* The access functions of the two references. */ + tree access_fn[2]; + /* A description of the iterations for which the elements are accessed twice. */ conflict_function *conflicting_iterations_in_a; @@ -278,6 +281,7 @@ struct subscript typedef struct subscript *subscript_p; +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict @@ -333,6 +337,33 @@ struct data_dependence_relation /* Set to true when the dependence relation is on the same data access. */ bool self_reference_p; + + /* True if the dependence described is conservatively correct rather + than exact, and if it is still possible for the accesses to be + conditionally independent. For example, the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a->f[i] += b->f[i]; + + conservatively have a distance vector of (0), for the case in which + a == b, but the accesses are independent if a != b. Similarly, + the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a[0].f[i] += b[i].f[i]; + + conservatively have a distance vector of (0), but they are indepenent + when a != b + i. In contrast, the references in: + + struct s *a; + for (int i = 0; i < n; ++i) + a->f[i] += a->f[i]; + + have the same distance vector of (0), but the accesses can never be + independent. */ + bool could_be_independent_p; }; typedef struct data_dependence_relation *ddr_p; @@ -363,6 +394,7 @@ #define DDR_DIR_VECT(DDR, I) \ #define DDR_DIST_VECT(DDR, I) \ DDR_DIST_VECTS (DDR)[I] #define DDR_REVERSED_P(DDR) (DDR)->reversed_p +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); @@ -457,22 +489,6 @@ same_data_refs (data_reference_p a, data return false; return true; -} - -/* Return true when the DDR contains two data references that have the - same access functions. */ - -static inline bool -same_access_functions (const struct data_dependence_relation *ddr) -{ - unsigned i; - - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), - DR_ACCESS_FN (DDR_B (ddr), i))) - return false; - - return true; } /* Returns true when all the dependences are computable. */ Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-07-27 13:10:29.620045506 +0100 +++ gcc/tree-data-ref.c 2017-07-27 13:10:33.023912613 +0100 @@ -124,8 +124,7 @@ Software Foundation; either version 3, o } dependence_stats; static bool subscript_dependence_tester_1 (struct data_dependence_relation *, - struct data_reference *, - struct data_reference *, + unsigned int, unsigned int, struct loop *); /* Returns true iff A divides B. */ @@ -145,6 +144,21 @@ int_divides_p (int a, int b) return ((b % a) == 0); } +/* Return true if reference REF contains a union access. */ + +static bool +ref_contains_union_access_p (tree ref) +{ + while (handled_component_p (ref)) + { + ref = TREE_OPERAND (ref, 0); + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) + return true; + } + return false; +} + /* Dump into FILE all the data references from DATAREFS. */ @@ -434,13 +448,14 @@ dump_data_dependence_relation (FILE *out unsigned int i; struct loop *loopi; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + subscript *sub; + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { fprintf (outf, " access_fn_A: "); - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); fprintf (outf, " access_fn_B: "); - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); + dump_subscript (outf, sub); } fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); @@ -920,6 +935,27 @@ dr_analyze_innermost (innermost_loop_beh return true; } +/* Return true if OP is a valid component reference for a DR access + function. This accepts a subset of what handled_component_p accepts. */ + +static bool +access_fn_component_p (tree op) +{ + switch (TREE_CODE (op)) + { + case REALPART_EXPR: + case IMAGPART_EXPR: + case ARRAY_REF: + return true; + + case COMPONENT_REF: + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; + + default: + return false; + } +} + /* Determines the base object and the list of indices of memory reference DR, analyzed in LOOP and instantiated in loop nest NEST. */ @@ -957,7 +993,9 @@ dr_analyze_indices (struct data_referenc access_fns.safe_push (integer_one_node); } - /* Analyze access functions of dimensions we know to be independent. */ + /* Analyze access functions of dimensions we know to be independent. + The list of component references handled here should be kept in + sync with access_fn_component_p. */ while (handled_component_p (ref)) { if (TREE_CODE (ref) == ARRAY_REF) @@ -2148,6 +2186,38 @@ dr_may_alias_p (const struct data_refere return refs_may_alias_p (addr_a, addr_b); } +/* REF_A and REF_B both satisfy access_fn_component_p. Return true + if it is meaningful to compare their associated access functions + when checking for dependencies. */ + +static bool +access_fn_components_comparable_p (tree ref_a, tree ref_b) +{ + /* Allow pairs of component refs from the following sets: + + { REALPART_EXPR, IMAGPART_EXPR } + { COMPONENT_REF } + { ARRAY_REF }. */ + tree_code code_a = TREE_CODE (ref_a); + tree_code code_b = TREE_CODE (ref_b); + if (code_a == IMAGPART_EXPR) + code_a = REALPART_EXPR; + if (code_b == IMAGPART_EXPR) + code_b = REALPART_EXPR; + if (code_a != code_b) + return false; + + if (TREE_CODE (ref_a) == COMPONENT_REF) + /* ??? We cannot simply use the type of operand #0 of the refs here as + the Fortran compiler smuggles type punning into COMPONENT_REFs. + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); + + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), + TREE_TYPE (TREE_OPERAND (ref_b, 0))); +} + /* Initialize a data dependence relation between data accesses A and B. NB_LOOPS is the number of loops surrounding the references: the size of the classic distance/direction vectors. */ @@ -2160,11 +2230,10 @@ initialize_data_dependence_relation (str struct data_dependence_relation *res; unsigned int i; - res = XNEW (struct data_dependence_relation); + res = XCNEW (struct data_dependence_relation); DDR_A (res) = a; DDR_B (res) = b; DDR_LOOP_NEST (res).create (0); - DDR_REVERSED_P (res) = false; DDR_SUBSCRIPTS (res).create (0); DDR_DIR_VECTS (res).create (0); DDR_DIST_VECTS (res).create (0); @@ -2182,82 +2251,277 @@ initialize_data_dependence_relation (str return res; } - /* The case where the references are exactly the same. */ - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); + if (num_dimensions_a == 0 || num_dimensions_b == 0) { - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], - DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* For unconstrained bases, the root (highest-indexed) subscript + describes a variation in the base of the original DR_REF rather + than a component access. We have no type that accurately describes + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* + applying this subscript) so limit the search to the last real + component access. + + E.g. for: + + void + f (int a[][8], int b[][8]) + { + for (int i = 0; i < 8; ++i) + a[i * 2][0] = b[i][0]; + } + + the a and b accesses have a single ARRAY_REF component reference [0] + but have two subscripts. */ + if (DR_UNCONSTRAINED_BASE (a)) + num_dimensions_a -= 1; + if (DR_UNCONSTRAINED_BASE (b)) + num_dimensions_b -= 1; + + /* These structures describe sequences of component references in + DR_REF (A) and DR_REF (B). Each component reference is tied to a + specific access function. */ + struct { + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher + indices. In C notation, these are the indices of the rightmost + component references; e.g. for a sequence .b.c.d, the start + index is for .d. */ + unsigned int start_a; + unsigned int start_b; + + /* The sequence contains LENGTH consecutive access functions from + each DR. */ + unsigned int length; + + /* The enclosing objects for the A and B sequences respectively, + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ + tree object_a; + tree object_b; + } full_seq = {}, struct_seq = {}; + + /* Before each iteration of the loop: + + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ + unsigned int index_a = 0; + unsigned int index_b = 0; + tree ref_a = DR_REF (a); + tree ref_b = DR_REF (b); + + /* Now walk the component references from the final DR_REFs back up to + the enclosing base objects. Each component reference corresponds + to one access function in the DR, with access function 0 being for + the final DR_REF and the highest-indexed access function being the + one that is applied to the base of the DR. + + Look for a sequence of component references whose access functions + are comparable (see access_fn_components_comparable_p). If more + than one such sequence exists, pick the one nearest the base + (which is the leftmost sequence in C notation). Store this sequence + in FULL_SEQ. + + For example, if we have: + + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; + + A: a[0][i].s.c.d + B: __real b[0][i].s.e[i].f + + (where d is the same type as the real component of f) then the access + functions would be: + + 0 1 2 3 + A: .d .c .s [i] + + 0 1 2 3 4 5 + B: __real .f [i] .e .s [i] + + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF + and [i] is an ARRAY_REF. However, the A1/B3 column contains two + COMPONENT_REF accesses for struct bar, so is comparable. Likewise + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, + so is comparable. The A3/B5 column contains two ARRAY_REFs that + index foo[10] arrays, so is again comparable. The sequence is + therefore: + + A: [1, 3] (i.e. [i].s.c) + B: [3, 5] (i.e. [i].s.e) + + Also look for sequences of component references whose access + functions are comparable and whose enclosing objects have the same + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above + example, STRUCT_SEQ would be: + + A: [1, 2] (i.e. s.c) + B: [3, 4] (i.e. s.e) */ + while (index_a < num_dimensions_a && index_b < num_dimensions_b) + { + /* REF_A and REF_B must be one of the component access types + allowed by dr_analyze_indices. */ + gcc_checking_assert (access_fn_component_p (ref_a)); + gcc_checking_assert (access_fn_component_p (ref_b)); + + /* Get the immediately-enclosing objects for REF_A and REF_B, + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) + and DR_ACCESS_FN (B, INDEX_B). */ + tree object_a = TREE_OPERAND (ref_a, 0); + tree object_b = TREE_OPERAND (ref_b, 0); + + tree type_a = TREE_TYPE (object_a); + tree type_b = TREE_TYPE (object_b); + if (access_fn_components_comparable_p (ref_a, ref_b)) + { + /* This pair of component accesses is comparable for dependence + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ + if (full_seq.start_a + full_seq.length != index_a + || full_seq.start_b + full_seq.length != index_b) + { + /* The accesses don't extend the current sequence, + so start a new one here. */ + full_seq.start_a = index_a; + full_seq.start_b = index_b; + full_seq.length = 0; + } + + /* Add this pair of references to the sequence. */ + full_seq.length += 1; + full_seq.object_a = object_a; + full_seq.object_b = object_b; + + /* If the enclosing objects are structures (and thus have the + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ + if (TREE_CODE (type_a) == RECORD_TYPE) + struct_seq = full_seq; + + /* Move to the next containing reference for both A and B. */ + ref_a = object_a; + ref_b = object_b; + index_a += 1; + index_b += 1; + continue; + } + + /* Try to approach equal type sizes. */ + if (!COMPLETE_TYPE_P (type_a) + || !COMPLETE_TYPE_P (type_b) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) + break; + + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); + if (size_a <= size_b) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + index_a += 1; + ref_a = object_a; + } + if (size_b <= size_a) + { + index_b += 1; + ref_b = object_b; } - DDR_AFFINE_P (res) = true; - DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); - DDR_LOOP_NEST (res) = loop_nest; - DDR_INNER_LOOP (res) = 0; - DDR_SELF_REFERENCE (res) = true; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) - { - struct subscript *subscript; - - subscript = XNEW (struct subscript); - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; - SUB_DISTANCE (subscript) = chrec_dont_know; - DDR_SUBSCRIPTS (res).safe_push (subscript); - } - return res; } - /* If the references do not access the same object, we do not know - whether they alias or not. We do not care about TBAA or alignment - info so we can use OEP_ADDRESS_OF to avoid false negatives. - But the accesses have to use compatible types as otherwise the - built indices would not match. */ - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), - TREE_TYPE (DR_BASE_OBJECT (b)))) + /* See whether FULL_SEQ ends at the base and whether the two bases + are equal. We do not care about TBAA or alignment info so we can + use OEP_ADDRESS_OF to avoid false negatives. */ + tree base_a = DR_BASE_OBJECT (a); + tree base_b = DR_BASE_OBJECT (b); + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a + && full_seq.start_b + full_seq.length == num_dimensions_b + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) + && types_compatible_p (TREE_TYPE (base_a), + TREE_TYPE (base_b)) + && (!loop_nest.exists () + || (object_address_invariant_in_loop_p + (loop_nest[0], base_a)))); + + /* If the bases are the same, we can include the base variation too. + E.g. the b accesses in: + + for (int i = 0; i < n; ++i) + b[i + 4][0] = b[i][0]; + + have a definite dependence distance of 4, while for: + + for (int i = 0; i < n; ++i) + a[i + 4][0] = b[i][0]; + + the dependence distance depends on the gap between a and b. + + If the bases are different then we can only rely on the sequence + rooted at a structure access, since arrays are allowed to overlap + arbitrarily and change shape arbitrarily. E.g. we treat this as + valid code: + + int a[256]; + ... + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; + + where two lvalues with the same int[4][3] type overlap, and where + both lvalues are distinct from the object's declared type. */ + if (same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (DR_UNCONSTRAINED_BASE (a)) + full_seq.length += 1; } + else + full_seq = struct_seq; - /* If the base of the object is not invariant in the loop nest, we cannot - analyze it. TODO -- in fact, it would suffice to record that there may - be arbitrary dependences in the loops where the base object varies. */ - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) + /* Punt if we didn't find a suitable sequence. */ + if (full_seq.length == 0) { DDR_ARE_DEPENDENT (res) = chrec_dont_know; return res; } - /* If the number of dimensions of the access to not agree we can have - a pointer access to a component of the array element type and an - array access while the base-objects are still the same. Punt. */ - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) + if (!same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + /* Partial overlap is possible for different bases when strict aliasing + is not in effect. It's also possible if either base involves a union + access; e.g. for: + + struct s1 { int a[2]; }; + struct s2 { struct s1 b; int c; }; + struct s3 { int d; struct s1 e; }; + union u { struct s2 f; struct s3 g; } *p, *q; + + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at + "p->g.e" (base "p->g") and might partially overlap the s1 at + "q->g.e" (base "q->g"). */ + if (!flag_strict_aliasing + || ref_contains_union_access_p (full_seq.object_a) + || ref_contains_union_access_p (full_seq.object_b)) + { + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + DDR_COULD_BE_INDEPENDENT_P (res) = true; } DDR_AFFINE_P (res) = true; DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); + DDR_SUBSCRIPTS (res).create (full_seq.length); DDR_LOOP_NEST (res) = loop_nest; DDR_INNER_LOOP (res) = 0; DDR_SELF_REFERENCE (res) = false; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) + for (i = 0; i < full_seq.length; ++i) { struct subscript *subscript; subscript = XNEW (struct subscript); + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); SUB_LAST_CONFLICT (subscript) = chrec_dont_know; @@ -3839,14 +4103,15 @@ add_outer_distances (struct data_depende } /* Return false when fail to represent the data dependence as a - distance vector. INIT_B is set to true when a component has been + distance vector. A_INDEX is the index of the first reference + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the + second reference. INIT_B is set to true when a component has been added to the distance vector DIST_V. INDEX_CARRY is then set to the index in DIST_V that carries the dependence. */ static bool build_classic_dist_vector_1 (struct data_dependence_relation *ddr, - struct data_reference *ddr_a, - struct data_reference *ddr_b, + unsigned int a_index, unsigned int b_index, lambda_vector dist_v, bool *init_b, int *index_carry) { @@ -3864,8 +4129,8 @@ build_classic_dist_vector_1 (struct data return false; } - access_fn_a = DR_ACCESS_FN (ddr_a, i); - access_fn_b = DR_ACCESS_FN (ddr_b, i); + access_fn_a = SUB_ACCESS_FN (subscript, a_index); + access_fn_b = SUB_ACCESS_FN (subscript, b_index); if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) @@ -3925,10 +4190,11 @@ build_classic_dist_vector_1 (struct data constant_access_functions (const struct data_dependence_relation *ddr) { unsigned i; + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) return false; return true; @@ -3991,10 +4257,11 @@ add_other_self_distances (struct data_de lambda_vector dist_v; unsigned i; int index_carry = DDR_NB_LOOPS (ddr); + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); + tree access_fun = SUB_ACCESS_FN (sub, 0); if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) { @@ -4006,7 +4273,7 @@ add_other_self_distances (struct data_de return; } - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) add_multivariate_self_dist (ddr, access_fun); @@ -4077,6 +4344,23 @@ add_distance_for_zero_overlaps (struct d } } +/* Return true when the DDR contains two data references that have the + same access functions. */ + +static inline bool +same_access_functions (const struct data_dependence_relation *ddr) +{ + unsigned i; + subscript *sub; + + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), + SUB_ACCESS_FN (sub, 1))) + return false; + + return true; +} + /* Compute the classic per loop distance vector. DDR is the data dependence relation to build a vector from. Return false when fail to represent the data dependence as a distance vector. */ @@ -4108,8 +4392,7 @@ build_classic_dist_vector (struct data_d } dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), - dist_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) return false; /* Save the distance vector if we initialized one. */ @@ -4142,12 +4425,11 @@ build_classic_dist_vector (struct data_d if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) { lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), - loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - save_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, + &index_carry)) return false; save_dist_v (ddr, save_v); DDR_REVERSED_P (ddr) = true; @@ -4183,12 +4465,10 @@ build_classic_dist_vector (struct data_d { lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), - DDR_A (ddr), loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - opposite_v, &init_b, + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, &index_carry)) return false; @@ -4267,13 +4547,13 @@ build_classic_dir_vector (struct data_de } } -/* Helper function. Returns true when there is a dependence between - data references DRA and DRB. */ +/* Helper function. Returns true when there is a dependence between the + data references. A_INDEX is the index of the first reference (0 for + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ static bool subscript_dependence_tester_1 (struct data_dependence_relation *ddr, - struct data_reference *dra, - struct data_reference *drb, + unsigned int a_index, unsigned int b_index, struct loop *loop_nest) { unsigned int i; @@ -4285,8 +4565,8 @@ subscript_dependence_tester_1 (struct da { conflict_function *overlaps_a, *overlaps_b; - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), - DR_ACCESS_FN (drb, i), + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), + SUB_ACCESS_FN (subscript, b_index), &overlaps_a, &overlaps_b, &last_conflicts, loop_nest); @@ -4335,7 +4615,7 @@ subscript_dependence_tester_1 (struct da subscript_dependence_tester (struct data_dependence_relation *ddr, struct loop *loop_nest) { - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) dependence_stats.num_dependence_dependent++; compute_subscript_distance (ddr); Index: gcc/tree-ssa-loop-prefetch.c =================================================================== --- gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:29.620045506 +0100 +++ gcc/tree-ssa-loop-prefetch.c 2017-07-27 13:10:33.023912613 +0100 @@ -1668,6 +1668,7 @@ determine_loop_nest_reuse (struct loop * refb = (struct mem_ref *) DDR_B (dep)->aux; if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know + || DDR_COULD_BE_INDEPENDENT_P (dep) || DDR_NUM_DIST_VECTS (dep) == 0) { /* If the dependence cannot be analyzed, assume that there might be Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-07-27 13:10:29.620045506 +0100 +++ gcc/tree-vectorizer.h 2017-07-27 13:10:33.024912868 +0100 @@ -358,7 +358,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ ((L)->may_misalign_stmts.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ - ((L)->may_alias_ddrs.length () > 0) + ((L)->comp_alias_ddrs.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) #define LOOP_REQUIRES_VERSIONING(L) \ Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-07-27 13:10:29.620045506 +0100 +++ gcc/tree-vect-data-refs.c 2017-07-27 13:10:33.024912868 +0100 @@ -160,6 +160,60 @@ vect_mark_for_runtime_alias_test (ddr_p } +/* A subroutine of vect_analyze_data_ref_dependence. Handle + DDR_COULD_BE_INDEPENDENT_P ddr DDR that has a known set of dependence + distances. These distances are conservatively correct but they don't + reflect a guaranteed dependence. + + Return true if this function does all the work necessary to avoid + an alias or false if the caller should use the dependence distances + to limit the vectorization factor in the usual way. LOOP_DEPTH is + the depth of the loop described by LOOP_VINFO and the other arguments + are as for vect_analyze_data_ref_dependence. */ + +static bool +vect_analyze_possibly_independent_ddr (data_dependence_relation *ddr, + loop_vec_info loop_vinfo, + int loop_depth, int *max_vf) +{ + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + lambda_vector dist_v; + unsigned int i; + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + int dist = dist_v[loop_depth]; + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) + { + /* If the user asserted safelen >= DIST consecutive iterations + can be executed concurrently, assume independence. + + ??? An alternative would be to add the alias check even + in this case, and vectorize the fallback loop with the + maximum VF set to safelen. However, if the user has + explicitly given a length, it's less likely that that + would be a win. */ + if (loop->safelen >= 2 && abs_hwi (dist) <= loop->safelen) + { + if (loop->safelen < *max_vf) + *max_vf = loop->safelen; + LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = false; + continue; + } + + /* For dependence distances of 2 or more, we have the option + of limiting VF or checking for an alias at runtime. + Prefer to check at runtime if we can, to avoid limiting + the VF unnecessarily when the bases are in fact independent. + + Note that the alias checks will be removed if the VF ends up + being small enough. */ + return vect_mark_for_runtime_alias_test (ddr, loop_vinfo); + } + } + return true; +} + + /* Function vect_analyze_data_ref_dependence. Return TRUE if there (might) exist a dependence between a memory-reference @@ -305,6 +359,12 @@ vect_analyze_data_ref_dependence (struct } loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); + + if (DDR_COULD_BE_INDEPENDENT_P (ddr) + && vect_analyze_possibly_independent_ddr (ddr, loop_vinfo, + loop_depth, max_vf)) + return false; + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) { int dist = dist_v[loop_depth]; @@ -2878,6 +2938,44 @@ vect_no_alias_p (struct data_reference * return false; } +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH + in DDR is >= VF. */ + +static bool +dependence_distance_ge_vf (data_dependence_relation *ddr, + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) +{ + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE + || DDR_NUM_DIST_VECTS (ddr) == 0) + return false; + + /* If the dependence is exact, we should have limited the VF instead. */ + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); + + unsigned int i; + lambda_vector dist_v; + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + HOST_WIDE_INT dist = dist_v[loop_depth]; + if (dist != 0 + && !(dist > 0 && DDR_REVERSED_P (ddr)) + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) + return false; + } + + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "dependence distance between "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); + dump_printf (MSG_NOTE, " is >= VF\n"); + } + + return true; +} + /* Function vect_prune_runtime_alias_test_list. Prune a list of ddrs to be tested at run-time by versioning for alias. @@ -2908,6 +3006,10 @@ vect_prune_runtime_alias_test_list (loop comp_alias_ddrs.create (may_alias_ddrs.length ()); + unsigned int loop_depth + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, + LOOP_VINFO_LOOP_NEST (loop_vinfo)); + /* First, we collect all data ref pairs for aliasing checks. */ FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { @@ -2917,6 +3019,11 @@ vect_prune_runtime_alias_test_list (loop tree segment_length_a, segment_length_b; gimple *stmt_a, *stmt_b; + /* Ignore the alias if the VF we chose ended up being no greater + than the dependence distance. */ + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) + continue; + dr_a = DDR_A (ddr); stmt_a = DR_STMT (DDR_A (ddr)); dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); @@ -2993,10 +3100,6 @@ vect_prune_runtime_alias_test_list (loop return false; } - /* All alias checks have been resolved at compilation time. */ - if (!comp_alias_ddrs.length ()) - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); - return true; } Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c =================================================================== --- /dev/null 2017-07-27 10:25:31.671280760 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-07-27 13:10:33.022912357 +0100 @@ -0,0 +1,120 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0 -fopenmp-simd" } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N + 1]; }; +struct t { struct s x[N + 1]; }; +struct u { int x[N + 1]; int y; }; +struct v { struct s s; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i]; +} + +void +f2 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i]; +} + +void +f3 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i]; +} + +void +f4 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i]; +} + +void +f5 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i + 1]; +} + +void +f6 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i + 1]; +} + +void +f7 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i + 1]; +} + +void +f8 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i + 1]; +} + +void +f9 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[1].x[i]; +} + +void +f10 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i].x[i]; +} + +void +f11 (struct u *a, struct u *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i] + b[i].y; +} + +void +f12 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP; ++i) + a->x[i + GAP] += b->x[i]; +} + +void +f13 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +void +f14 (struct v *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->s.x[i] = b->x[i]; +} + +void +f15 (struct s *a, struct s *b) +{ + #pragma omp simd safelen(N) + for (int i = 0; i < N; ++i) + a->x[i + 1] += b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 15 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c =================================================================== --- /dev/null 2017-07-27 10:25:31.671280760 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-07-27 13:10:33.022912357 +0100 @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +#define N 16 + +struct s1 { int a[N]; }; +struct s2 { struct s1 b; int c; }; +struct s3 { int d; struct s1 e; }; +union u { struct s2 f; struct s3 g; }; + +/* We allow a and b to overlap arbitrarily. */ + +void +f1 (int a[][N], int b[][N]) +{ + for (int i = 0; i < N; ++i) + a[0][i] += b[0][i]; +} + +void +f2 (union u *a, union u *b) +{ + for (int i = 0; i < N; ++i) + a->f.b.a[i] += b->g.e.a[i]; +} + +void +f3 (struct s1 *a, struct s1 *b) +{ + for (int i = 0; i < N - 1; ++i) + a->a[i + 1] += b->a[i]; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c =================================================================== --- /dev/null 2017-07-27 10:25:31.671280760 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-07-27 13:10:33.022912357 +0100 @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N]; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "consider run-time aliasing" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */