Message ID | CAK=A3=0dZGhTt_amFBCPZB0_yVCUVnPG0Zh+wf5vVhnsteiW8A@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Oct 22, 2013 at 11:12 PM, Cong Hou <congh@google.com> wrote: > On Fri, Oct 18, 2013 at 6:09 AM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Tue, Oct 15, 2013 at 12:43 AM, Cong Hou <congh@google.com> wrote: >>> Sorry for forgetting using plain-text mode. Resend it. >>> >>> >>> ---------- Forwarded message ---------- >>> From: Cong Hou <congh@google.com> >>> Date: Mon, Oct 14, 2013 at 3:29 PM >>> Subject: Re: [PATCH] Reducing number of alias checks in vectorization. >>> To: Richard Biener <rguenther@suse.de>, GCC Patches <gcc-patches@gcc.gnu.org> >>> Cc: Jakub Jelinek <jakub@redhat.com> >>> >>> >>> I have made a new patch for this issue according to your comments. >>> >>> There are several modifications to my previous patch: >>> >>> >>> 1. Remove the use of STL features such as vector and sort. Use GCC's >>> vec and qsort instead. >> >> I think that using <utility> and thus std::pair and std::swap is ok. Including >> of <utility> should be done via system.h though. >> > > > Unfortunately, std::swap is defined in <algorithm>. Can I still use it? Hmm, let's defer this issue for now, adding swap and pair templates locally in GCC is better for now. That is, I don't want a discussion of what parts of the standard library to use inside this thread. You may want to rise the this issue in a new thread on gcc@gcc.gnu.org. >>> 2. Comparisons between tree nodes are not based on their addresses any >>> more. Use compare_tree() function instead. >> >> Ok. >> >>> 3. The function vect_create_cond_for_alias_checks() now returns the >>> number of alias checks. If its second parameter cond_expr is NULL, >>> then this function only calculate the number of alias checks after the >>> merging and won't generate comparison expressions. >> >> We now perform the merging twice - that's easily avoided by recording >> the merge result in loop_vinfo alongside may_alias_ddrs (which you >> should be able to drop as they are already contained in the data >> structures you build). >> >> Which means you can split the function and move the merging >> part to vect_prune_runtime_alias_test_list. > > > I have added one more field to loop_vec_info to store the dr pairs > from which alias checks are built. And I also split the functionality > of vect_create_cond_for_alias_checks() into two functions, so that we > now perform the merging once instead of twice. > > >> >>> 4. The function vect_prune_runtime_alias_test_list() now uses >>> vect_create_cond_for_alias_checks() to get the number of alias checks. >>> >>> >>> The patch is attached as a text file. >>> >>> Please give me your comment on this patch. Thank you! >> >> + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0) >> + || TREE_CODE (dr_a1->offset) != INTEGER_CST >> + || TREE_CODE (dr_a2->offset) != INTEGER_CST) >> + continue; >> + >> + HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) - >> + widest_int_cst_value (dr_a1->offset); >> >> use !host_integerp (dr_a1->offset) as check and then int_cst_value. > > > Done. > > >> >> + if (diff <= vect_factor >> + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST >> + && diff - widest_int_cst_value (dr_a1->seg_len) < vect_factor) >> + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST >> + && TREE_CODE (dr_b1->seg_len) == INTEGER_CST >> + && diff - widest_int_cst_value (dr_a1->seg_len) < >> + widest_int_cst_value (dr_b1->seg_len))) >> >> can you add a comment above this why it is safe to combine under >> this condition? You seem to combine offsets but not data-ref steps >> for example. The segment length is computed quite optimistically >> and you don't seem to adjust that. That is, generally the alias >> check, if implemented as simple overlap, would need to intersect >> [init, init + niter * step] intervals, losing some non-aliasing cases. >> The current segment length is optimistically computed and thus >> can recover some more cases, but you have to be careful with >> merging to not break its implicit assumptions. > > > I have changed the code and made it more clear as shown below. I still > use the condition > > > DR_A2->OFFSET - DR_A1->OFFSET - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B > > > But in case that SEGMENT_LENGTH_A or SEGMENT_LENGTH_B is not constant, > we can still check the following two conditions: > > > DR_A2->OFFSET - DR_A1->OFFSET < SEGMENT_LENGTH_B > (if SEGMENT_LENGTH_A is not a constant). > > > DR_A2->OFFSET - DR_A1->OFFSET - SEGMENT_LENGTH_A < VECT_FACTOR > (if SEGMENT_LENGTH_B is not a constant, then its minimum value should > be VECT_FACTOR). > > > > The corresponding code is: > > > /* Now we check if the following condition is satisfied: > > DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B > > where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However, > SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we > have to make a best estimation. We can get the minimum value > of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B, > then either of the following two conditions can guarantee the > one above: > > 1: DIFF <= MIN_SEG_LEN_B > 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B > > */ > > HOST_WIDEST_INT > min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ? > widest_int_cst_value (dr_b1->seg_len) : > vect_factor; > > if (diff <= min_seg_len_b > || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST > && diff - widest_int_cst_value (dr_a1->seg_len) < > min_seg_len_b)) > { ... } > > > > The new patch is pasted here again. Bootstrapping and regression tests > are passed. > > > thanks, > Cong > > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 8a38316..7333f6a 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,12 @@ > +2013-10-14 Cong Hou <congh@google.com> > + > + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): > + Combine alias checks if it is possible to amortize the runtime > + overhead. Return the number of alias checks after merging. > + * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): > + Use the function vect_create_cond_for_alias_checks () to check > + the number of alias checks. > + Your MUA eats tabs ... ;) > 2013-10-14 David Malcolm <dmalcolm@redhat.com> > > * dumpfile.h (gcc::dump_manager): New class, to hold state > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > index 075d071..ea2782f 100644 > --- a/gcc/testsuite/ChangeLog > +++ b/gcc/testsuite/ChangeLog > @@ -1,3 +1,7 @@ > +2013-10-14 Cong Hou <congh@google.com> > + > + * gcc.dg/vect/vect-alias-check.c: New. > + > 2013-10-14 Tobias Burnus <burnus@net-b.de> > > PR fortran/58658 > diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c > b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c > new file mode 100644 > index 0000000..bf9eb6b > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize > + --param=vect-max-version-for-alias-checks=2 > + -fdump-tree-vect-details" } */ > + > +/* A test case showing three potential alias checks between > + a[i] and b[i], b[i+7], b[i+14]. With alias checks merging > + enabled, those tree checks can be merged into one, and the > + loop will be vectorized with vect-max-version-for-alias-checks=2. */ > + > +void foo (int *a, int *b) > +{ > + int i; > + for (i = 0; i < 1000; ++i) > + a[i] = b[i] + b[i+7] + b[i+14]; > +} > + > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ > +/* { dg-final { cleanup-tree-dump "vect" } } */ > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > index e7b2f33..5f70d58 100644 > --- a/gcc/tree-vect-data-refs.c > +++ b/gcc/tree-vect-data-refs.c > @@ -128,41 +128,6 @@ vect_get_smallest_scalar_type (gimple stmt, > HOST_WIDE_INT *lhs_size_unit, > } > > > -/* Check if data references pointed by DR_I and DR_J are same or > - belong to same interleaving group. Return FALSE if drs are > - different, otherwise return TRUE. */ > - > -static bool > -vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j) > -{ > - gimple stmt_i = DR_STMT (dr_i); > - gimple stmt_j = DR_STMT (dr_j); > - > - if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0) > - || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) > - && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)) > - && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) > - == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))))) > - return true; > - else > - return false; > -} > - > -/* If address ranges represented by DDR_I and DDR_J are equal, > - return TRUE, otherwise return FALSE. */ > - > -static bool > -vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j) > -{ > - if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j)) > - && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j))) > - || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j)) > - && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j)))) > - return true; > - else > - return false; > -} > - > /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be > tested at run-time. Return TRUE if DDR was successfully inserted. > Return false if versioning is not supported. */ > @@ -2647,69 +2612,331 @@ vect_analyze_data_ref_accesses (loop_vec_info > loop_vinfo, bb_vec_info bb_vinfo) > return true; > } > > + > +/* Operator == between two dr_addr_with_seg_len objects. > + > + This equality operator is used to make sure two data refs > + are the same one so that we will consider to combine the > + aliasing checks of those two pairs of data dependent data > + refs. */ > + > +static bool > +operator == (const dr_addr_with_seg_len& d1, > + const dr_addr_with_seg_len& d2) > +{ > + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0) > + && compare_tree (d1.offset, d2.offset) == 0 > + && compare_tree (d1.seg_len, d2.seg_len) == 0; > +} > + > +/* Function comp_dr_addr_with_seg_len_pair. > + > + Comparison function for sorting objects of dr_addr_with_seg_len_pair_t > + so that we can combine aliasing checks in one scan. */ > + > +static int > +comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_) > +{ > + const dr_addr_with_seg_len_pair_t* p1 = > + (const dr_addr_with_seg_len_pair_t *) p1_; > + const dr_addr_with_seg_len_pair_t* p2 = > + (const dr_addr_with_seg_len_pair_t *) p2_; > + > + const dr_addr_with_seg_len &p11 = p1->first, > + &p12 = p1->second, > + &p21 = p2->first, > + &p22 = p2->second; > + > + int comp_res = compare_tree (p11.basic_addr, p21.basic_addr); > + if (comp_res != 0) > + return comp_res; > + > + comp_res = compare_tree (p12.basic_addr, p22.basic_addr); > + if (comp_res != 0) > + return comp_res; > + > + if (TREE_CODE (p11.offset) != INTEGER_CST > + || TREE_CODE (p21.offset) != INTEGER_CST) > + { > + comp_res = compare_tree (p11.offset, p21.offset); > + if (comp_res != 0) > + return comp_res; > + } > + if (tree_int_cst_compare (p11.offset, p21.offset) < 0) > + return -1; > + if (tree_int_cst_compare (p11.offset, p21.offset) > 0) > + return 1; > + if (TREE_CODE (p12.offset) != INTEGER_CST > + || TREE_CODE (p22.offset) != INTEGER_CST) > + { > + comp_res = compare_tree (p12.offset, p22.offset); > + if (comp_res != 0) > + return comp_res; > + } > + if (tree_int_cst_compare (p12.offset, p22.offset) < 0) > + return -1; > + if (tree_int_cst_compare (p12.offset, p22.offset) > 0) > + return 1; > + > + return 0; > +} > + > +template <class T> static void > +swap (T& a, T& b) > +{ > + T c (a); > + a = b; > + b = c; > +} > + > +/* Function vect_vfa_segment_size. > + > + Create an expression that computes the size of segment > + that will be accessed for a data reference. The functions takes into > + account that realignment loads may access one more vector. > + > + Input: > + DR: The data reference. > + LENGTH_FACTOR: segment length to consider. > + > + Return an expression whose value is the size of segment which will be > + accessed by DR. */ > + > +static tree > +vect_vfa_segment_size (struct data_reference *dr, tree length_factor) > +{ > + tree segment_length; > + > + if (integer_zerop (DR_STEP (dr))) > + segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); > + else > + segment_length = size_binop (MULT_EXPR, > + fold_convert (sizetype, DR_STEP (dr)), > + fold_convert (sizetype, length_factor)); > + > + if (vect_supportable_dr_alignment (dr, false) > + == dr_explicit_realign_optimized) > + { > + tree vector_size = TYPE_SIZE_UNIT > + (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)))); > + > + segment_length = size_binop (PLUS_EXPR, segment_length, vector_size); > + } > + return segment_length; > +} > + > /* Function vect_prune_runtime_alias_test_list. > > Prune a list of ddrs to be tested at run-time by versioning for alias. > + Merge several alias checks into one if possible. > Return FALSE if resulting list of ddrs is longer then allowed by > PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, otherwise return TRUE. */ > > bool > vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo) > { > - vec<ddr_p> ddrs = > + vec<ddr_p> may_alias_ddrs = > LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo); > - unsigned i, j; > + vec<dr_addr_with_seg_len_pair_t>& comp_alias_ddrs = > + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo); > + int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > + tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); > + > + ddr_p ddr; > + unsigned int i; > + tree length_factor; > > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "=== vect_prune_runtime_alias_test_list ===\n"); > > - for (i = 0; i < ddrs.length (); ) > + if (may_alias_ddrs.is_empty ()) > + return true; > + > + /* Basically, for each pair of dependent data refs store_ptr_0 > + and load_ptr_0, we create an expression: > + > + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) > + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) > + > + for aliasing checks. However, in some cases we can decrease > + the number of checks by combining two checks into one. For > + example, suppose we have another pair of data refs store_ptr_0 > + and load_ptr_1, and if the following condition is satisfied: > + > + load_ptr_0 < load_ptr_1 && > + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0 > + > + (this condition means, in each iteration of vectorized loop, > + the accessed memory of store_ptr_0 cannot be between the memory > + of load_ptr_0 and load_ptr_1.) > + > + we then can use only the following expression to finish the > + alising checks between store_ptr_0 & load_ptr_0 and > + store_ptr_0 & load_ptr_1: > + > + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) > + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0)) > + > + Note that we only consider that load_ptr_0 and load_ptr_1 have the > + same basic address. */ > + > + comp_alias_ddrs.create (may_alias_ddrs.length ()); > + > + /* First, we collect all data ref pairs for aliasing checks. */ > + FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) > { > - bool found; > - ddr_p ddr_i; > + struct data_reference *dr_a, *dr_b; > + gimple dr_group_first_a, dr_group_first_b; > + tree segment_length_a, segment_length_b; > + gimple stmt_a, stmt_b; > + > + dr_a = DDR_A (ddr); > + stmt_a = DR_STMT (DDR_A (ddr)); > + dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); > + if (dr_group_first_a) > + { > + stmt_a = dr_group_first_a; > + dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); > + } > > - ddr_i = ddrs[i]; > - found = false; > + dr_b = DDR_B (ddr); > + stmt_b = DR_STMT (DDR_B (ddr)); > + dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); > + if (dr_group_first_b) > + { > + stmt_b = dr_group_first_b; > + dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); > + } > > - for (j = 0; j < i; j++) > - { > - ddr_p ddr_j = ddrs[j]; > + if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) > + length_factor = scalar_loop_iters; > + else > + length_factor = size_int (vect_factor); ^^^^ this is the thing I am worried about. When the steps are equal we build reduced size segments to handle overlaps with a dependence distance bigger than vect_factor. > + segment_length_a = vect_vfa_segment_size (dr_a, length_factor); > + segment_length_b = vect_vfa_segment_size (dr_b, length_factor); > + > + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair > + (dr_addr_with_seg_len > + (dr_a, DR_BASE_ADDRESS (dr_a), > + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)), > + segment_length_a), > + dr_addr_with_seg_len > + (dr_b, DR_BASE_ADDRESS (dr_b), > + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)), > + segment_length_b)); > + > + if (compare_tree (dr_with_seg_len_pair.first.basic_addr, > + dr_with_seg_len_pair.second.basic_addr) > 0) > + swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); > + > + comp_alias_ddrs.safe_push (dr_with_seg_len_pair); > + } > > - if (vect_vfa_range_equal (ddr_i, ddr_j)) > + /* Second, we sort the collected data ref pairs so that we can scan > + them once to combine all possible aliasing checks. */ > + comp_alias_ddrs.qsort (comp_dr_addr_with_seg_len_pair); > + > + /* Third, we scan the sorted dr pairs and check if we can combine > + alias checks of two neighbouring dr pairs. */ > + for (size_t i = 1; i < comp_alias_ddrs.length (); ++i) > + { > + /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2). */ > + dr_addr_with_seg_len *dr_a1 = &comp_alias_ddrs[i-1].first, > + *dr_b1 = &comp_alias_ddrs[i-1].second, > + *dr_a2 = &comp_alias_ddrs[i].first, > + *dr_b2 = &comp_alias_ddrs[i].second; > + > + /* Remove duplicate data ref pairs. */ > + if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2) > + { > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, > + "found equal ranges "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, > + DR_REF (dr_a1->dr)); > + dump_printf (MSG_NOTE, ", "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, > + DR_REF (dr_b1->dr)); > + dump_printf (MSG_NOTE, " and "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, > + DR_REF (dr_a2->dr)); > + dump_printf (MSG_NOTE, ", "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, > + DR_REF (dr_b2->dr)); > + dump_printf (MSG_NOTE, "\n"); > + } > + > + comp_alias_ddrs.ordered_remove (i--); > + continue; > + } > + > + if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2) > + { > + /* We consider the case that DR_B1 and DR_B2 are same memrefs, > + and DR_A1 and DR_A2 are two consecutive memrefs. */ > + if (*dr_a1 == *dr_a2) > + { > + swap (dr_a1, dr_b1); > + swap (dr_a2, dr_b2); > + } > + > + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0) > + || !host_integerp (dr_a1->offset, 0) > + || !host_integerp (dr_a2->offset, 0)) > + continue; > + > + HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) - > + widest_int_cst_value (dr_a1->offset); > + > + > + /* Now we check if the following condition is satisfied: > + > + DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B > + > + where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However, > + SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we > + have to make a best estimation. We can get the minimum value > + of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B, > + then either of the following two conditions can guarantee the > + one above: > + > + 1: DIFF <= MIN_SEG_LEN_B > + 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B > + > + */ > + > + HOST_WIDEST_INT > + min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ? > + widest_int_cst_value (dr_b1->seg_len) : don't use widest_int_cst_value but tree_int_cst_low () if you previously checked host_integerp. > + vect_factor; > + > + if (diff <= min_seg_len_b > + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST > + && diff - widest_int_cst_value (dr_a1->seg_len) < > + min_seg_len_b)) > { > if (dump_enabled_p ()) > { > - dump_printf_loc (MSG_NOTE, vect_location, > - "found equal ranges "); > - dump_generic_expr (MSG_NOTE, TDF_SLIM, > - DR_REF (DDR_A (ddr_i))); > - dump_printf (MSG_NOTE, ", "); > - dump_generic_expr (MSG_NOTE, TDF_SLIM, > - DR_REF (DDR_B (ddr_i))); > - dump_printf (MSG_NOTE, " and "); > - dump_generic_expr (MSG_NOTE, TDF_SLIM, > - DR_REF (DDR_A (ddr_j))); > - dump_printf (MSG_NOTE, ", "); > - dump_generic_expr (MSG_NOTE, TDF_SLIM, > - DR_REF (DDR_B (ddr_j))); > + dump_printf_loc > + (MSG_NOTE, vect_location, > + "merging two runtime checks for data references "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr)); > + dump_printf (MSG_NOTE, " and "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr)); > dump_printf (MSG_NOTE, "\n"); > } > - found = true; > - break; > + > + dr_a1->seg_len = size_binop (PLUS_EXPR, > + dr_a2->seg_len, size_int (diff)); > + comp_alias_ddrs.ordered_remove (i--); > } > } > - > - if (found) > - { > - ddrs.ordered_remove (i); > - continue; > - } > - i++; > } > > - if (ddrs.length () > > - (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) > + if ((int) comp_alias_ddrs.length () > > + PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) > { > if (dump_enabled_p ()) > { > @@ -2718,8 +2945,6 @@ vect_prune_runtime_alias_test_list > (loop_vec_info loop_vinfo) > "generated checks exceeded.\n"); > } > > - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); > - > return false; > } > > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c > index 574446a..1e03853 100644 > --- a/gcc/tree-vect-loop-manip.c > +++ b/gcc/tree-vect-loop-manip.c > @@ -2211,44 +2211,6 @@ vect_create_cond_for_align_checks > (loop_vec_info loop_vinfo, > *cond_expr = part_cond_expr; > } > > - > -/* Function vect_vfa_segment_size. > - > - Create an expression that computes the size of segment > - that will be accessed for a data reference. The functions takes into > - account that realignment loads may access one more vector. > - > - Input: > - DR: The data reference. > - LENGTH_FACTOR: segment length to consider. > - > - Return an expression whose value is the size of segment which will be > - accessed by DR. */ > - > -static tree > -vect_vfa_segment_size (struct data_reference *dr, tree length_factor) > -{ > - tree segment_length; > - > - if (integer_zerop (DR_STEP (dr))) > - segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); > - else > - segment_length = size_binop (MULT_EXPR, > - fold_convert (sizetype, DR_STEP (dr)), > - fold_convert (sizetype, length_factor)); > - > - if (vect_supportable_dr_alignment (dr, false) > - == dr_explicit_realign_optimized) > - { > - tree vector_size = TYPE_SIZE_UNIT > - (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)))); > - > - segment_length = size_binop (PLUS_EXPR, segment_length, vector_size); > - } > - return segment_length; > -} > - > - > /* Function vect_create_cond_for_alias_checks. > > Create a conditional expression that represents the run-time checks for > @@ -2257,28 +2219,24 @@ vect_vfa_segment_size (struct data_reference > *dr, tree length_factor) > > Input: > COND_EXPR - input conditional expression. New conditions will be chained > - with logical AND operation. > + with logical AND operation. If it is NULL, then the function > + is used to return the number of alias checks. > LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs > to be checked. > > Output: > COND_EXPR - conditional expression. > > - The returned value is the conditional expression to be used in the if > + The returned COND_EXPR is the conditional expression to be used in the if > statement that controls which version of the loop gets executed at runtime. > */ > > -static void > +void > vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) > { > - vec<ddr_p> may_alias_ddrs = > - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo); > - int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > - tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); > - > - ddr_p ddr; > - unsigned int i; > - tree part_cond_expr, length_factor; > + vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs = > + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo); > + tree part_cond_expr; > > /* Create expression > ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) > @@ -2289,70 +2247,39 @@ vect_create_cond_for_alias_checks > (loop_vec_info loop_vinfo, tree * cond_expr) > ((store_ptr_n + store_segment_length_n) <= load_ptr_n) > || (load_ptr_n + load_segment_length_n) <= store_ptr_n)) */ > > - if (may_alias_ddrs.is_empty ()) > + if (comp_alias_ddrs.is_empty ()) > return; > > - FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) > + for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i) > { > - struct data_reference *dr_a, *dr_b; > - gimple dr_group_first_a, dr_group_first_b; > - tree addr_base_a, addr_base_b; > - tree segment_length_a, segment_length_b; > - gimple stmt_a, stmt_b; > - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max; > - > - dr_a = DDR_A (ddr); > - stmt_a = DR_STMT (DDR_A (ddr)); > - dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); > - if (dr_group_first_a) > - { > - stmt_a = dr_group_first_a; > - dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); > - } > - > - dr_b = DDR_B (ddr); > - stmt_b = DR_STMT (DDR_B (ddr)); > - dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); > - if (dr_group_first_b) > - { > - stmt_b = dr_group_first_b; > - dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); > - } > + const dr_addr_with_seg_len& dr_a = comp_alias_ddrs[i].first; > + const dr_addr_with_seg_len& dr_b = comp_alias_ddrs[i].second; > + tree segment_length_a = dr_a.seg_len; > + tree segment_length_b = dr_b.seg_len; > > - addr_base_a > - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a), > - size_binop (PLUS_EXPR, DR_OFFSET (dr_a), > - DR_INIT (dr_a))); > - addr_base_b > - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b), > - size_binop (PLUS_EXPR, DR_OFFSET (dr_b), > - DR_INIT (dr_b))); > - > - if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) > - length_factor = scalar_loop_iters; > - else > - length_factor = size_int (vect_factor); > - segment_length_a = vect_vfa_segment_size (dr_a, length_factor); > - segment_length_b = vect_vfa_segment_size (dr_b, length_factor); > + tree addr_base_a > + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset); > + tree addr_base_b > + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset); > > if (dump_enabled_p ()) > { > dump_printf_loc (MSG_NOTE, vect_location, > - "create runtime check for data references "); > - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a)); > + "create runtime check for data references "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr)); > dump_printf (MSG_NOTE, " and "); > - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b)); > - dump_printf (MSG_NOTE, "\n"); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr)); > + dump_printf (MSG_NOTE, "\n"); > } > > - seg_a_min = addr_base_a; > - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); > - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0) > + tree seg_a_min = addr_base_a; > + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); > + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0) > seg_a_min = seg_a_max, seg_a_max = addr_base_a; > > - seg_b_min = addr_base_b; > - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); > - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0) > + tree seg_b_min = addr_base_b; > + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); > + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0) > seg_b_min = seg_b_max, seg_b_max = addr_base_b; > > part_cond_expr = > @@ -2370,7 +2297,9 @@ vect_create_cond_for_alias_checks (loop_vec_info > loop_vinfo, tree * cond_expr) > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "created %u versioning for alias checks.\n", > - may_alias_ddrs.length ()); > + comp_alias_ddrs.length ()); > + > + comp_alias_ddrs.release (); > } > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index 8b7b345..f231b62 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -175,6 +175,36 @@ typedef struct _slp_oprnd_info > > > > +/* This struct is used to store the information of a data reference, > + including the data ref itself, its basic address, the access offset > + and the segment length for aliasing checks. This is used to generate > + alias checks. */ > + > +struct dr_addr_with_seg_len > +{ > + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len) > + : dr (d), basic_addr (addr), offset (off), seg_len (len) {} > + > + data_reference *dr; > + tree basic_addr; > + tree offset; > + tree seg_len; > +}; > + > +/* This struct contains two dr_addr_with_seg_len objects with aliasing data > + refs. Two comparisons are generated from them. */ > + > +struct dr_addr_with_seg_len_pair_t > +{ > + dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1, > + const dr_addr_with_seg_len& d2) > + : first (d1), second (d2) {} > + > + dr_addr_with_seg_len first; > + dr_addr_with_seg_len second; > +}; > + > + > typedef struct _vect_peel_info > { > int npeel; > @@ -274,6 +304,10 @@ typedef struct _loop_vec_info { > for a run-time aliasing check. */ > vec<ddr_p> may_alias_ddrs; > > + /* Data Dependence Relations defining address ranges together with segment > + lengths from which the run-time aliasing check is built. */ > + vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs; > + Are you sure you release those on all paths that exit the vectorizer? Thanks, Richard. > /* Statements in the loop that have data references that are candidates for a > runtime (loop versioning) misalignment check. */ > vec<gimple> may_misalign_stmts; > @@ -336,6 +370,7 @@ typedef struct _loop_vec_info { > #define LOOP_VINFO_MAY_MISALIGN_STMTS(L) (L)->may_misalign_stmts > #define LOOP_VINFO_LOC(L) (L)->loop_line_number > #define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs > +#define LOOP_VINFO_COMP_ALIAS_DDRS(L) (L)->comp_alias_ddrs > #define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores > #define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances > #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor > > > > > >> >> Thanks, >> Richard. >> >>> >>> Cong >>> >>> >>> On Thu, Oct 3, 2013 at 2:35 PM, Cong Hou <congh@google.com> wrote: >>>> >>>> Forget about this "aux" idea as the segment length for one data ref >>>> can be different in different dr pairs. >>>> >>>> In my patch I created a struct as shown below: >>>> >>>> struct dr_addr_with_seg_len >>>> { >>>> data_reference *dr; >>>> tree basic_addr; >>>> tree offset; >>>> tree seg_len; >>>> }; >>>> >>>> >>>> Note that basic_addr and offset can always obtained from dr, but we >>>> need to store two segment lengths for each dr pair. It is improper to >>>> add a field to data_dependence_relation as it is defined outside of >>>> vectorizer. We can change the type (a new one combining >>>> data_dependence_relation and segment length) of may_alias_ddrs in >>>> loop_vec_info to include such information, but we have to add a new >>>> type to tree-vectorizer.h which is only used in two places - still too >>>> much. >>>> >>>> One possible solution is that we create a local struct as shown above >>>> and a new function which returns the merged alias check information. >>>> This function will be called twice: once during analysis phase and >>>> once in transformation phase. Then we don't have to store the merged >>>> alias check information during those two phases. The additional time >>>> cost is minimal as there will not be too many data dependent dr pairs >>>> in a loop. >>>> >>>> Any comment? >>>> >>>> >>>> thanks, >>>> Cong >>>> >>>> >>>> On Thu, Oct 3, 2013 at 10:57 AM, Cong Hou <congh@google.com> wrote: >>>> > I noticed that there is a "struct dataref_aux" defined in >>>> > tree-vectorizer.h which is specific to the vectorizer pass and is >>>> > stored in (void*)aux in "struct data_reference". Can we add one more >>>> > field "segment_length" to dataref_aux so that we can pass this >>>> > information for merging alias checks? Then we can avoid to modify or >>>> > create other structures. >>>> > >>>> > >>>> > thanks, >>>> > Cong >>>> > >>>> > >>>> > On Wed, Oct 2, 2013 at 2:34 PM, Cong Hou <congh@google.com> wrote: >>>> >> On Wed, Oct 2, 2013 at 4:24 AM, Richard Biener <rguenther@suse.de> wrote: >>>> >>> On Tue, 1 Oct 2013, Cong Hou wrote: >>>> >>> >>>> >>>> When alias exists between data refs in a loop, to vectorize it GCC >>>> >>>> does loop versioning and adds runtime alias checks. Basically for each >>>> >>>> pair of data refs with possible data dependence, there will be two >>>> >>>> comparisons generated to make sure there is no aliasing between them >>>> >>>> in each iteration of the vectorized loop. If there are many such data >>>> >>>> refs pairs, the number of comparisons can be very large, which is a >>>> >>>> big overhead. >>>> >>>> >>>> >>>> However, in some cases it is possible to reduce the number of those >>>> >>>> comparisons. For example, for the following loop, we can detect that >>>> >>>> b[0] and b[1] are two consecutive member accesses so that we can >>>> >>>> combine the alias check between a[0:100]&b[0] and a[0:100]&b[1] into >>>> >>>> checking a[0:100]&b[0:2]: >>>> >>>> >>>> >>>> void foo(int*a, int* b) >>>> >>>> { >>>> >>>> for (int i = 0; i < 100; ++i) >>>> >>>> a[i] = b[0] + b[1]; >>>> >>>> } >>>> >>>> >>>> >>>> Actually, the requirement of consecutive memory accesses is too >>>> >>>> strict. For the following loop, we can still combine the alias checks >>>> >>>> between a[0:100]&b[0] and a[0:100]&b[100]: >>>> >>>> >>>> >>>> void foo(int*a, int* b) >>>> >>>> { >>>> >>>> for (int i = 0; i < 100; ++i) >>>> >>>> a[i] = b[0] + b[100]; >>>> >>>> } >>>> >>>> >>>> >>>> This is because if b[0] is not in a[0:100] and b[100] is not in >>>> >>>> a[0:100] then a[0:100] cannot be between b[0] and b[100]. We only need >>>> >>>> to check a[0:100] and b[0:101] don't overlap. >>>> >>>> >>>> >>>> More generally, consider two pairs of data refs (a, b1) and (a, b2). >>>> >>>> Suppose addr_b1 and addr_b2 are basic addresses of data ref b1 and b2; >>>> >>>> offset_b1 and offset_b2 (offset_b1 < offset_b2) are offsets of b1 and >>>> >>>> b2, and segment_length_a, segment_length_b1, and segment_length_b2 are >>>> >>>> segment length of a, b1, and b2. Then we can combine the two >>>> >>>> comparisons into one if the following condition is satisfied: >>>> >>>> >>>> >>>> offset_b2- offset_b1 - segment_length_b1 < segment_length_a >>>> >>>> >>>> >>>> >>>> >>>> This patch detects those combination opportunities to reduce the >>>> >>>> number of alias checks. It is tested on an x86-64 machine. >>>> >>> >>>> >>> Apart from the other comments you got (to which I agree) the patch >>>> >>> seems to do two things, namely also: >>>> >>> >>>> >>> + /* Extract load and store statements on pointers with zero-stride >>>> >>> + accesses. */ >>>> >>> + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) >>>> >>> + { >>>> >>> >>>> >>> which I'd rather see in a separate patch (and done also when >>>> >>> the loop doesn't require versioning for alias). >>>> >>> >>>> >> >>>> >> >>>> >> My mistake.. I am working on those two patches at the same time and >>>> >> pasted that one also here by mistake. I will send another patch about >>>> >> the "hoist" topic. >>>> >> >>>> >> >>>> >>> Also combining the alias checks in vect_create_cond_for_alias_checks >>>> >>> is nice but doesn't properly fix the use of the >>>> >>> vect-max-version-for-alias-checks param which currently inhibits >>>> >>> vectorization of the HIMENO benchmark by default (and make us look bad >>>> >>> compared to LLVM). >>>> >>> >>>> >>> So I believe this merging should be done incrementally when >>>> >>> we collect the DDRs we need to test in vect_mark_for_runtime_alias_test. >>>> >>> >>>> >> >>>> >> >>>> >> I agree that vect-max-version-for-alias-checks param should count the >>>> >> number of checks after the merge. However, the struct >>>> >> data_dependence_relation could not record the new information produced >>>> >> by the merge. The new information I mentioned contains the new segment >>>> >> length for comparisons. This length is calculated right in >>>> >> vect_create_cond_for_alias_checks() function. Since >>>> >> vect-max-version-for-alias-checks is used during analysis phase, shall >>>> >> we move all those (get segment length for each data ref and merge >>>> >> alias checks) from transformation to analysis phase? If we cannot >>>> >> store the result properly (data_dependence_relation is not enough), >>>> >> shall we do it twice in both phases? >>>> >> >>>> >> I also noticed a possible bug in the function vect_same_range_drs() >>>> >> called by vect_prune_runtime_alias_test_list(). For the following code >>>> >> I get two pairs of data refs after >>>> >> vect_prune_runtime_alias_test_list(), but in >>>> >> vect_create_cond_for_alias_checks() after detecting grouped accesses I >>>> >> got two identical pairs of data refs. The consequence is two identical >>>> >> alias checks are produced. >>>> >> >>>> >> >>>> >> void yuv2yuyv_ref (int *d, int *src, int n) >>>> >> { >>>> >> char *dest = (char *)d; >>>> >> int i; >>>> >> >>>> >> for(i=0;i<n/2;i++){ >>>> >> dest[i*4 + 0] = (src[i*2 + 0])>>16; >>>> >> dest[i*4 + 1] = (src[i*2 + 1])>>8; >>>> >> dest[i*4 + 2] = (src[i*2 + 0])>>16; >>>> >> dest[i*4 + 3] = (src[i*2 + 0])>>0; >>>> >> } >>>> >> } >>>> >> >>>> >> >>>> >> I think the solution to this problem is changing >>>> >> >>>> >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) >>>> >> == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j) >>>> >> >>>> >> into >>>> >> >>>> >> STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT >>>> >> (vinfo_for_stmt (stmt_i)))) >>>> >> == STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT >>>> >> (vinfo_for_stmt (stmt_j))) >>>> >> >>>> >> >>>> >> in function vect_same_range_drs(). What do you think about it? >>>> >> >>>> >> >>>> >> thanks, >>>> >> Cong >>>> >> >>>> >> >>>> >> >>>> >>> Thanks for working on this, >>>> >>> Richard. >>>> >>> >>>> >>>> >>>> >>>> thanks, >>>> >>>> Cong >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> Index: gcc/tree-vect-loop-manip.c >>>> >>>> =================================================================== >>>> >>>> --- gcc/tree-vect-loop-manip.c (revision 202662) >>>> >>>> +++ gcc/tree-vect-loop-manip.c (working copy) >>>> >>>> @@ -19,6 +19,10 @@ You should have received a copy of the G >>>> >>>> along with GCC; see the file COPYING3. If not see >>>> >>>> <http://www.gnu.org/licenses/>. */ >>>> >>>> >>>> >>>> +#include <vector> >>>> >>>> +#include <utility> >>>> >>>> +#include <algorithm> >>>> >>>> + >>>> >>>> #include "config.h" >>>> >>>> #include "system.h" >>>> >>>> #include "coretypes.h" >>>> >>>> @@ -2248,6 +2252,74 @@ vect_vfa_segment_size (struct data_refer >>>> >>>> return segment_length; >>>> >>>> } >>>> >>>> >>>> >>>> +namespace >>>> >>>> +{ >>>> >>>> + >>>> >>>> +/* struct dr_addr_with_seg_len >>>> >>>> + >>>> >>>> + A struct storing information of a data reference, including the data >>>> >>>> + ref itself, its basic address, the access offset and the segment length >>>> >>>> + for aliasing checks. */ >>>> >>>> + >>>> >>>> +struct dr_addr_with_seg_len >>>> >>>> +{ >>>> >>>> + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len) >>>> >>>> + : dr (d), basic_addr (addr), offset (off), seg_len (len) {} >>>> >>>> + >>>> >>>> + data_reference* dr; >>>> >>>> + tree basic_addr; >>>> >>>> + tree offset; >>>> >>>> + tree seg_len; >>>> >>>> +}; >>>> >>>> + >>>> >>>> +/* Operator == between two dr_addr_with_seg_len objects. >>>> >>>> + >>>> >>>> + This equality operator is used to make sure two data refs >>>> >>>> + are the same one so that we will consider to combine the >>>> >>>> + aliasing checks of those two pairs of data dependent data >>>> >>>> + refs. */ >>>> >>>> + >>>> >>>> +bool operator == (const dr_addr_with_seg_len& d1, >>>> >>>> + const dr_addr_with_seg_len& d2) >>>> >>>> +{ >>>> >>>> + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0) >>>> >>>> + && operand_equal_p (d1.offset, d2.offset, 0) >>>> >>>> + && operand_equal_p (d1.seg_len, d2.seg_len, 0); >>>> >>>> +} >>>> >>>> + >>>> >>>> +typedef std::pair <dr_addr_with_seg_len, dr_addr_with_seg_len> >>>> >>>> + dr_addr_with_seg_len_pair_t; >>>> >>>> + >>>> >>>> + >>>> >>>> +/* Operator < between two dr_addr_with_seg_len_pair_t objects. >>>> >>>> + >>>> >>>> + This operator is used to sort objects of dr_addr_with_seg_len_pair_t >>>> >>>> + so that we can combine aliasing checks during one scan. */ >>>> >>>> + >>>> >>>> +bool operator < (const dr_addr_with_seg_len_pair_t& p1, >>>> >>>> + const dr_addr_with_seg_len_pair_t& p2) >>>> >>>> +{ >>>> >>>> + const dr_addr_with_seg_len& p11 = p1.first; >>>> >>>> + const dr_addr_with_seg_len& p12 = p1.second; >>>> >>>> + const dr_addr_with_seg_len& p21 = p2.first; >>>> >>>> + const dr_addr_with_seg_len& p22 = p2.second; >>>> >>>> + >>>> >>>> + if (p11.basic_addr != p21.basic_addr) >>>> >>>> + return p11.basic_addr < p21.basic_addr; >>>> >>>> + if (p12.basic_addr != p22.basic_addr) >>>> >>>> + return p12.basic_addr < p22.basic_addr; >>>> >>>> + if (TREE_CODE (p11.offset) != INTEGER_CST >>>> >>>> + || TREE_CODE (p21.offset) != INTEGER_CST) >>>> >>>> + return p11.offset < p21.offset; >>>> >>>> + if (int_cst_value (p11.offset) != int_cst_value (p21.offset)) >>>> >>>> + return int_cst_value (p11.offset) < int_cst_value (p21.offset); >>>> >>>> + if (TREE_CODE (p12.offset) != INTEGER_CST >>>> >>>> + || TREE_CODE (p22.offset) != INTEGER_CST) >>>> >>>> + return p12.offset < p22.offset; >>>> >>>> + return int_cst_value (p12.offset) < int_cst_value (p22.offset); >>>> >>>> +} >>>> >>>> + >>>> >>>> +} >>>> >>>> >>>> >>>> /* Function vect_create_cond_for_alias_checks. >>>> >>>> >>>> >>>> @@ -2292,20 +2364,51 @@ vect_create_cond_for_alias_checks (loop_ >>>> >>>> if (may_alias_ddrs.is_empty ()) >>>> >>>> return; >>>> >>>> >>>> >>>> + >>>> >>>> + /* Basically, for each pair of dependent data refs store_ptr_0 >>>> >>>> + and load_ptr_0, we create an expression: >>>> >>>> + >>>> >>>> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) >>>> >>>> + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) >>>> >>>> + >>>> >>>> + for aliasing checks. However, in some cases we can decrease >>>> >>>> + the number of checks by combining two checks into one. For >>>> >>>> + example, suppose we have another pair of data refs store_ptr_0 >>>> >>>> + and load_ptr_1, and if the following condition is satisfied: >>>> >>>> + >>>> >>>> + load_ptr_0 < load_ptr_1 && >>>> >>>> + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0 >>>> >>>> + >>>> >>>> + (this condition means, in each iteration of vectorized loop, >>>> >>>> + the accessed memory of store_ptr_0 cannot be between the memory >>>> >>>> + of load_ptr_0 and load_ptr_1.) >>>> >>>> + >>>> >>>> + we then can use only the following expression to finish the >>>> >>>> + alising checks between store_ptr_0 & load_ptr_0 and >>>> >>>> + store_ptr_0 & load_ptr_1: >>>> >>>> + >>>> >>>> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) >>>> >>>> + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0)) >>>> >>>> + >>>> >>>> + Note that we only consider that load_ptr_0 and load_ptr_1 have the >>>> >>>> + same basic address. */ >>>> >>>> + >>>> >>>> + std::vector<dr_addr_with_seg_len_pair_t> ddrs_with_seg_len; >>>> >>>> + >>>> >>>> + /* First, we collect all data ref pairs for aliasing checks. */ >>>> >>>> + >>>> >>>> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) >>>> >>>> { >>>> >>>> struct data_reference *dr_a, *dr_b; >>>> >>>> gimple dr_group_first_a, dr_group_first_b; >>>> >>>> - tree addr_base_a, addr_base_b; >>>> >>>> tree segment_length_a, segment_length_b; >>>> >>>> gimple stmt_a, stmt_b; >>>> >>>> - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max; >>>> >>>> >>>> >>>> dr_a = DDR_A (ddr); >>>> >>>> stmt_a = DR_STMT (DDR_A (ddr)); >>>> >>>> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); >>>> >>>> if (dr_group_first_a) >>>> >>>> - { >>>> >>>> + { >>>> >>>> stmt_a = dr_group_first_a; >>>> >>>> dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); >>>> >>>> } >>>> >>>> @@ -2314,20 +2417,11 @@ vect_create_cond_for_alias_checks (loop_ >>>> >>>> stmt_b = DR_STMT (DDR_B (ddr)); >>>> >>>> dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); >>>> >>>> if (dr_group_first_b) >>>> >>>> - { >>>> >>>> + { >>>> >>>> stmt_b = dr_group_first_b; >>>> >>>> dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); >>>> >>>> } >>>> >>>> >>>> >>>> - addr_base_a >>>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a), >>>> >>>> - size_binop (PLUS_EXPR, DR_OFFSET (dr_a), >>>> >>>> - DR_INIT (dr_a))); >>>> >>>> - addr_base_b >>>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b), >>>> >>>> - size_binop (PLUS_EXPR, DR_OFFSET (dr_b), >>>> >>>> - DR_INIT (dr_b))); >>>> >>>> - >>>> >>>> if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) >>>> >>>> length_factor = scalar_loop_iters; >>>> >>>> else >>>> >>>> @@ -2335,24 +2429,149 @@ vect_create_cond_for_alias_checks (loop_ >>>> >>>> segment_length_a = vect_vfa_segment_size (dr_a, length_factor); >>>> >>>> segment_length_b = vect_vfa_segment_size (dr_b, length_factor); >>>> >>>> >>>> >>>> + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair >>>> >>>> + (dr_addr_with_seg_len >>>> >>>> + (dr_a, DR_BASE_ADDRESS (dr_a), >>>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)), >>>> >>>> + segment_length_a), >>>> >>>> + dr_addr_with_seg_len >>>> >>>> + (dr_b, DR_BASE_ADDRESS (dr_b), >>>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)), >>>> >>>> + segment_length_b)); >>>> >>>> + >>>> >>>> + if (dr_with_seg_len_pair.first.basic_addr > >>>> >>>> + dr_with_seg_len_pair.second.basic_addr) >>>> >>>> + std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); >>>> >>>> + >>>> >>>> + ddrs_with_seg_len.push_back (dr_with_seg_len_pair); >>>> >>>> + } >>>> >>>> + >>>> >>>> + /* Second, we sort the collected data ref pairs so that we can scan >>>> >>>> + them once to combine all possible aliasing checks. */ >>>> >>>> + >>>> >>>> + std::sort (ddrs_with_seg_len.begin(), ddrs_with_seg_len.end()); >>>> >>>> + >>>> >>>> + /* Remove duplicate data ref pairs. */ >>>> >>>> + ddrs_with_seg_len.erase (std::unique (ddrs_with_seg_len.begin(), >>>> >>>> + ddrs_with_seg_len.end()), >>>> >>>> + ddrs_with_seg_len.end()); >>>> >>>> + >>>> >>>> + /* We then scan the sorted dr pairs and check if we can combine >>>> >>>> + alias checks of two neighbouring dr pairs. */ >>>> >>>> + >>>> >>>> + for (size_t i = 1; i < ddrs_with_seg_len.size (); ++i) >>>> >>>> + { >>>> >>>> + dr_addr_with_seg_len& dr_a1 = ddrs_with_seg_len[i-1].first; >>>> >>>> + dr_addr_with_seg_len& dr_b1 = ddrs_with_seg_len[i-1].second; >>>> >>>> + dr_addr_with_seg_len& dr_a2 = ddrs_with_seg_len[i].first; >>>> >>>> + dr_addr_with_seg_len& dr_b2 = ddrs_with_seg_len[i].second; >>>> >>>> + >>>> >>>> + if (dr_a1 == dr_a2) >>>> >>>> + { >>>> >>>> + if (dr_b1.basic_addr != dr_b2.basic_addr >>>> >>>> + || TREE_CODE (dr_b1.offset) != INTEGER_CST >>>> >>>> + || TREE_CODE (dr_b2.offset) != INTEGER_CST) >>>> >>>> + continue; >>>> >>>> + >>>> >>>> + int diff = int_cst_value (dr_b2.offset) - >>>> >>>> + int_cst_value (dr_b1.offset); >>>> >>>> + >>>> >>>> + gcc_assert (diff > 0); >>>> >>>> + >>>> >>>> + if (diff <= vect_factor >>>> >>>> + || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST >>>> >>>> + && diff - int_cst_value (dr_b1.seg_len) < vect_factor) >>>> >>>> + || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST >>>> >>>> + && TREE_CODE (dr_a1.seg_len) == INTEGER_CST >>>> >>>> + && diff - int_cst_value (dr_b1.seg_len) < >>>> >>>> + int_cst_value (dr_a1.seg_len))) >>>> >>>> + { >>>> >>>> + if (dump_enabled_p ()) >>>> >>>> + { >>>> >>>> + dump_printf_loc >>>> >>>> + (MSG_NOTE, vect_location, >>>> >>>> + "combining two runtime checks for data references "); >>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1.dr)); >>>> >>>> + dump_printf (MSG_NOTE, " and "); >>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2.dr)); >>>> >>>> + dump_printf (MSG_NOTE, "\n"); >>>> >>>> + } >>>> >>>> + >>>> >>>> + dr_b1.seg_len = size_binop (PLUS_EXPR, >>>> >>>> + dr_b2.seg_len, size_int (diff)); >>>> >>>> + ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i); >>>> >>>> + --i; >>>> >>>> + } >>>> >>>> + } >>>> >>>> + else if (dr_b1 == dr_b2) >>>> >>>> + { >>>> >>>> + if (dr_a1.basic_addr != dr_a2.basic_addr >>>> >>>> + || TREE_CODE (dr_a1.offset) != INTEGER_CST >>>> >>>> + || TREE_CODE (dr_a2.offset) != INTEGER_CST) >>>> >>>> + continue; >>>> >>>> + >>>> >>>> + int diff = int_cst_value (dr_a2.offset) - >>>> >>>> + int_cst_value (dr_a1.offset); >>>> >>>> + >>>> >>>> + gcc_assert (diff > 0); >>>> >>>> + >>>> >>>> + if (diff <= vect_factor >>>> >>>> + || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST >>>> >>>> + && diff - int_cst_value (dr_a1.seg_len) < vect_factor) >>>> >>>> + || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST >>>> >>>> + && TREE_CODE (dr_b1.seg_len) == INTEGER_CST >>>> >>>> + && diff - int_cst_value (dr_a1.seg_len) < >>>> >>>> + int_cst_value (dr_b1.seg_len))) >>>> >>>> + { >>>> >>>> + if (dump_enabled_p ()) >>>> >>>> + { >>>> >>>> + dump_printf_loc >>>> >>>> + (MSG_NOTE, vect_location, >>>> >>>> + "combining two runtime checks for data references "); >>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1.dr)); >>>> >>>> + dump_printf (MSG_NOTE, " and "); >>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2.dr)); >>>> >>>> + dump_printf (MSG_NOTE, "\n"); >>>> >>>> + } >>>> >>>> + >>>> >>>> + dr_a1.seg_len = size_binop (PLUS_EXPR, >>>> >>>> + dr_a2.seg_len, size_int (diff)); >>>> >>>> + ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i); >>>> >>>> + --i; >>>> >>>> + } >>>> >>>> + } >>>> >>>> + } >>>> >>>> + >>>> >>>> + for (size_t i = 0, s = ddrs_with_seg_len.size (); i < s; ++i) >>>> >>>> + { >>>> >>>> + const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first; >>>> >>>> + const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second; >>>> >>>> + tree segment_length_a = dr_a.seg_len; >>>> >>>> + tree segment_length_b = dr_b.seg_len; >>>> >>>> + >>>> >>>> + tree addr_base_a >>>> >>>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset); >>>> >>>> + tree addr_base_b >>>> >>>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset); >>>> >>>> + >>>> >>>> if (dump_enabled_p ()) >>>> >>>> { >>>> >>>> dump_printf_loc (MSG_NOTE, vect_location, >>>> >>>> - "create runtime check for data references "); >>>> >>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a)); >>>> >>>> + "create runtime check for data references "); >>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr)); >>>> >>>> dump_printf (MSG_NOTE, " and "); >>>> >>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b)); >>>> >>>> - dump_printf (MSG_NOTE, "\n"); >>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr)); >>>> >>>> + dump_printf (MSG_NOTE, "\n"); >>>> >>>> } >>>> >>>> >>>> >>>> - seg_a_min = addr_base_a; >>>> >>>> - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); >>>> >>>> - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0) >>>> >>>> + tree seg_a_min = addr_base_a; >>>> >>>> + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); >>>> >>>> + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0) >>>> >>>> seg_a_min = seg_a_max, seg_a_max = addr_base_a; >>>> >>>> >>>> >>>> - seg_b_min = addr_base_b; >>>> >>>> - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); >>>> >>>> - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0) >>>> >>>> + tree seg_b_min = addr_base_b; >>>> >>>> + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); >>>> >>>> + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0) >>>> >>>> seg_b_min = seg_b_max, seg_b_max = addr_base_b; >>>> >>>> >>>> >>>> part_cond_expr = >>>> >>>> @@ -2477,6 +2696,81 @@ vect_loop_versioning (loop_vec_info loop >>>> >>>> adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); >>>> >>>> } >>>> >>>> >>>> >>>> + /* Extract load and store statements on pointers with zero-stride >>>> >>>> + accesses. */ >>>> >>>> + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) >>>> >>>> + { >>>> >>>> + >>>> >>>> + /* In the loop body, we iterate each statement to check if it is a load >>>> >>>> + or store. Then we check the DR_STEP of the data reference. If >>>> >>>> + DR_STEP is zero, then we will hoist the load statement to the loop >>>> >>>> + preheader, and move the store statement to the loop exit. */ >>>> >>>> + >>>> >>>> + for (gimple_stmt_iterator si = gsi_start_bb (loop->header); >>>> >>>> + !gsi_end_p (si); ) >>>> >>>> + { >>>> >>>> + gimple stmt = gsi_stmt (si); >>>> >>>> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>>> >>>> + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); >>>> >>>> + >>>> >>>> + >>>> >>>> + if (dr && integer_zerop (DR_STEP (dr))) >>>> >>>> + { >>>> >>>> + if (DR_IS_READ (dr)) >>>> >>>> + { >>>> >>>> + if (dump_file) >>>> >>>> + { >>>> >>>> + fprintf (dump_file, >>>> >>>> + "Hoist the load to outside of the loop:\n"); >>>> >>>> + print_gimple_stmt (dump_file, stmt, 0, >>>> >>>> + TDF_VOPS|TDF_MEMSYMS); >>>> >>>> + } >>>> >>>> + >>>> >>>> + basic_block preheader = loop_preheader_edge (loop)->src; >>>> >>>> + gimple_stmt_iterator si_dst = gsi_last_bb (preheader); >>>> >>>> + gsi_move_after (&si, &si_dst); >>>> >>>> + } >>>> >>>> + else >>>> >>>> + { >>>> >>>> + gimple_stmt_iterator si_dst = >>>> >>>> + gsi_last_bb (single_exit (loop)->dest); >>>> >>>> + gsi_move_after (&si, &si_dst); >>>> >>>> + } >>>> >>>> + continue; >>>> >>>> + } >>>> >>>> + else if (!dr) >>>> >>>> + { >>>> >>>> + bool hoist = true; >>>> >>>> + for (size_t i = 0; i < gimple_num_ops (stmt); i++) >>>> >>>> + { >>>> >>>> + tree op = gimple_op (stmt, i); >>>> >>>> + if (TREE_CODE (op) == INTEGER_CST >>>> >>>> + || TREE_CODE (op) == REAL_CST) >>>> >>>> + continue; >>>> >>>> + if (TREE_CODE (op) == SSA_NAME) >>>> >>>> + { >>>> >>>> + gimple def = SSA_NAME_DEF_STMT (op); >>>> >>>> + if (def == stmt >>>> >>>> + || gimple_nop_p (def) >>>> >>>> + || !flow_bb_inside_loop_p (loop, gimple_bb (def))) >>>> >>>> + continue; >>>> >>>> + } >>>> >>>> + hoist = false; >>>> >>>> + break; >>>> >>>> + } >>>> >>>> + >>>> >>>> + if (hoist) >>>> >>>> + { >>>> >>>> + basic_block preheader = loop_preheader_edge (loop)->src; >>>> >>>> + gimple_stmt_iterator si_dst = gsi_last_bb (preheader); >>>> >>>> + gsi_move_after (&si, &si_dst); >>>> >>>> + continue; >>>> >>>> + } >>>> >>>> + } >>>> >>>> + gsi_next (&si); >>>> >>>> + } >>>> >>>> + } >>>> >>>> + >>>> >>>> /* End loop-exit-fixes after versioning. */ >>>> >>>> >>>> >>>> if (cond_expr_stmt_list) >>>> >>>> Index: gcc/ChangeLog >>>> >>>> =================================================================== >>>> >>>> --- gcc/ChangeLog (revision 202663) >>>> >>>> +++ gcc/ChangeLog (working copy) >>>> >>>> @@ -1,3 +1,8 @@ >>>> >>>> +2013-10-01 Cong Hou <congh@google.com> >>>> >>>> + >>>> >>>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Combine >>>> >>>> + alias checks if it is possible to amortize the runtime overhead. >>>> >>>> + >>>> >>>> >>>> >>>> >>>> >>> >>>> >>> -- >>>> >>> Richard Biener <rguenther@suse.de> >>>> >>> SUSE / SUSE Labs >>>> >>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 >>>> >>> GF: Jeff Hawn, Jennifer Guild, Felix Imend
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..7333f6a 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2013-10-14 Cong Hou <congh@google.com> + + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): + Combine alias checks if it is possible to amortize the runtime + overhead. Return the number of alias checks after merging. + * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): + Use the function vect_create_cond_for_alias_checks () to check + the number of alias checks. + 2013-10-14 David Malcolm <dmalcolm@redhat.com> * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..ea2782f 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-14 Cong Hou <congh@google.com> + + * gcc.dg/vect/vect-alias-check.c: New. + 2013-10-14 Tobias Burnus <burnus@net-b.de> PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c new file mode 100644 index 0000000..bf9eb6b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize + --param=vect-max-version-for-alias-checks=2 + -fdump-tree-vect-details" } */ + +/* A test case showing three potential alias checks between + a[i] and b[i], b[i+7], b[i+14]. With alias checks merging + enabled, those tree checks can be merged into one, and the + loop will be vectorized with vect-max-version-for-alias-checks=2. */ + +void foo (int *a, int *b) +{ + int i; + for (i = 0; i < 1000; ++i) + a[i] = b[i] + b[i+7] + b[i+14]; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index e7b2f33..5f70d58 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -128,41 +128,6 @@ vect_get_smallest_scalar_type (gimple stmt, HOST_WIDE_INT *lhs_size_unit, } -/* Check if data references pointed by DR_I and DR_J are same or - belong to same interleaving group. Return FALSE if drs are - different, otherwise return TRUE. */ - -static bool -vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j) -{ - gimple stmt_i = DR_STMT (dr_i); - gimple stmt_j = DR_STMT (dr_j); - - if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0) - || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) - && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)) - && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) - == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))))) - return true; - else - return false; -} - -/* If address ranges represented by DDR_I and DDR_J are equal, - return TRUE, otherwise return FALSE. */ - -static bool -vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j) -{ - if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j)) - && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j))) - || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j)) - && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j)))) - return true; - else - return false; -} - /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be tested at run-time. Return TRUE if DDR was successfully inserted. Return false if versioning is not supported. */ @@ -2647,69 +2612,331 @@ vect_analyze_data_ref_accesses (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo) return true; } + +/* Operator == between two dr_addr_with_seg_len objects. + + This equality operator is used to make sure two data refs + are the same one so that we will consider to combine the + aliasing checks of those two pairs of data dependent data + refs. */ + +static bool +operator == (const dr_addr_with_seg_len& d1, + const dr_addr_with_seg_len& d2) +{ + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0) + && compare_tree (d1.offset, d2.offset) == 0 + && compare_tree (d1.seg_len, d2.seg_len) == 0; +} + +/* Function comp_dr_addr_with_seg_len_pair. + + Comparison function for sorting objects of dr_addr_with_seg_len_pair_t + so that we can combine aliasing checks in one scan. */ + +static int +comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_) +{ + const dr_addr_with_seg_len_pair_t* p1 = + (const dr_addr_with_seg_len_pair_t *) p1_; + const dr_addr_with_seg_len_pair_t* p2 = + (const dr_addr_with_seg_len_pair_t *) p2_; + + const dr_addr_with_seg_len &p11 = p1->first, + &p12 = p1->second, + &p21 = p2->first, + &p22 = p2->second; + + int comp_res = compare_tree (p11.basic_addr, p21.basic_addr); + if (comp_res != 0) + return comp_res; + + comp_res = compare_tree (p12.basic_addr, p22.basic_addr); + if (comp_res != 0) + return comp_res; + + if (TREE_CODE (p11.offset) != INTEGER_CST + || TREE_CODE (p21.offset) != INTEGER_CST) + { + comp_res = compare_tree (p11.offset, p21.offset); + if (comp_res != 0) + return comp_res; + } + if (tree_int_cst_compare (p11.offset, p21.offset) < 0) + return -1; + if (tree_int_cst_compare (p11.offset, p21.offset) > 0) + return 1; + if (TREE_CODE (p12.offset) != INTEGER_CST + || TREE_CODE (p22.offset) != INTEGER_CST) + { + comp_res = compare_tree (p12.offset, p22.offset); + if (comp_res != 0) + return comp_res; + } + if (tree_int_cst_compare (p12.offset, p22.offset) < 0) + return -1; + if (tree_int_cst_compare (p12.offset, p22.offset) > 0) + return 1; + + return 0; +} + +template <class T> static void +swap (T& a, T& b) +{ + T c (a); + a = b; + b = c; +} + +/* Function vect_vfa_segment_size. + + Create an expression that computes the size of segment + that will be accessed for a data reference. The functions takes into + account that realignment loads may access one more vector. + + Input: + DR: The data reference. + LENGTH_FACTOR: segment length to consider. + + Return an expression whose value is the size of segment which will be + accessed by DR. */ + +static tree +vect_vfa_segment_size (struct data_reference *dr, tree length_factor) +{ + tree segment_length; + + if (integer_zerop (DR_STEP (dr))) + segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); + else + segment_length = size_binop (MULT_EXPR, + fold_convert (sizetype, DR_STEP (dr)), + fold_convert (sizetype, length_factor)); + + if (vect_supportable_dr_alignment (dr, false) + == dr_explicit_realign_optimized) + { + tree vector_size = TYPE_SIZE_UNIT + (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)))); + + segment_length = size_binop (PLUS_EXPR, segment_length, vector_size); + } + return segment_length; +} + /* Function vect_prune_runtime_alias_test_list. Prune a list of ddrs to be tested at run-time by versioning for alias. + Merge several alias checks into one if possible. Return FALSE if resulting list of ddrs is longer then allowed by PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, otherwise return TRUE. */ bool vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo) { - vec<ddr_p> ddrs = + vec<ddr_p> may_alias_ddrs = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo); - unsigned i, j; + vec<dr_addr_with_seg_len_pair_t>& comp_alias_ddrs = + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo); + int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); + tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); + + ddr_p ddr; + unsigned int i; + tree length_factor; if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "=== vect_prune_runtime_alias_test_list ===\n"); - for (i = 0; i < ddrs.length (); ) + if (may_alias_ddrs.is_empty ()) + return true; + + /* Basically, for each pair of dependent data refs store_ptr_0 + and load_ptr_0, we create an expression: + + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) + + for aliasing checks. However, in some cases we can decrease + the number of checks by combining two checks into one. For + example, suppose we have another pair of data refs store_ptr_0 + and load_ptr_1, and if the following condition is satisfied: + + load_ptr_0 < load_ptr_1 && + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0 + + (this condition means, in each iteration of vectorized loop, + the accessed memory of store_ptr_0 cannot be between the memory + of load_ptr_0 and load_ptr_1.) + + we then can use only the following expression to finish the + alising checks between store_ptr_0 & load_ptr_0 and + store_ptr_0 & load_ptr_1: + + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0)) + + Note that we only consider that load_ptr_0 and load_ptr_1 have the + same basic address. */ + + comp_alias_ddrs.create (may_alias_ddrs.length ()); + + /* First, we collect all data ref pairs for aliasing checks. */ + FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { - bool found; - ddr_p ddr_i; + struct data_reference *dr_a, *dr_b; + gimple dr_group_first_a, dr_group_first_b; + tree segment_length_a, segment_length_b; + gimple stmt_a, stmt_b; + + dr_a = DDR_A (ddr); + stmt_a = DR_STMT (DDR_A (ddr)); + dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); + if (dr_group_first_a) + { + stmt_a = dr_group_first_a; + dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); + } - ddr_i = ddrs[i]; - found = false; + dr_b = DDR_B (ddr); + stmt_b = DR_STMT (DDR_B (ddr)); + dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); + if (dr_group_first_b) + { + stmt_b = dr_group_first_b; + dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); + } - for (j = 0; j < i; j++) - { - ddr_p ddr_j = ddrs[j]; + if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) + length_factor = scalar_loop_iters; + else + length_factor = size_int (vect_factor); + segment_length_a = vect_vfa_segment_size (dr_a, length_factor); + segment_length_b = vect_vfa_segment_size (dr_b, length_factor); + + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair + (dr_addr_with_seg_len + (dr_a, DR_BASE_ADDRESS (dr_a), + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)), + segment_length_a), + dr_addr_with_seg_len + (dr_b, DR_BASE_ADDRESS (dr_b), + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)), + segment_length_b)); + + if (compare_tree (dr_with_seg_len_pair.first.basic_addr, + dr_with_seg_len_pair.second.basic_addr) > 0) + swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); + + comp_alias_ddrs.safe_push (dr_with_seg_len_pair); + } - if (vect_vfa_range_equal (ddr_i, ddr_j)) + /* Second, we sort the collected data ref pairs so that we can scan + them once to combine all possible aliasing checks. */ + comp_alias_ddrs.qsort (comp_dr_addr_with_seg_len_pair); + + /* Third, we scan the sorted dr pairs and check if we can combine + alias checks of two neighbouring dr pairs. */ + for (size_t i = 1; i < comp_alias_ddrs.length (); ++i) + { + /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2). */ + dr_addr_with_seg_len *dr_a1 = &comp_alias_ddrs[i-1].first, + *dr_b1 = &comp_alias_ddrs[i-1].second, + *dr_a2 = &comp_alias_ddrs[i].first, + *dr_b2 = &comp_alias_ddrs[i].second; + + /* Remove duplicate data ref pairs. */ + if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2) + { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "found equal ranges "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_a1->dr)); + dump_printf (MSG_NOTE, ", "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_b1->dr)); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_a2->dr)); + dump_printf (MSG_NOTE, ", "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_b2->dr)); + dump_printf (MSG_NOTE, "\n"); + } + + comp_alias_ddrs.ordered_remove (i--); + continue; + } + + if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2) + { + /* We consider the case that DR_B1 and DR_B2 are same memrefs, + and DR_A1 and DR_A2 are two consecutive memrefs. */ + if (*dr_a1 == *dr_a2) + { + swap (dr_a1, dr_b1); + swap (dr_a2, dr_b2); + } + + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0) + || !host_integerp (dr_a1->offset, 0) + || !host_integerp (dr_a2->offset, 0)) + continue; + + HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) - + widest_int_cst_value (dr_a1->offset); + + + /* Now we check if the following condition is satisfied: + + DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B + + where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However, + SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we + have to make a best estimation. We can get the minimum value + of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B, + then either of the following two conditions can guarantee the + one above: + + 1: DIFF <= MIN_SEG_LEN_B + 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B + + */ + + HOST_WIDEST_INT + min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ? + widest_int_cst_value (dr_b1->seg_len) : + vect_factor; + + if (diff <= min_seg_len_b + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST + && diff - widest_int_cst_value (dr_a1->seg_len) < + min_seg_len_b)) { if (dump_enabled_p ()) { - dump_printf_loc (MSG_NOTE, vect_location, - "found equal ranges "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_A (ddr_i))); - dump_printf (MSG_NOTE, ", "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_B (ddr_i))); - dump_printf (MSG_NOTE, " and "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_A (ddr_j))); - dump_printf (MSG_NOTE, ", "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_B (ddr_j))); + dump_printf_loc + (MSG_NOTE, vect_location, + "merging two runtime checks for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr)); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr)); dump_printf (MSG_NOTE, "\n"); } - found = true; - break; + + dr_a1->seg_len = size_binop (PLUS_EXPR, + dr_a2->seg_len, size_int (diff)); + comp_alias_ddrs.ordered_remove (i--); } } - - if (found) - { - ddrs.ordered_remove (i); - continue; - } - i++; } - if (ddrs.length () > - (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) + if ((int) comp_alias_ddrs.length () > + PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) { if (dump_enabled_p ()) { @@ -2718,8 +2945,6 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo) "generated checks exceeded.\n"); } - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); - return false; } diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..1e03853 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2211,44 +2211,6 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo, *cond_expr = part_cond_expr; } - -/* Function vect_vfa_segment_size. - - Create an expression that computes the size of segment - that will be accessed for a data reference. The functions takes into - account that realignment loads may access one more vector. - - Input: - DR: The data reference. - LENGTH_FACTOR: segment length to consider. - - Return an expression whose value is the size of segment which will be - accessed by DR. */ - -static tree -vect_vfa_segment_size (struct data_reference *dr, tree length_factor) -{ - tree segment_length; - - if (integer_zerop (DR_STEP (dr))) - segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); - else - segment_length = size_binop (MULT_EXPR, - fold_convert (sizetype, DR_STEP (dr)), - fold_convert (sizetype, length_factor)); - - if (vect_supportable_dr_alignment (dr, false) - == dr_explicit_realign_optimized) - { - tree vector_size = TYPE_SIZE_UNIT - (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)))); - - segment_length = size_binop (PLUS_EXPR, segment_length, vector_size); - } - return segment_length; -} - - /* Function vect_create_cond_for_alias_checks. Create a conditional expression that represents the run-time checks for @@ -2257,28 +2219,24 @@ vect_vfa_segment_size (struct data_reference *dr, tree length_factor) Input: COND_EXPR - input conditional expression. New conditions will be chained - with logical AND operation. + with logical AND operation. If it is NULL, then the function + is used to return the number of alias checks. LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs to be checked. Output: COND_EXPR - conditional expression. - The returned value is the conditional expression to be used in the if + The returned COND_EXPR is the conditional expression to be used in the if statement that controls which version of the loop gets executed at runtime. */ -static void +void vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) { - vec<ddr_p> may_alias_ddrs = - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo); - int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); - tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); - - ddr_p ddr; - unsigned int i; - tree part_cond_expr, length_factor; + vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs = + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo); + tree part_cond_expr; /* Create expression ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) @@ -2289,70 +2247,39 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) ((store_ptr_n + store_segment_length_n) <= load_ptr_n) || (load_ptr_n + load_segment_length_n) <= store_ptr_n)) */ - if (may_alias_ddrs.is_empty ()) + if (comp_alias_ddrs.is_empty ()) return; - FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) + for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i) { - struct data_reference *dr_a, *dr_b; - gimple dr_group_first_a, dr_group_first_b; - tree addr_base_a, addr_base_b; - tree segment_length_a, segment_length_b; - gimple stmt_a, stmt_b; - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max; - - dr_a = DDR_A (ddr); - stmt_a = DR_STMT (DDR_A (ddr)); - dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); - if (dr_group_first_a) - { - stmt_a = dr_group_first_a; - dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); - } - - dr_b = DDR_B (ddr); - stmt_b = DR_STMT (DDR_B (ddr)); - dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); - if (dr_group_first_b) - { - stmt_b = dr_group_first_b; - dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); - } + const dr_addr_with_seg_len& dr_a = comp_alias_ddrs[i].first; + const dr_addr_with_seg_len& dr_b = comp_alias_ddrs[i].second; + tree segment_length_a = dr_a.seg_len; + tree segment_length_b = dr_b.seg_len; - addr_base_a - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a), - size_binop (PLUS_EXPR, DR_OFFSET (dr_a), - DR_INIT (dr_a))); - addr_base_b - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b), - size_binop (PLUS_EXPR, DR_OFFSET (dr_b), - DR_INIT (dr_b))); - - if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) - length_factor = scalar_loop_iters; - else - length_factor = size_int (vect_factor); - segment_length_a = vect_vfa_segment_size (dr_a, length_factor); - segment_length_b = vect_vfa_segment_size (dr_b, length_factor); + tree addr_base_a + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset); + tree addr_base_b + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset); if (dump_enabled_p ()) { dump_printf_loc (MSG_NOTE, vect_location, - "create runtime check for data references "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a)); + "create runtime check for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr)); dump_printf (MSG_NOTE, " and "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b)); - dump_printf (MSG_NOTE, "\n"); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr)); + dump_printf (MSG_NOTE, "\n"); } - seg_a_min = addr_base_a; - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0) + tree seg_a_min = addr_base_a; + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0) seg_a_min = seg_a_max, seg_a_max = addr_base_a; - seg_b_min = addr_base_b; - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0) + tree seg_b_min = addr_base_b; + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0) seg_b_min = seg_b_max, seg_b_max = addr_base_b; part_cond_expr = @@ -2370,7 +2297,9 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "created %u versioning for alias checks.\n", - may_alias_ddrs.length ()); + comp_alias_ddrs.length ()); + + comp_alias_ddrs.release (); } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 8b7b345..f231b62 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -175,6 +175,36 @@ typedef struct _slp_oprnd_info +/* This struct is used to store the information of a data reference, + including the data ref itself, its basic address, the access offset + and the segment length for aliasing checks. This is used to generate + alias checks. */ + +struct dr_addr_with_seg_len +{ + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len) + : dr (d), basic_addr (addr), offset (off), seg_len (len) {} + + data_reference *dr; + tree basic_addr; + tree offset; + tree seg_len; +}; + +/* This struct contains two dr_addr_with_seg_len objects with aliasing data + refs. Two comparisons are generated from them. */ + +struct dr_addr_with_seg_len_pair_t +{ + dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1, + const dr_addr_with_seg_len& d2) + : first (d1), second (d2) {} + + dr_addr_with_seg_len first; + dr_addr_with_seg_len second; +}; + + typedef struct _vect_peel_info { int npeel; @@ -274,6 +304,10 @@ typedef struct _loop_vec_info { for a run-time aliasing check. */ vec<ddr_p> may_alias_ddrs; + /* Data Dependence Relations defining address ranges together with segment + lengths from which the run-time aliasing check is built. */ + vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs; + /* Statements in the loop that have data references that are candidates for a runtime (loop versioning) misalignment check. */ vec<gimple> may_misalign_stmts; @@ -336,6 +370,7 @@ typedef struct _loop_vec_info { #define LOOP_VINFO_MAY_MISALIGN_STMTS(L) (L)->may_misalign_stmts #define LOOP_VINFO_LOC(L) (L)->loop_line_number #define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs +#define LOOP_VINFO_COMP_ALIAS_DDRS(L) (L)->comp_alias_ddrs #define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores #define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor