diff mbox

Reducing number of alias checks in vectorization.

Message ID CAK=A3=1Tui9OfnsZNBOr7qa8SB_1rn0nHDXdg1K84r4Dk5ZCiw@mail.gmail.com
State New
Headers show

Commit Message

Cong Hou Oct. 14, 2013, 10:43 p.m. UTC
Sorry for forgetting using plain-text mode. Resend it.


---------- Forwarded message ----------
From: Cong Hou <congh@google.com>
Date: Mon, Oct 14, 2013 at 3:29 PM
Subject: Re: [PATCH] Reducing number of alias checks in vectorization.
To: Richard Biener <rguenther@suse.de>, GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Jakub Jelinek <jakub@redhat.com>


I have made a new patch for this issue according to your comments.

There are several modifications to my previous patch:


1. Remove the use of STL features such as vector and sort. Use GCC's
vec and qsort instead.

2. Comparisons between tree nodes are not based on their addresses any
more. Use compare_tree() function instead.

3. The function vect_create_cond_for_alias_checks() now returns the
number of alias checks. If its second parameter cond_expr is NULL,
then this function only calculate the number of alias checks after the
merging and won't generate comparison expressions.

4. The function vect_prune_runtime_alias_test_list() now uses
vect_create_cond_for_alias_checks() to get the number of alias checks.


The patch is attached as a text file.

Please give me your comment on this patch. Thank you!


Cong


On Thu, Oct 3, 2013 at 2:35 PM, Cong Hou <congh@google.com> wrote:
>
> Forget about this "aux" idea as the segment length for one data ref
> can be different in different dr pairs.
>
> In my patch I created a struct as shown below:
>
> struct dr_addr_with_seg_len
> {
>   data_reference *dr;
>   tree basic_addr;
>   tree offset;
>   tree seg_len;
> };
>
>
> Note that basic_addr and offset can always obtained from dr, but we
> need to store two segment lengths for each dr pair. It is improper to
> add a field to data_dependence_relation as it is defined outside of
> vectorizer. We can change the type (a new one combining
> data_dependence_relation and segment length) of may_alias_ddrs in
> loop_vec_info to include such information, but we have to add a new
> type to tree-vectorizer.h which is only used in two places - still too
> much.
>
> One possible solution is that we create a local struct as shown above
> and a new function which returns the merged alias check information.
> This function will be called twice: once during analysis phase and
> once in transformation phase. Then we don't have to store the merged
> alias check information during those two phases. The additional time
> cost is minimal as there will not be too many data dependent dr pairs
> in a loop.
>
> Any comment?
>
>
> thanks,
> Cong
>
>
> On Thu, Oct 3, 2013 at 10:57 AM, Cong Hou <congh@google.com> wrote:
> > I noticed that there is a "struct dataref_aux" defined in
> > tree-vectorizer.h which is specific to the vectorizer pass and is
> > stored in (void*)aux in "struct data_reference". Can we add one more
> > field "segment_length" to dataref_aux so that we can pass this
> > information for merging alias checks? Then we can avoid to modify or
> > create other structures.
> >
> >
> > thanks,
> > Cong
> >
> >
> > On Wed, Oct 2, 2013 at 2:34 PM, Cong Hou <congh@google.com> wrote:
> >> On Wed, Oct 2, 2013 at 4:24 AM, Richard Biener <rguenther@suse.de> wrote:
> >>> On Tue, 1 Oct 2013, Cong Hou wrote:
> >>>
> >>>> When alias exists between data refs in a loop, to vectorize it GCC
> >>>> does loop versioning and adds runtime alias checks. Basically for each
> >>>> pair of data refs with possible data dependence, there will be two
> >>>> comparisons generated to make sure there is no aliasing between them
> >>>> in each iteration of the vectorized loop. If there are many such data
> >>>> refs pairs, the number of comparisons can be very large, which is a
> >>>> big overhead.
> >>>>
> >>>> However, in some cases it is possible to reduce the number of those
> >>>> comparisons. For example, for the following loop, we can detect that
> >>>> b[0] and b[1] are two consecutive member accesses so that we can
> >>>> combine the alias check between a[0:100]&b[0] and a[0:100]&b[1] into
> >>>> checking a[0:100]&b[0:2]:
> >>>>
> >>>> void foo(int*a, int* b)
> >>>> {
> >>>>    for (int i = 0; i < 100; ++i)
> >>>>     a[i] = b[0] + b[1];
> >>>> }
> >>>>
> >>>> Actually, the requirement of consecutive memory accesses is too
> >>>> strict. For the following loop, we can still combine the alias checks
> >>>> between a[0:100]&b[0] and a[0:100]&b[100]:
> >>>>
> >>>> void foo(int*a, int* b)
> >>>> {
> >>>>    for (int i = 0; i < 100; ++i)
> >>>>     a[i] = b[0] + b[100];
> >>>> }
> >>>>
> >>>> This is because if b[0] is not in a[0:100] and b[100] is not in
> >>>> a[0:100] then a[0:100] cannot be between b[0] and b[100]. We only need
> >>>> to check a[0:100] and b[0:101] don't overlap.
> >>>>
> >>>> More generally, consider two pairs of data refs (a, b1) and (a, b2).
> >>>> Suppose addr_b1 and addr_b2 are basic addresses of data ref b1 and b2;
> >>>> offset_b1 and offset_b2 (offset_b1 < offset_b2) are offsets of b1 and
> >>>> b2, and segment_length_a, segment_length_b1, and segment_length_b2 are
> >>>> segment length of a, b1, and b2. Then we can combine the two
> >>>> comparisons into one if the following condition is satisfied:
> >>>>
> >>>> offset_b2- offset_b1 - segment_length_b1 < segment_length_a
> >>>>
> >>>>
> >>>> This patch detects those combination opportunities to reduce the
> >>>> number of alias checks. It is tested on an x86-64 machine.
> >>>
> >>> Apart from the other comments you got (to which I agree) the patch
> >>> seems to do two things, namely also:
> >>>
> >>> +  /* Extract load and store statements on pointers with zero-stride
> >>> +     accesses.  */
> >>> +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
> >>> +    {
> >>>
> >>> which I'd rather see in a separate patch (and done also when
> >>> the loop doesn't require versioning for alias).
> >>>
> >>
> >>
> >> My mistake.. I am working on those two patches at the same time and
> >> pasted that one also here by mistake. I will send another patch about
> >> the "hoist" topic.
> >>
> >>
> >>> Also combining the alias checks in vect_create_cond_for_alias_checks
> >>> is nice but doesn't properly fix the use of the
> >>> vect-max-version-for-alias-checks param which currently inhibits
> >>> vectorization of the HIMENO benchmark by default (and make us look bad
> >>> compared to LLVM).
> >>>
> >>> So I believe this merging should be done incrementally when
> >>> we collect the DDRs we need to test in vect_mark_for_runtime_alias_test.
> >>>
> >>
> >>
> >> I agree that vect-max-version-for-alias-checks param should count the
> >> number of checks after the merge. However, the struct
> >> data_dependence_relation could not record the new information produced
> >> by the merge. The new information I mentioned contains the new segment
> >> length for comparisons. This length is calculated right in
> >> vect_create_cond_for_alias_checks() function. Since
> >> vect-max-version-for-alias-checks is used during analysis phase, shall
> >> we move all those (get segment length for each data ref and merge
> >> alias checks) from transformation to analysis phase? If we cannot
> >> store the result properly (data_dependence_relation is not enough),
> >> shall we do it twice in both phases?
> >>
> >> I also noticed a possible bug in the function vect_same_range_drs()
> >> called by vect_prune_runtime_alias_test_list(). For the following code
> >> I get two pairs of data refs after
> >> vect_prune_runtime_alias_test_list(), but in
> >> vect_create_cond_for_alias_checks() after detecting grouped accesses I
> >> got two identical pairs of data refs. The consequence is two identical
> >> alias checks are produced.
> >>
> >>
> >> void yuv2yuyv_ref (int *d, int *src, int n)
> >> {
> >>   char *dest = (char *)d;
> >>   int i;
> >>
> >>   for(i=0;i<n/2;i++){
> >>     dest[i*4 + 0] = (src[i*2 + 0])>>16;
> >>     dest[i*4 + 1] = (src[i*2 + 1])>>8;
> >>     dest[i*4 + 2] = (src[i*2 + 0])>>16;
> >>     dest[i*4 + 3] = (src[i*2 + 0])>>0;
> >>   }
> >> }
> >>
> >>
> >> I think the solution to this problem is changing
> >>
> >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
> >> == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)
> >>
> >> into
> >>
> >> STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT
> >> (vinfo_for_stmt (stmt_i))))
> >> == STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT
> >> (vinfo_for_stmt (stmt_j)))
> >>
> >>
> >> in function vect_same_range_drs(). What do you think about it?
> >>
> >>
> >> thanks,
> >> Cong
> >>
> >>
> >>
> >>> Thanks for working on this,
> >>> Richard.
> >>>
> >>>>
> >>>> thanks,
> >>>> Cong
> >>>>
> >>>>
> >>>>
> >>>> Index: gcc/tree-vect-loop-manip.c
> >>>> ===================================================================
> >>>> --- gcc/tree-vect-loop-manip.c (revision 202662)
> >>>> +++ gcc/tree-vect-loop-manip.c (working copy)
> >>>> @@ -19,6 +19,10 @@ You should have received a copy of the G
> >>>>  along with GCC; see the file COPYING3.  If not see
> >>>>  <http://www.gnu.org/licenses/>.  */
> >>>>
> >>>> +#include <vector>
> >>>> +#include <utility>
> >>>> +#include <algorithm>
> >>>> +
> >>>>  #include "config.h"
> >>>>  #include "system.h"
> >>>>  #include "coretypes.h"
> >>>> @@ -2248,6 +2252,74 @@ vect_vfa_segment_size (struct data_refer
> >>>>    return segment_length;
> >>>>  }
> >>>>
> >>>> +namespace
> >>>> +{
> >>>> +
> >>>> +/* struct dr_addr_with_seg_len
> >>>> +
> >>>> +   A struct storing information of a data reference, including the data
> >>>> +   ref itself, its basic address, the access offset and the segment length
> >>>> +   for aliasing checks.  */
> >>>> +
> >>>> +struct dr_addr_with_seg_len
> >>>> +{
> >>>> +  dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
> >>>> +    : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
> >>>> +
> >>>> +  data_reference* dr;
> >>>> +  tree basic_addr;
> >>>> +  tree offset;
> >>>> +  tree seg_len;
> >>>> +};
> >>>> +
> >>>> +/* Operator == between two dr_addr_with_seg_len objects.
> >>>> +
> >>>> +   This equality operator is used to make sure two data refs
> >>>> +   are the same one so that we will consider to combine the
> >>>> +   aliasing checks of those two pairs of data dependent data
> >>>> +   refs.  */
> >>>> +
> >>>> +bool operator == (const dr_addr_with_seg_len& d1,
> >>>> +  const dr_addr_with_seg_len& d2)
> >>>> +{
> >>>> +  return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
> >>>> + && operand_equal_p (d1.offset, d2.offset, 0)
> >>>> + && operand_equal_p (d1.seg_len, d2.seg_len, 0);
> >>>> +}
> >>>> +
> >>>> +typedef std::pair <dr_addr_with_seg_len, dr_addr_with_seg_len>
> >>>> + dr_addr_with_seg_len_pair_t;
> >>>> +
> >>>> +
> >>>> +/* Operator < between two dr_addr_with_seg_len_pair_t objects.
> >>>> +
> >>>> +   This operator is used to sort objects of dr_addr_with_seg_len_pair_t
> >>>> +   so that we can combine aliasing checks during one scan.  */
> >>>> +
> >>>> +bool operator < (const dr_addr_with_seg_len_pair_t& p1,
> >>>> + const dr_addr_with_seg_len_pair_t& p2)
> >>>> +{
> >>>> +  const dr_addr_with_seg_len& p11 = p1.first;
> >>>> +  const dr_addr_with_seg_len& p12 = p1.second;
> >>>> +  const dr_addr_with_seg_len& p21 = p2.first;
> >>>> +  const dr_addr_with_seg_len& p22 = p2.second;
> >>>> +
> >>>> +  if (p11.basic_addr != p21.basic_addr)
> >>>> +    return p11.basic_addr < p21.basic_addr;
> >>>> +  if (p12.basic_addr != p22.basic_addr)
> >>>> +    return p12.basic_addr < p22.basic_addr;
> >>>> +  if (TREE_CODE (p11.offset) != INTEGER_CST
> >>>> +      || TREE_CODE (p21.offset) != INTEGER_CST)
> >>>> +    return p11.offset < p21.offset;
> >>>> +  if (int_cst_value (p11.offset) != int_cst_value (p21.offset))
> >>>> +    return int_cst_value (p11.offset) < int_cst_value (p21.offset);
> >>>> +  if (TREE_CODE (p12.offset) != INTEGER_CST
> >>>> +      || TREE_CODE (p22.offset) != INTEGER_CST)
> >>>> +    return p12.offset < p22.offset;
> >>>> +  return int_cst_value (p12.offset) < int_cst_value (p22.offset);
> >>>> +}
> >>>> +
> >>>> +}
> >>>>
> >>>>  /* Function vect_create_cond_for_alias_checks.
> >>>>
> >>>> @@ -2292,20 +2364,51 @@ vect_create_cond_for_alias_checks (loop_
> >>>>    if (may_alias_ddrs.is_empty ())
> >>>>      return;
> >>>>
> >>>> +
> >>>> +  /* Basically, for each pair of dependent data refs store_ptr_0
> >>>> +     and load_ptr_0, we create an expression:
> >>>> +
> >>>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
> >>>> +     || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
> >>>> +
> >>>> +     for aliasing checks. However, in some cases we can decrease
> >>>> +     the number of checks by combining two checks into one. For
> >>>> +     example, suppose we have another pair of data refs store_ptr_0
> >>>> +     and load_ptr_1, and if the following condition is satisfied:
> >>>> +
> >>>> +     load_ptr_0 < load_ptr_1  &&
> >>>> +     load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
> >>>> +
> >>>> +     (this condition means, in each iteration of vectorized loop,
> >>>> +     the accessed memory of store_ptr_0 cannot be between the memory
> >>>> +     of load_ptr_0 and load_ptr_1.)
> >>>> +
> >>>> +     we then can use only the following expression to finish the
> >>>> +     alising checks between store_ptr_0 & load_ptr_0 and
> >>>> +     store_ptr_0 & load_ptr_1:
> >>>> +
> >>>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
> >>>> +     || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
> >>>> +
> >>>> +     Note that we only consider that load_ptr_0 and load_ptr_1 have the
> >>>> +     same basic address.  */
> >>>> +
> >>>> +  std::vector<dr_addr_with_seg_len_pair_t> ddrs_with_seg_len;
> >>>> +
> >>>> +  /* First, we collect all data ref pairs for aliasing checks.  */
> >>>> +
> >>>>    FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
> >>>>      {
> >>>>        struct data_reference *dr_a, *dr_b;
> >>>>        gimple dr_group_first_a, dr_group_first_b;
> >>>> -      tree addr_base_a, addr_base_b;
> >>>>        tree segment_length_a, segment_length_b;
> >>>>        gimple stmt_a, stmt_b;
> >>>> -      tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
> >>>>
> >>>>        dr_a = DDR_A (ddr);
> >>>>        stmt_a = DR_STMT (DDR_A (ddr));
> >>>>        dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
> >>>>        if (dr_group_first_a)
> >>>> -        {
> >>>> + {
> >>>>    stmt_a = dr_group_first_a;
> >>>>    dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
> >>>>   }
> >>>> @@ -2314,20 +2417,11 @@ vect_create_cond_for_alias_checks (loop_
> >>>>        stmt_b = DR_STMT (DDR_B (ddr));
> >>>>        dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
> >>>>        if (dr_group_first_b)
> >>>> -        {
> >>>> + {
> >>>>    stmt_b = dr_group_first_b;
> >>>>    dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
> >>>>   }
> >>>>
> >>>> -      addr_base_a
> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
> >>>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
> >>>> -       DR_INIT (dr_a)));
> >>>> -      addr_base_b
> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
> >>>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
> >>>> -       DR_INIT (dr_b)));
> >>>> -
> >>>>        if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
> >>>>   length_factor = scalar_loop_iters;
> >>>>        else
> >>>> @@ -2335,24 +2429,149 @@ vect_create_cond_for_alias_checks (loop_
> >>>>        segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
> >>>>        segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
> >>>>
> >>>> +      dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
> >>>> +  (dr_addr_with_seg_len
> >>>> +       (dr_a, DR_BASE_ADDRESS (dr_a),
> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
> >>>> + segment_length_a),
> >>>> +   dr_addr_with_seg_len
> >>>> +       (dr_b, DR_BASE_ADDRESS (dr_b),
> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
> >>>> + segment_length_b));
> >>>> +
> >>>> +      if (dr_with_seg_len_pair.first.basic_addr >
> >>>> +  dr_with_seg_len_pair.second.basic_addr)
> >>>> + std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
> >>>> +
> >>>> +      ddrs_with_seg_len.push_back (dr_with_seg_len_pair);
> >>>> +    }
> >>>> +
> >>>> +  /* Second, we sort the collected data ref pairs so that we can scan
> >>>> +     them once to combine all possible aliasing checks.  */
> >>>> +
> >>>> +  std::sort (ddrs_with_seg_len.begin(), ddrs_with_seg_len.end());
> >>>> +
> >>>> +  /* Remove duplicate data ref pairs.  */
> >>>> +  ddrs_with_seg_len.erase (std::unique (ddrs_with_seg_len.begin(),
> >>>> + ddrs_with_seg_len.end()),
> >>>> +   ddrs_with_seg_len.end());
> >>>> +
> >>>> +  /* We then scan the sorted dr pairs and check if we can combine
> >>>> +     alias checks of two neighbouring dr pairs.  */
> >>>> +
> >>>> +  for (size_t i = 1; i < ddrs_with_seg_len.size (); ++i)
> >>>> +    {
> >>>> +      dr_addr_with_seg_len& dr_a1 = ddrs_with_seg_len[i-1].first;
> >>>> +      dr_addr_with_seg_len& dr_b1 = ddrs_with_seg_len[i-1].second;
> >>>> +      dr_addr_with_seg_len& dr_a2 = ddrs_with_seg_len[i].first;
> >>>> +      dr_addr_with_seg_len& dr_b2 = ddrs_with_seg_len[i].second;
> >>>> +
> >>>> +      if (dr_a1 == dr_a2)
> >>>> + {
> >>>> +  if (dr_b1.basic_addr != dr_b2.basic_addr
> >>>> +      || TREE_CODE (dr_b1.offset) != INTEGER_CST
> >>>> +      || TREE_CODE (dr_b2.offset) != INTEGER_CST)
> >>>> +    continue;
> >>>> +
> >>>> +  int diff = int_cst_value (dr_b2.offset) -
> >>>> +     int_cst_value (dr_b1.offset);
> >>>> +
> >>>> +  gcc_assert (diff > 0);
> >>>> +
> >>>> +  if (diff <= vect_factor
> >>>> +      || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
> >>>> +  && diff - int_cst_value (dr_b1.seg_len) < vect_factor)
> >>>> +      || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
> >>>> +  && TREE_CODE (dr_a1.seg_len) == INTEGER_CST
> >>>> +  && diff - int_cst_value (dr_b1.seg_len) <
> >>>> +     int_cst_value (dr_a1.seg_len)))
> >>>> +    {
> >>>> +      if (dump_enabled_p ())
> >>>> + {
> >>>> +  dump_printf_loc
> >>>> +      (MSG_NOTE, vect_location,
> >>>> +       "combining two runtime checks for data references ");
> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1.dr));
> >>>> +  dump_printf (MSG_NOTE, " and ");
> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2.dr));
> >>>> +  dump_printf (MSG_NOTE, "\n");
> >>>> + }
> >>>> +
> >>>> +      dr_b1.seg_len = size_binop (PLUS_EXPR,
> >>>> +  dr_b2.seg_len, size_int (diff));
> >>>> +      ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
> >>>> +      --i;
> >>>> +    }
> >>>> + }
> >>>> +      else if (dr_b1 == dr_b2)
> >>>> + {
> >>>> +  if (dr_a1.basic_addr != dr_a2.basic_addr
> >>>> +      || TREE_CODE (dr_a1.offset) != INTEGER_CST
> >>>> +      || TREE_CODE (dr_a2.offset) != INTEGER_CST)
> >>>> +    continue;
> >>>> +
> >>>> +  int diff = int_cst_value (dr_a2.offset) -
> >>>> +     int_cst_value (dr_a1.offset);
> >>>> +
> >>>> +  gcc_assert (diff > 0);
> >>>> +
> >>>> +  if (diff <= vect_factor
> >>>> +      || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
> >>>> +  && diff - int_cst_value (dr_a1.seg_len) < vect_factor)
> >>>> +      || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
> >>>> +  && TREE_CODE (dr_b1.seg_len) == INTEGER_CST
> >>>> +  && diff - int_cst_value (dr_a1.seg_len) <
> >>>> +     int_cst_value (dr_b1.seg_len)))
> >>>> +    {
> >>>> +      if (dump_enabled_p ())
> >>>> + {
> >>>> +  dump_printf_loc
> >>>> +      (MSG_NOTE, vect_location,
> >>>> +       "combining two runtime checks for data references ");
> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1.dr));
> >>>> +  dump_printf (MSG_NOTE, " and ");
> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2.dr));
> >>>> +  dump_printf (MSG_NOTE, "\n");
> >>>> + }
> >>>> +
> >>>> +      dr_a1.seg_len = size_binop (PLUS_EXPR,
> >>>> +  dr_a2.seg_len, size_int (diff));
> >>>> +      ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
> >>>> +      --i;
> >>>> +    }
> >>>> + }
> >>>> +    }
> >>>> +
> >>>> +  for (size_t i = 0, s = ddrs_with_seg_len.size (); i < s; ++i)
> >>>> +    {
> >>>> +      const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first;
> >>>> +      const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second;
> >>>> +      tree segment_length_a = dr_a.seg_len;
> >>>> +      tree segment_length_b = dr_b.seg_len;
> >>>> +
> >>>> +      tree addr_base_a
> >>>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
> >>>> +      tree addr_base_b
> >>>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
> >>>> +
> >>>>        if (dump_enabled_p ())
> >>>>   {
> >>>>    dump_printf_loc (MSG_NOTE, vect_location,
> >>>> -                           "create runtime check for data references ");
> >>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
> >>>> +   "create runtime check for data references ");
> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
> >>>>    dump_printf (MSG_NOTE, " and ");
> >>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
> >>>> -          dump_printf (MSG_NOTE, "\n");
> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
> >>>> +  dump_printf (MSG_NOTE, "\n");
> >>>>   }
> >>>>
> >>>> -      seg_a_min = addr_base_a;
> >>>> -      seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
> >>>> -      if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
> >>>> +      tree seg_a_min = addr_base_a;
> >>>> +      tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
> >>>> +      if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
> >>>>   seg_a_min = seg_a_max, seg_a_max = addr_base_a;
> >>>>
> >>>> -      seg_b_min = addr_base_b;
> >>>> -      seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
> >>>> -      if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
> >>>> +      tree seg_b_min = addr_base_b;
> >>>> +      tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
> >>>> +      if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
> >>>>   seg_b_min = seg_b_max, seg_b_max = addr_base_b;
> >>>>
> >>>>        part_cond_expr =
> >>>> @@ -2477,6 +2696,81 @@ vect_loop_versioning (loop_vec_info loop
> >>>>        adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
> >>>>      }
> >>>>
> >>>> +  /* Extract load and store statements on pointers with zero-stride
> >>>> +     accesses.  */
> >>>> +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
> >>>> +    {
> >>>> +
> >>>> +      /* In the loop body, we iterate each statement to check if it is a load
> >>>> + or store. Then we check the DR_STEP of the data reference.  If
> >>>> + DR_STEP is zero, then we will hoist the load statement to the loop
> >>>> + preheader, and move the store statement to the loop exit.  */
> >>>> +
> >>>> +      for (gimple_stmt_iterator si = gsi_start_bb (loop->header);
> >>>> +   !gsi_end_p (si); )
> >>>> + {
> >>>> +  gimple stmt = gsi_stmt (si);
> >>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >>>> +  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> >>>> +
> >>>> +
> >>>> +  if (dr && integer_zerop (DR_STEP (dr)))
> >>>> +    {
> >>>> +      if (DR_IS_READ (dr))
> >>>> + {
> >>>> +  if (dump_file)
> >>>> +    {
> >>>> +      fprintf (dump_file,
> >>>> +       "Hoist the load to outside of the loop:\n");
> >>>> +      print_gimple_stmt (dump_file, stmt, 0,
> >>>> + TDF_VOPS|TDF_MEMSYMS);
> >>>> +    }
> >>>> +
> >>>> +  basic_block preheader = loop_preheader_edge (loop)->src;
> >>>> +  gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
> >>>> +  gsi_move_after (&si, &si_dst);
> >>>> + }
> >>>> +      else
> >>>> + {
> >>>> +  gimple_stmt_iterator si_dst =
> >>>> +      gsi_last_bb (single_exit (loop)->dest);
> >>>> +  gsi_move_after (&si, &si_dst);
> >>>> + }
> >>>> +              continue;
> >>>> +    }
> >>>> +  else if (!dr)
> >>>> +          {
> >>>> +            bool hoist = true;
> >>>> +            for (size_t i = 0; i < gimple_num_ops (stmt); i++)
> >>>> +            {
> >>>> +              tree op = gimple_op (stmt, i);
> >>>> +              if (TREE_CODE (op) == INTEGER_CST
> >>>> +                  || TREE_CODE (op) == REAL_CST)
> >>>> +                continue;
> >>>> +              if (TREE_CODE (op) == SSA_NAME)
> >>>> +              {
> >>>> +                gimple def = SSA_NAME_DEF_STMT (op);
> >>>> +                if (def == stmt
> >>>> +                    || gimple_nop_p (def)
> >>>> +                    || !flow_bb_inside_loop_p (loop, gimple_bb (def)))
> >>>> +                  continue;
> >>>> +              }
> >>>> +              hoist = false;
> >>>> +              break;
> >>>> +            }
> >>>> +
> >>>> +            if (hoist)
> >>>> +            {
> >>>> +              basic_block preheader = loop_preheader_edge (loop)->src;
> >>>> +              gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
> >>>> +              gsi_move_after (&si, &si_dst);
> >>>> +              continue;
> >>>> +            }
> >>>> +          }
> >>>> +          gsi_next (&si);
> >>>> + }
> >>>> +    }
> >>>> +
> >>>>    /* End loop-exit-fixes after versioning.  */
> >>>>
> >>>>    if (cond_expr_stmt_list)
> >>>> Index: gcc/ChangeLog
> >>>> ===================================================================
> >>>> --- gcc/ChangeLog (revision 202663)
> >>>> +++ gcc/ChangeLog (working copy)
> >>>> @@ -1,3 +1,8 @@
> >>>> +2013-10-01  Cong Hou  <congh@google.com>
> >>>> +
> >>>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Combine
> >>>> + alias checks if it is possible to amortize the runtime overhead.
> >>>> +
> >>>>
> >>>>
> >>>
> >>> --
> >>> Richard Biener <rguenther@suse.de>
> >>> SUSE / SUSE Labs
> >>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> >>> GF: Jeff Hawn, Jennifer Guild, Felix Imend

Comments

Richard Biener Oct. 18, 2013, 1:09 p.m. UTC | #1
On Tue, Oct 15, 2013 at 12:43 AM, Cong Hou <congh@google.com> wrote:
> Sorry for forgetting using plain-text mode. Resend it.
>
>
> ---------- Forwarded message ----------
> From: Cong Hou <congh@google.com>
> Date: Mon, Oct 14, 2013 at 3:29 PM
> Subject: Re: [PATCH] Reducing number of alias checks in vectorization.
> To: Richard Biener <rguenther@suse.de>, GCC Patches <gcc-patches@gcc.gnu.org>
> Cc: Jakub Jelinek <jakub@redhat.com>
>
>
> I have made a new patch for this issue according to your comments.
>
> There are several modifications to my previous patch:
>
>
> 1. Remove the use of STL features such as vector and sort. Use GCC's
> vec and qsort instead.

I think that using <utility> and thus std::pair and std::swap is ok.  Including
of <utility> should be done via system.h though.

> 2. Comparisons between tree nodes are not based on their addresses any
> more. Use compare_tree() function instead.

Ok.

> 3. The function vect_create_cond_for_alias_checks() now returns the
> number of alias checks. If its second parameter cond_expr is NULL,
> then this function only calculate the number of alias checks after the
> merging and won't generate comparison expressions.

We now perform the merging twice - that's easily avoided by recording
the merge result in loop_vinfo alongside may_alias_ddrs (which you
should be able to drop as they are already contained in the data
structures you build).

Which means you can split the function and move the merging
part to vect_prune_runtime_alias_test_list.

> 4. The function vect_prune_runtime_alias_test_list() now uses
> vect_create_cond_for_alias_checks() to get the number of alias checks.
>
>
> The patch is attached as a text file.
>
> Please give me your comment on this patch. Thank you!

+  if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
+      || TREE_CODE (dr_a1->offset) != INTEGER_CST
+      || TREE_CODE (dr_a2->offset) != INTEGER_CST)
+    continue;
+
+  HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) -
+ widest_int_cst_value (dr_a1->offset);

use !host_integerp (dr_a1->offset) as check and then int_cst_value.

+  if (diff <= vect_factor
+      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
+  && diff - widest_int_cst_value (dr_a1->seg_len) < vect_factor)
+      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
+  && TREE_CODE (dr_b1->seg_len) == INTEGER_CST
+  && diff - widest_int_cst_value (dr_a1->seg_len) <
+     widest_int_cst_value (dr_b1->seg_len)))

can you add a comment above this why it is safe to combine under
this condition?  You seem to combine offsets but not data-ref steps
for example.  The segment length is computed quite optimistically
and you don't seem to adjust that.  That is, generally the alias
check, if implemented as simple overlap, would need to intersect
[init, init + niter * step]  intervals, losing some non-aliasing cases.
The current segment length is optimistically computed and thus
can recover some more cases, but you have to be careful with
merging to not break its implicit assumptions.

Thanks,
Richard.

>
> Cong
>
>
> On Thu, Oct 3, 2013 at 2:35 PM, Cong Hou <congh@google.com> wrote:
>>
>> Forget about this "aux" idea as the segment length for one data ref
>> can be different in different dr pairs.
>>
>> In my patch I created a struct as shown below:
>>
>> struct dr_addr_with_seg_len
>> {
>>   data_reference *dr;
>>   tree basic_addr;
>>   tree offset;
>>   tree seg_len;
>> };
>>
>>
>> Note that basic_addr and offset can always obtained from dr, but we
>> need to store two segment lengths for each dr pair. It is improper to
>> add a field to data_dependence_relation as it is defined outside of
>> vectorizer. We can change the type (a new one combining
>> data_dependence_relation and segment length) of may_alias_ddrs in
>> loop_vec_info to include such information, but we have to add a new
>> type to tree-vectorizer.h which is only used in two places - still too
>> much.
>>
>> One possible solution is that we create a local struct as shown above
>> and a new function which returns the merged alias check information.
>> This function will be called twice: once during analysis phase and
>> once in transformation phase. Then we don't have to store the merged
>> alias check information during those two phases. The additional time
>> cost is minimal as there will not be too many data dependent dr pairs
>> in a loop.
>>
>> Any comment?
>>
>>
>> thanks,
>> Cong
>>
>>
>> On Thu, Oct 3, 2013 at 10:57 AM, Cong Hou <congh@google.com> wrote:
>> > I noticed that there is a "struct dataref_aux" defined in
>> > tree-vectorizer.h which is specific to the vectorizer pass and is
>> > stored in (void*)aux in "struct data_reference". Can we add one more
>> > field "segment_length" to dataref_aux so that we can pass this
>> > information for merging alias checks? Then we can avoid to modify or
>> > create other structures.
>> >
>> >
>> > thanks,
>> > Cong
>> >
>> >
>> > On Wed, Oct 2, 2013 at 2:34 PM, Cong Hou <congh@google.com> wrote:
>> >> On Wed, Oct 2, 2013 at 4:24 AM, Richard Biener <rguenther@suse.de> wrote:
>> >>> On Tue, 1 Oct 2013, Cong Hou wrote:
>> >>>
>> >>>> When alias exists between data refs in a loop, to vectorize it GCC
>> >>>> does loop versioning and adds runtime alias checks. Basically for each
>> >>>> pair of data refs with possible data dependence, there will be two
>> >>>> comparisons generated to make sure there is no aliasing between them
>> >>>> in each iteration of the vectorized loop. If there are many such data
>> >>>> refs pairs, the number of comparisons can be very large, which is a
>> >>>> big overhead.
>> >>>>
>> >>>> However, in some cases it is possible to reduce the number of those
>> >>>> comparisons. For example, for the following loop, we can detect that
>> >>>> b[0] and b[1] are two consecutive member accesses so that we can
>> >>>> combine the alias check between a[0:100]&b[0] and a[0:100]&b[1] into
>> >>>> checking a[0:100]&b[0:2]:
>> >>>>
>> >>>> void foo(int*a, int* b)
>> >>>> {
>> >>>>    for (int i = 0; i < 100; ++i)
>> >>>>     a[i] = b[0] + b[1];
>> >>>> }
>> >>>>
>> >>>> Actually, the requirement of consecutive memory accesses is too
>> >>>> strict. For the following loop, we can still combine the alias checks
>> >>>> between a[0:100]&b[0] and a[0:100]&b[100]:
>> >>>>
>> >>>> void foo(int*a, int* b)
>> >>>> {
>> >>>>    for (int i = 0; i < 100; ++i)
>> >>>>     a[i] = b[0] + b[100];
>> >>>> }
>> >>>>
>> >>>> This is because if b[0] is not in a[0:100] and b[100] is not in
>> >>>> a[0:100] then a[0:100] cannot be between b[0] and b[100]. We only need
>> >>>> to check a[0:100] and b[0:101] don't overlap.
>> >>>>
>> >>>> More generally, consider two pairs of data refs (a, b1) and (a, b2).
>> >>>> Suppose addr_b1 and addr_b2 are basic addresses of data ref b1 and b2;
>> >>>> offset_b1 and offset_b2 (offset_b1 < offset_b2) are offsets of b1 and
>> >>>> b2, and segment_length_a, segment_length_b1, and segment_length_b2 are
>> >>>> segment length of a, b1, and b2. Then we can combine the two
>> >>>> comparisons into one if the following condition is satisfied:
>> >>>>
>> >>>> offset_b2- offset_b1 - segment_length_b1 < segment_length_a
>> >>>>
>> >>>>
>> >>>> This patch detects those combination opportunities to reduce the
>> >>>> number of alias checks. It is tested on an x86-64 machine.
>> >>>
>> >>> Apart from the other comments you got (to which I agree) the patch
>> >>> seems to do two things, namely also:
>> >>>
>> >>> +  /* Extract load and store statements on pointers with zero-stride
>> >>> +     accesses.  */
>> >>> +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
>> >>> +    {
>> >>>
>> >>> which I'd rather see in a separate patch (and done also when
>> >>> the loop doesn't require versioning for alias).
>> >>>
>> >>
>> >>
>> >> My mistake.. I am working on those two patches at the same time and
>> >> pasted that one also here by mistake. I will send another patch about
>> >> the "hoist" topic.
>> >>
>> >>
>> >>> Also combining the alias checks in vect_create_cond_for_alias_checks
>> >>> is nice but doesn't properly fix the use of the
>> >>> vect-max-version-for-alias-checks param which currently inhibits
>> >>> vectorization of the HIMENO benchmark by default (and make us look bad
>> >>> compared to LLVM).
>> >>>
>> >>> So I believe this merging should be done incrementally when
>> >>> we collect the DDRs we need to test in vect_mark_for_runtime_alias_test.
>> >>>
>> >>
>> >>
>> >> I agree that vect-max-version-for-alias-checks param should count the
>> >> number of checks after the merge. However, the struct
>> >> data_dependence_relation could not record the new information produced
>> >> by the merge. The new information I mentioned contains the new segment
>> >> length for comparisons. This length is calculated right in
>> >> vect_create_cond_for_alias_checks() function. Since
>> >> vect-max-version-for-alias-checks is used during analysis phase, shall
>> >> we move all those (get segment length for each data ref and merge
>> >> alias checks) from transformation to analysis phase? If we cannot
>> >> store the result properly (data_dependence_relation is not enough),
>> >> shall we do it twice in both phases?
>> >>
>> >> I also noticed a possible bug in the function vect_same_range_drs()
>> >> called by vect_prune_runtime_alias_test_list(). For the following code
>> >> I get two pairs of data refs after
>> >> vect_prune_runtime_alias_test_list(), but in
>> >> vect_create_cond_for_alias_checks() after detecting grouped accesses I
>> >> got two identical pairs of data refs. The consequence is two identical
>> >> alias checks are produced.
>> >>
>> >>
>> >> void yuv2yuyv_ref (int *d, int *src, int n)
>> >> {
>> >>   char *dest = (char *)d;
>> >>   int i;
>> >>
>> >>   for(i=0;i<n/2;i++){
>> >>     dest[i*4 + 0] = (src[i*2 + 0])>>16;
>> >>     dest[i*4 + 1] = (src[i*2 + 1])>>8;
>> >>     dest[i*4 + 2] = (src[i*2 + 0])>>16;
>> >>     dest[i*4 + 3] = (src[i*2 + 0])>>0;
>> >>   }
>> >> }
>> >>
>> >>
>> >> I think the solution to this problem is changing
>> >>
>> >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>> >> == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)
>> >>
>> >> into
>> >>
>> >> STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT
>> >> (vinfo_for_stmt (stmt_i))))
>> >> == STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT
>> >> (vinfo_for_stmt (stmt_j)))
>> >>
>> >>
>> >> in function vect_same_range_drs(). What do you think about it?
>> >>
>> >>
>> >> thanks,
>> >> Cong
>> >>
>> >>
>> >>
>> >>> Thanks for working on this,
>> >>> Richard.
>> >>>
>> >>>>
>> >>>> thanks,
>> >>>> Cong
>> >>>>
>> >>>>
>> >>>>
>> >>>> Index: gcc/tree-vect-loop-manip.c
>> >>>> ===================================================================
>> >>>> --- gcc/tree-vect-loop-manip.c (revision 202662)
>> >>>> +++ gcc/tree-vect-loop-manip.c (working copy)
>> >>>> @@ -19,6 +19,10 @@ You should have received a copy of the G
>> >>>>  along with GCC; see the file COPYING3.  If not see
>> >>>>  <http://www.gnu.org/licenses/>.  */
>> >>>>
>> >>>> +#include <vector>
>> >>>> +#include <utility>
>> >>>> +#include <algorithm>
>> >>>> +
>> >>>>  #include "config.h"
>> >>>>  #include "system.h"
>> >>>>  #include "coretypes.h"
>> >>>> @@ -2248,6 +2252,74 @@ vect_vfa_segment_size (struct data_refer
>> >>>>    return segment_length;
>> >>>>  }
>> >>>>
>> >>>> +namespace
>> >>>> +{
>> >>>> +
>> >>>> +/* struct dr_addr_with_seg_len
>> >>>> +
>> >>>> +   A struct storing information of a data reference, including the data
>> >>>> +   ref itself, its basic address, the access offset and the segment length
>> >>>> +   for aliasing checks.  */
>> >>>> +
>> >>>> +struct dr_addr_with_seg_len
>> >>>> +{
>> >>>> +  dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
>> >>>> +    : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
>> >>>> +
>> >>>> +  data_reference* dr;
>> >>>> +  tree basic_addr;
>> >>>> +  tree offset;
>> >>>> +  tree seg_len;
>> >>>> +};
>> >>>> +
>> >>>> +/* Operator == between two dr_addr_with_seg_len objects.
>> >>>> +
>> >>>> +   This equality operator is used to make sure two data refs
>> >>>> +   are the same one so that we will consider to combine the
>> >>>> +   aliasing checks of those two pairs of data dependent data
>> >>>> +   refs.  */
>> >>>> +
>> >>>> +bool operator == (const dr_addr_with_seg_len& d1,
>> >>>> +  const dr_addr_with_seg_len& d2)
>> >>>> +{
>> >>>> +  return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
>> >>>> + && operand_equal_p (d1.offset, d2.offset, 0)
>> >>>> + && operand_equal_p (d1.seg_len, d2.seg_len, 0);
>> >>>> +}
>> >>>> +
>> >>>> +typedef std::pair <dr_addr_with_seg_len, dr_addr_with_seg_len>
>> >>>> + dr_addr_with_seg_len_pair_t;
>> >>>> +
>> >>>> +
>> >>>> +/* Operator < between two dr_addr_with_seg_len_pair_t objects.
>> >>>> +
>> >>>> +   This operator is used to sort objects of dr_addr_with_seg_len_pair_t
>> >>>> +   so that we can combine aliasing checks during one scan.  */
>> >>>> +
>> >>>> +bool operator < (const dr_addr_with_seg_len_pair_t& p1,
>> >>>> + const dr_addr_with_seg_len_pair_t& p2)
>> >>>> +{
>> >>>> +  const dr_addr_with_seg_len& p11 = p1.first;
>> >>>> +  const dr_addr_with_seg_len& p12 = p1.second;
>> >>>> +  const dr_addr_with_seg_len& p21 = p2.first;
>> >>>> +  const dr_addr_with_seg_len& p22 = p2.second;
>> >>>> +
>> >>>> +  if (p11.basic_addr != p21.basic_addr)
>> >>>> +    return p11.basic_addr < p21.basic_addr;
>> >>>> +  if (p12.basic_addr != p22.basic_addr)
>> >>>> +    return p12.basic_addr < p22.basic_addr;
>> >>>> +  if (TREE_CODE (p11.offset) != INTEGER_CST
>> >>>> +      || TREE_CODE (p21.offset) != INTEGER_CST)
>> >>>> +    return p11.offset < p21.offset;
>> >>>> +  if (int_cst_value (p11.offset) != int_cst_value (p21.offset))
>> >>>> +    return int_cst_value (p11.offset) < int_cst_value (p21.offset);
>> >>>> +  if (TREE_CODE (p12.offset) != INTEGER_CST
>> >>>> +      || TREE_CODE (p22.offset) != INTEGER_CST)
>> >>>> +    return p12.offset < p22.offset;
>> >>>> +  return int_cst_value (p12.offset) < int_cst_value (p22.offset);
>> >>>> +}
>> >>>> +
>> >>>> +}
>> >>>>
>> >>>>  /* Function vect_create_cond_for_alias_checks.
>> >>>>
>> >>>> @@ -2292,20 +2364,51 @@ vect_create_cond_for_alias_checks (loop_
>> >>>>    if (may_alias_ddrs.is_empty ())
>> >>>>      return;
>> >>>>
>> >>>> +
>> >>>> +  /* Basically, for each pair of dependent data refs store_ptr_0
>> >>>> +     and load_ptr_0, we create an expression:
>> >>>> +
>> >>>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>> >>>> +     || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
>> >>>> +
>> >>>> +     for aliasing checks. However, in some cases we can decrease
>> >>>> +     the number of checks by combining two checks into one. For
>> >>>> +     example, suppose we have another pair of data refs store_ptr_0
>> >>>> +     and load_ptr_1, and if the following condition is satisfied:
>> >>>> +
>> >>>> +     load_ptr_0 < load_ptr_1  &&
>> >>>> +     load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
>> >>>> +
>> >>>> +     (this condition means, in each iteration of vectorized loop,
>> >>>> +     the accessed memory of store_ptr_0 cannot be between the memory
>> >>>> +     of load_ptr_0 and load_ptr_1.)
>> >>>> +
>> >>>> +     we then can use only the following expression to finish the
>> >>>> +     alising checks between store_ptr_0 & load_ptr_0 and
>> >>>> +     store_ptr_0 & load_ptr_1:
>> >>>> +
>> >>>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>> >>>> +     || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
>> >>>> +
>> >>>> +     Note that we only consider that load_ptr_0 and load_ptr_1 have the
>> >>>> +     same basic address.  */
>> >>>> +
>> >>>> +  std::vector<dr_addr_with_seg_len_pair_t> ddrs_with_seg_len;
>> >>>> +
>> >>>> +  /* First, we collect all data ref pairs for aliasing checks.  */
>> >>>> +
>> >>>>    FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>> >>>>      {
>> >>>>        struct data_reference *dr_a, *dr_b;
>> >>>>        gimple dr_group_first_a, dr_group_first_b;
>> >>>> -      tree addr_base_a, addr_base_b;
>> >>>>        tree segment_length_a, segment_length_b;
>> >>>>        gimple stmt_a, stmt_b;
>> >>>> -      tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
>> >>>>
>> >>>>        dr_a = DDR_A (ddr);
>> >>>>        stmt_a = DR_STMT (DDR_A (ddr));
>> >>>>        dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>> >>>>        if (dr_group_first_a)
>> >>>> -        {
>> >>>> + {
>> >>>>    stmt_a = dr_group_first_a;
>> >>>>    dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>> >>>>   }
>> >>>> @@ -2314,20 +2417,11 @@ vect_create_cond_for_alias_checks (loop_
>> >>>>        stmt_b = DR_STMT (DDR_B (ddr));
>> >>>>        dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>> >>>>        if (dr_group_first_b)
>> >>>> -        {
>> >>>> + {
>> >>>>    stmt_b = dr_group_first_b;
>> >>>>    dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>> >>>>   }
>> >>>>
>> >>>> -      addr_base_a
>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
>> >>>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
>> >>>> -       DR_INIT (dr_a)));
>> >>>> -      addr_base_b
>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
>> >>>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
>> >>>> -       DR_INIT (dr_b)));
>> >>>> -
>> >>>>        if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>> >>>>   length_factor = scalar_loop_iters;
>> >>>>        else
>> >>>> @@ -2335,24 +2429,149 @@ vect_create_cond_for_alias_checks (loop_
>> >>>>        segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>> >>>>        segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>> >>>>
>> >>>> +      dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
>> >>>> +  (dr_addr_with_seg_len
>> >>>> +       (dr_a, DR_BASE_ADDRESS (dr_a),
>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
>> >>>> + segment_length_a),
>> >>>> +   dr_addr_with_seg_len
>> >>>> +       (dr_b, DR_BASE_ADDRESS (dr_b),
>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
>> >>>> + segment_length_b));
>> >>>> +
>> >>>> +      if (dr_with_seg_len_pair.first.basic_addr >
>> >>>> +  dr_with_seg_len_pair.second.basic_addr)
>> >>>> + std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
>> >>>> +
>> >>>> +      ddrs_with_seg_len.push_back (dr_with_seg_len_pair);
>> >>>> +    }
>> >>>> +
>> >>>> +  /* Second, we sort the collected data ref pairs so that we can scan
>> >>>> +     them once to combine all possible aliasing checks.  */
>> >>>> +
>> >>>> +  std::sort (ddrs_with_seg_len.begin(), ddrs_with_seg_len.end());
>> >>>> +
>> >>>> +  /* Remove duplicate data ref pairs.  */
>> >>>> +  ddrs_with_seg_len.erase (std::unique (ddrs_with_seg_len.begin(),
>> >>>> + ddrs_with_seg_len.end()),
>> >>>> +   ddrs_with_seg_len.end());
>> >>>> +
>> >>>> +  /* We then scan the sorted dr pairs and check if we can combine
>> >>>> +     alias checks of two neighbouring dr pairs.  */
>> >>>> +
>> >>>> +  for (size_t i = 1; i < ddrs_with_seg_len.size (); ++i)
>> >>>> +    {
>> >>>> +      dr_addr_with_seg_len& dr_a1 = ddrs_with_seg_len[i-1].first;
>> >>>> +      dr_addr_with_seg_len& dr_b1 = ddrs_with_seg_len[i-1].second;
>> >>>> +      dr_addr_with_seg_len& dr_a2 = ddrs_with_seg_len[i].first;
>> >>>> +      dr_addr_with_seg_len& dr_b2 = ddrs_with_seg_len[i].second;
>> >>>> +
>> >>>> +      if (dr_a1 == dr_a2)
>> >>>> + {
>> >>>> +  if (dr_b1.basic_addr != dr_b2.basic_addr
>> >>>> +      || TREE_CODE (dr_b1.offset) != INTEGER_CST
>> >>>> +      || TREE_CODE (dr_b2.offset) != INTEGER_CST)
>> >>>> +    continue;
>> >>>> +
>> >>>> +  int diff = int_cst_value (dr_b2.offset) -
>> >>>> +     int_cst_value (dr_b1.offset);
>> >>>> +
>> >>>> +  gcc_assert (diff > 0);
>> >>>> +
>> >>>> +  if (diff <= vect_factor
>> >>>> +      || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>> >>>> +  && diff - int_cst_value (dr_b1.seg_len) < vect_factor)
>> >>>> +      || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>> >>>> +  && TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>> >>>> +  && diff - int_cst_value (dr_b1.seg_len) <
>> >>>> +     int_cst_value (dr_a1.seg_len)))
>> >>>> +    {
>> >>>> +      if (dump_enabled_p ())
>> >>>> + {
>> >>>> +  dump_printf_loc
>> >>>> +      (MSG_NOTE, vect_location,
>> >>>> +       "combining two runtime checks for data references ");
>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1.dr));
>> >>>> +  dump_printf (MSG_NOTE, " and ");
>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2.dr));
>> >>>> +  dump_printf (MSG_NOTE, "\n");
>> >>>> + }
>> >>>> +
>> >>>> +      dr_b1.seg_len = size_binop (PLUS_EXPR,
>> >>>> +  dr_b2.seg_len, size_int (diff));
>> >>>> +      ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
>> >>>> +      --i;
>> >>>> +    }
>> >>>> + }
>> >>>> +      else if (dr_b1 == dr_b2)
>> >>>> + {
>> >>>> +  if (dr_a1.basic_addr != dr_a2.basic_addr
>> >>>> +      || TREE_CODE (dr_a1.offset) != INTEGER_CST
>> >>>> +      || TREE_CODE (dr_a2.offset) != INTEGER_CST)
>> >>>> +    continue;
>> >>>> +
>> >>>> +  int diff = int_cst_value (dr_a2.offset) -
>> >>>> +     int_cst_value (dr_a1.offset);
>> >>>> +
>> >>>> +  gcc_assert (diff > 0);
>> >>>> +
>> >>>> +  if (diff <= vect_factor
>> >>>> +      || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>> >>>> +  && diff - int_cst_value (dr_a1.seg_len) < vect_factor)
>> >>>> +      || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>> >>>> +  && TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>> >>>> +  && diff - int_cst_value (dr_a1.seg_len) <
>> >>>> +     int_cst_value (dr_b1.seg_len)))
>> >>>> +    {
>> >>>> +      if (dump_enabled_p ())
>> >>>> + {
>> >>>> +  dump_printf_loc
>> >>>> +      (MSG_NOTE, vect_location,
>> >>>> +       "combining two runtime checks for data references ");
>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1.dr));
>> >>>> +  dump_printf (MSG_NOTE, " and ");
>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2.dr));
>> >>>> +  dump_printf (MSG_NOTE, "\n");
>> >>>> + }
>> >>>> +
>> >>>> +      dr_a1.seg_len = size_binop (PLUS_EXPR,
>> >>>> +  dr_a2.seg_len, size_int (diff));
>> >>>> +      ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
>> >>>> +      --i;
>> >>>> +    }
>> >>>> + }
>> >>>> +    }
>> >>>> +
>> >>>> +  for (size_t i = 0, s = ddrs_with_seg_len.size (); i < s; ++i)
>> >>>> +    {
>> >>>> +      const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first;
>> >>>> +      const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second;
>> >>>> +      tree segment_length_a = dr_a.seg_len;
>> >>>> +      tree segment_length_b = dr_b.seg_len;
>> >>>> +
>> >>>> +      tree addr_base_a
>> >>>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
>> >>>> +      tree addr_base_b
>> >>>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
>> >>>> +
>> >>>>        if (dump_enabled_p ())
>> >>>>   {
>> >>>>    dump_printf_loc (MSG_NOTE, vect_location,
>> >>>> -                           "create runtime check for data references ");
>> >>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
>> >>>> +   "create runtime check for data references ");
>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
>> >>>>    dump_printf (MSG_NOTE, " and ");
>> >>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
>> >>>> -          dump_printf (MSG_NOTE, "\n");
>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
>> >>>> +  dump_printf (MSG_NOTE, "\n");
>> >>>>   }
>> >>>>
>> >>>> -      seg_a_min = addr_base_a;
>> >>>> -      seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>> >>>> -      if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
>> >>>> +      tree seg_a_min = addr_base_a;
>> >>>> +      tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>> >>>> +      if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
>> >>>>   seg_a_min = seg_a_max, seg_a_max = addr_base_a;
>> >>>>
>> >>>> -      seg_b_min = addr_base_b;
>> >>>> -      seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>> >>>> -      if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
>> >>>> +      tree seg_b_min = addr_base_b;
>> >>>> +      tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>> >>>> +      if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
>> >>>>   seg_b_min = seg_b_max, seg_b_max = addr_base_b;
>> >>>>
>> >>>>        part_cond_expr =
>> >>>> @@ -2477,6 +2696,81 @@ vect_loop_versioning (loop_vec_info loop
>> >>>>        adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
>> >>>>      }
>> >>>>
>> >>>> +  /* Extract load and store statements on pointers with zero-stride
>> >>>> +     accesses.  */
>> >>>> +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
>> >>>> +    {
>> >>>> +
>> >>>> +      /* In the loop body, we iterate each statement to check if it is a load
>> >>>> + or store. Then we check the DR_STEP of the data reference.  If
>> >>>> + DR_STEP is zero, then we will hoist the load statement to the loop
>> >>>> + preheader, and move the store statement to the loop exit.  */
>> >>>> +
>> >>>> +      for (gimple_stmt_iterator si = gsi_start_bb (loop->header);
>> >>>> +   !gsi_end_p (si); )
>> >>>> + {
>> >>>> +  gimple stmt = gsi_stmt (si);
>> >>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> >>>> +  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
>> >>>> +
>> >>>> +
>> >>>> +  if (dr && integer_zerop (DR_STEP (dr)))
>> >>>> +    {
>> >>>> +      if (DR_IS_READ (dr))
>> >>>> + {
>> >>>> +  if (dump_file)
>> >>>> +    {
>> >>>> +      fprintf (dump_file,
>> >>>> +       "Hoist the load to outside of the loop:\n");
>> >>>> +      print_gimple_stmt (dump_file, stmt, 0,
>> >>>> + TDF_VOPS|TDF_MEMSYMS);
>> >>>> +    }
>> >>>> +
>> >>>> +  basic_block preheader = loop_preheader_edge (loop)->src;
>> >>>> +  gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
>> >>>> +  gsi_move_after (&si, &si_dst);
>> >>>> + }
>> >>>> +      else
>> >>>> + {
>> >>>> +  gimple_stmt_iterator si_dst =
>> >>>> +      gsi_last_bb (single_exit (loop)->dest);
>> >>>> +  gsi_move_after (&si, &si_dst);
>> >>>> + }
>> >>>> +              continue;
>> >>>> +    }
>> >>>> +  else if (!dr)
>> >>>> +          {
>> >>>> +            bool hoist = true;
>> >>>> +            for (size_t i = 0; i < gimple_num_ops (stmt); i++)
>> >>>> +            {
>> >>>> +              tree op = gimple_op (stmt, i);
>> >>>> +              if (TREE_CODE (op) == INTEGER_CST
>> >>>> +                  || TREE_CODE (op) == REAL_CST)
>> >>>> +                continue;
>> >>>> +              if (TREE_CODE (op) == SSA_NAME)
>> >>>> +              {
>> >>>> +                gimple def = SSA_NAME_DEF_STMT (op);
>> >>>> +                if (def == stmt
>> >>>> +                    || gimple_nop_p (def)
>> >>>> +                    || !flow_bb_inside_loop_p (loop, gimple_bb (def)))
>> >>>> +                  continue;
>> >>>> +              }
>> >>>> +              hoist = false;
>> >>>> +              break;
>> >>>> +            }
>> >>>> +
>> >>>> +            if (hoist)
>> >>>> +            {
>> >>>> +              basic_block preheader = loop_preheader_edge (loop)->src;
>> >>>> +              gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
>> >>>> +              gsi_move_after (&si, &si_dst);
>> >>>> +              continue;
>> >>>> +            }
>> >>>> +          }
>> >>>> +          gsi_next (&si);
>> >>>> + }
>> >>>> +    }
>> >>>> +
>> >>>>    /* End loop-exit-fixes after versioning.  */
>> >>>>
>> >>>>    if (cond_expr_stmt_list)
>> >>>> Index: gcc/ChangeLog
>> >>>> ===================================================================
>> >>>> --- gcc/ChangeLog (revision 202663)
>> >>>> +++ gcc/ChangeLog (working copy)
>> >>>> @@ -1,3 +1,8 @@
>> >>>> +2013-10-01  Cong Hou  <congh@google.com>
>> >>>> +
>> >>>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Combine
>> >>>> + alias checks if it is possible to amortize the runtime overhead.
>> >>>> +
>> >>>>
>> >>>>
>> >>>
>> >>> --
>> >>> Richard Biener <rguenther@suse.de>
>> >>> SUSE / SUSE Labs
>> >>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>> >>> GF: Jeff Hawn, Jennifer Guild, Felix Imend
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..7333f6a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@ 
+2013-10-14  Cong Hou  <congh@google.com>
+
+	* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks):
+	Combine alias checks if it is possible to amortize the runtime
+	overhead.  Return the number of alias checks after merging.
+	* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
+	Use the function vect_create_cond_for_alias_checks () to check
+	the number of alias checks.
+
 2013-10-14  David Malcolm  <dmalcolm@redhat.com>
 
 	* dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 075d071..ea2782f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@ 
+2013-10-14  Cong Hou  <congh@google.com>
+
+	* gcc.dg/vect/vect-alias-check.c: New.
+
 2013-10-14  Tobias Burnus  <burnus@net-b.de>
 
 	PR fortran/58658
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
new file mode 100644
index 0000000..00aa82f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize
+   --param=vect-max-version-for-alias-checks=2 -fdump-tree-vect-details" } */
+
+/* A test case showing three potential alias checks between
+   a[i] and b[i], b[i+3], b[i+6]. With alias checks merging
+   enabled, those tree checks can be merged into one, and the
+   loop will be vectorized with vect-max-version-for-alias-checks=2.  */
+
+void foo (int *a, int *b)
+{
+  int i;
+  for (i = 0; i < 1000; ++i)
+    a[i] = b[i] + b[i+3] + b[i+6];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index e7b2f33..a6ddbe2 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -128,41 +128,6 @@  vect_get_smallest_scalar_type (gimple stmt, HOST_WIDE_INT *lhs_size_unit,
 }
 
 
-/* Check if data references pointed by DR_I and DR_J are same or
-   belong to same interleaving group.  Return FALSE if drs are
-   different, otherwise return TRUE.  */
-
-static bool
-vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j)
-{
-  gimple stmt_i = DR_STMT (dr_i);
-  gimple stmt_j = DR_STMT (dr_j);
-
-  if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0)
-      || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
-	    && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))
-	    && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
-		== GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)))))
-    return true;
-  else
-    return false;
-}
-
-/* If address ranges represented by DDR_I and DDR_J are equal,
-   return TRUE, otherwise return FALSE.  */
-
-static bool
-vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j)
-{
-  if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j))
-       && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j)))
-      || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j))
-	  && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j))))
-    return true;
-  else
-    return false;
-}
-
 /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
    tested at run-time.  Return TRUE if DDR was successfully inserted.
    Return false if versioning is not supported.  */
@@ -2380,7 +2345,7 @@  vect_analyze_data_ref_access (struct data_reference *dr)
     references.  T1 and T2 are two data references to be compared.
     The function returns -1, 0, or 1.  */
 
-static int
+int
 compare_tree (tree t1, tree t2)
 {
   int i, cmp;
@@ -2647,6 +2612,9 @@  vect_analyze_data_ref_accesses (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
   return true;
 }
 
+int
+vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr);
+
 /* Function vect_prune_runtime_alias_test_list.
 
    Prune a list of ddrs to be tested at run-time by versioning for alias.
@@ -2656,60 +2624,12 @@  vect_analyze_data_ref_accesses (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
 bool
 vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
 {
-  vec<ddr_p>  ddrs =
-    LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
-  unsigned i, j;
-
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
                      "=== vect_prune_runtime_alias_test_list ===\n");
 
-  for (i = 0; i < ddrs.length (); )
-    {
-      bool found;
-      ddr_p ddr_i;
-
-      ddr_i = ddrs[i];
-      found = false;
-
-      for (j = 0; j < i; j++)
-        {
-	  ddr_p ddr_j = ddrs[j];
-
-	  if (vect_vfa_range_equal (ddr_i, ddr_j))
-	    {
-	      if (dump_enabled_p ())
-		{
-		  dump_printf_loc (MSG_NOTE, vect_location,
-		                   "found equal ranges ");
-		  dump_generic_expr (MSG_NOTE, TDF_SLIM,
-		                     DR_REF (DDR_A (ddr_i)));
-		  dump_printf (MSG_NOTE,  ", ");
-		  dump_generic_expr (MSG_NOTE, TDF_SLIM,
-		                     DR_REF (DDR_B (ddr_i)));
-		  dump_printf (MSG_NOTE,  " and ");
-		  dump_generic_expr (MSG_NOTE, TDF_SLIM,
-		                     DR_REF (DDR_A (ddr_j)));
-		  dump_printf (MSG_NOTE,  ", ");
-		  dump_generic_expr (MSG_NOTE, TDF_SLIM,
-		                     DR_REF (DDR_B (ddr_j)));
-		  dump_printf (MSG_NOTE, "\n");
-		}
-	      found = true;
-	      break;
-	    }
-	}
-
-      if (found)
-      {
-	ddrs.ordered_remove (i);
-	continue;
-      }
-      i++;
-    }
-
-  if (ddrs.length () >
-       (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
+  if (vect_create_cond_for_alias_checks (loop_vinfo, NULL) >
+      PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
     {
       if (dump_enabled_p ())
 	{
@@ -2718,8 +2638,6 @@  vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
 	                   "generated checks exceeded.\n");
 	}
 
-      LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0);
-
       return false;
     }
 
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 574446a..a6e0bf3 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2249,26 +2249,138 @@  vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
 }
 
 
+int compare_tree (tree t1, tree t2);
+
+namespace
+{
+
+/* struct dr_addr_with_seg_len
+
+   A struct storing information of a data reference, including the data
+   ref itself, its basic address, the access offset and the segment length
+   for aliasing checks.  */
+
+struct dr_addr_with_seg_len
+{
+  dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
+    : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
+
+  data_reference *dr;
+  tree basic_addr;
+  tree offset;
+  tree seg_len;
+};
+
+/* Operator == between two dr_addr_with_seg_len objects.
+
+   This equality operator is used to make sure two data refs
+   are the same one so that we will consider to combine the
+   aliasing checks of those two pairs of data dependent data
+   refs.  */
+
+bool
+operator == (const dr_addr_with_seg_len& d1,
+	     const dr_addr_with_seg_len& d2)
+{
+  return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
+	 && compare_tree (d1.offset, d2.offset) == 0
+	 && compare_tree (d1.seg_len, d2.seg_len) == 0;
+}
+
+struct dr_addr_with_seg_len_pair_t
+{
+  dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1,
+			       const dr_addr_with_seg_len& d2)
+    : first (d1), second (d2) {}
+  dr_addr_with_seg_len first;
+  dr_addr_with_seg_len second;
+};
+
+
+/* Function comp_dr_addr_with_seg_len_pair.
+
+   Comparison function for sorting objects of dr_addr_with_seg_len_pair_t
+   so that we can combine aliasing checks in one scan.  */
+
+int
+comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_)
+{
+  const dr_addr_with_seg_len_pair_t* p1 =
+    (const dr_addr_with_seg_len_pair_t *) p1_;
+  const dr_addr_with_seg_len_pair_t* p2 =
+    (const dr_addr_with_seg_len_pair_t *) p2_;
+
+  const dr_addr_with_seg_len &p11 = p1->first,
+			     &p12 = p1->second,
+			     &p21 = p2->first,
+			     &p22 = p2->second;
+
+  int comp_res = compare_tree (p11.basic_addr, p21.basic_addr);
+  if (comp_res != 0)
+    return comp_res;
+
+  comp_res = compare_tree (p12.basic_addr, p22.basic_addr);
+  if (comp_res != 0)
+    return comp_res;
+
+  if (TREE_CODE (p11.offset) != INTEGER_CST
+      || TREE_CODE (p21.offset) != INTEGER_CST)
+    {
+      comp_res = compare_tree (p11.offset, p21.offset);
+      if (comp_res != 0)
+	return comp_res;
+    }
+  if (tree_int_cst_compare (p11.offset, p21.offset) < 0)
+    return -1;
+  if (tree_int_cst_compare (p11.offset, p21.offset) > 0)
+    return 1;
+  if (TREE_CODE (p12.offset) != INTEGER_CST
+      || TREE_CODE (p22.offset) != INTEGER_CST)
+    {
+      comp_res = compare_tree (p12.offset, p22.offset);
+      if (comp_res != 0)
+	return comp_res;
+    }
+  if (tree_int_cst_compare (p12.offset, p22.offset) < 0)
+    return -1;
+  if (tree_int_cst_compare (p12.offset, p22.offset) > 0)
+    return 1;
+
+  return 0;
+}
+
+template <class T>
+void swap (T& a, T& b)
+{
+  T c (a);
+  a = b;
+  b = c;
+}
+
+}
+
+
 /* Function vect_create_cond_for_alias_checks.
 
    Create a conditional expression that represents the run-time checks for
    overlapping of address ranges represented by a list of data references
-   relations passed as input.
+   relations passed as input.  Return the number of alias checks.
 
    Input:
    COND_EXPR  - input conditional expression.  New conditions will be chained
-                with logical AND operation.
+                with logical AND operation.  If it is NULL, then the function
+                is used to return the number of alias checks.
    LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs
 	        to be checked.
 
    Output:
    COND_EXPR - conditional expression.
 
-   The returned value is the conditional expression to be used in the if
+   The returned COND_EXPR is the conditional expression to be used in the if
    statement that controls which version of the loop gets executed at runtime.
 */
 
-static void
+int
 vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
 {
   vec<ddr_p>  may_alias_ddrs =
@@ -2290,22 +2402,54 @@  vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
      || (load_ptr_n + load_segment_length_n) <= store_ptr_n))  */
 
   if (may_alias_ddrs.is_empty ())
-    return;
+    return 0;
+
+
+  /* Basically, for each pair of dependent data refs store_ptr_0
+     and load_ptr_0, we create an expression:
+
+     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
+     || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
+
+     for aliasing checks.  However, in some cases we can decrease
+     the number of checks by combining two checks into one.  For
+     example, suppose we have another pair of data refs store_ptr_0
+     and load_ptr_1, and if the following condition is satisfied:
+
+     load_ptr_0 < load_ptr_1  &&
+     load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
+
+     (this condition means, in each iteration of vectorized loop,
+     the accessed memory of store_ptr_0 cannot be between the memory
+     of load_ptr_0 and load_ptr_1.)
+
+     we then can use only the following expression to finish the
+     alising checks between store_ptr_0 & load_ptr_0 and
+     store_ptr_0 & load_ptr_1:
+
+     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
+     || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
+
+     Note that we only consider that load_ptr_0 and load_ptr_1 have the
+     same basic address.  */
+
+  vec<dr_addr_with_seg_len_pair_t> ddrs_with_seg_len;
+  ddrs_with_seg_len.create (may_alias_ddrs.length ());
+
+  /* First, we collect all data ref pairs for aliasing checks.  */
 
   FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
     {
       struct data_reference *dr_a, *dr_b;
       gimple dr_group_first_a, dr_group_first_b;
-      tree addr_base_a, addr_base_b;
       tree segment_length_a, segment_length_b;
       gimple stmt_a, stmt_b;
-      tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
 
       dr_a = DDR_A (ddr);
       stmt_a = DR_STMT (DDR_A (ddr));
       dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
       if (dr_group_first_a)
-        {
+	{
 	  stmt_a = dr_group_first_a;
 	  dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
 	}
@@ -2314,20 +2458,11 @@  vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
       stmt_b = DR_STMT (DDR_B (ddr));
       dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
       if (dr_group_first_b)
-        {
+	{
 	  stmt_b = dr_group_first_b;
 	  dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
 	}
 
-      addr_base_a
-	= fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
-				   size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
-					       DR_INIT (dr_a)));
-      addr_base_b
-	= fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
-				   size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
-					       DR_INIT (dr_b)));
-
       if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
 	length_factor = scalar_loop_iters;
       else
@@ -2335,24 +2470,145 @@  vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
       segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
       segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
 
+      dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
+	  (dr_addr_with_seg_len
+	       (dr_a, DR_BASE_ADDRESS (dr_a),
+		size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
+		segment_length_a),
+	   dr_addr_with_seg_len
+	       (dr_b, DR_BASE_ADDRESS (dr_b),
+		size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
+		segment_length_b));
+
+      if (dr_with_seg_len_pair.first.basic_addr >
+	  dr_with_seg_len_pair.second.basic_addr)
+	  swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
+
+      ddrs_with_seg_len.safe_push (dr_with_seg_len_pair);
+    }
+
+  /* Second, we sort the collected data ref pairs so that we can scan
+     them once to combine all possible aliasing checks.  */
+  ddrs_with_seg_len.qsort (comp_dr_addr_with_seg_len_pair);
+
+  /* We then scan the sorted dr pairs and check if we can combine
+     alias checks of two neighbouring dr pairs.  */
+
+  for (size_t i = 1; i < ddrs_with_seg_len.length (); ++i)
+    {
+      /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2).  */
+      dr_addr_with_seg_len *dr_a1 = &ddrs_with_seg_len[i-1].first,
+			   *dr_b1 = &ddrs_with_seg_len[i-1].second,
+			   *dr_a2 = &ddrs_with_seg_len[i].first,
+			   *dr_b2 = &ddrs_with_seg_len[i].second;
+
+      /* Remove duplicate data ref pairs.  */
+      if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2)
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "found equal ranges ");
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+				 DR_REF (dr_a1->dr));
+	      dump_printf (MSG_NOTE,  ", ");
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+				 DR_REF (dr_b1->dr));
+	      dump_printf (MSG_NOTE,  " and ");
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+				 DR_REF (dr_a2->dr));
+	      dump_printf (MSG_NOTE,  ", ");
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+				 DR_REF (dr_b2->dr));
+	      dump_printf (MSG_NOTE, "\n");
+	    }
+
+	  ddrs_with_seg_len.ordered_remove (i--);
+	  continue;
+	}
+
+      if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2)
+	{
+	  /* We consider the case that DR_B1 and DR_B2 are same memrefs,
+	     and DR_A1 and DR_A2 are two consecutive memrefs.  */
+	  if (*dr_a1 == *dr_a2)
+	    {
+	      swap (dr_a1, dr_b1);
+	      swap (dr_a2, dr_b2);
+	    }
+
+	  if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
+	      || TREE_CODE (dr_a1->offset) != INTEGER_CST
+	      || TREE_CODE (dr_a2->offset) != INTEGER_CST)
+	    continue;
+
+	  HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) -
+				 widest_int_cst_value (dr_a1->offset);
+
+	  if (diff <= vect_factor
+	      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
+		  && diff - widest_int_cst_value (dr_a1->seg_len) < vect_factor)
+	      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
+		  && TREE_CODE (dr_b1->seg_len) == INTEGER_CST
+		  && diff - widest_int_cst_value (dr_a1->seg_len) <
+		     widest_int_cst_value (dr_b1->seg_len)))
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc
+		      (MSG_NOTE, vect_location,
+		       "combining two runtime checks for data references ");
+		  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr));
+		  dump_printf (MSG_NOTE, " and ");
+		  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr));
+		  dump_printf (MSG_NOTE, "\n");
+		}
+
+	      dr_a1->seg_len = size_binop (PLUS_EXPR,
+					  dr_a2->seg_len, size_int (diff));
+	      ddrs_with_seg_len.ordered_remove (i--);
+	    }
+	}
+    }
+
+  int alias_check_num = ddrs_with_seg_len.length ();
+
+  if (!cond_expr)
+    {
+      ddrs_with_seg_len.release ();
+      return alias_check_num;
+    }
+
+  for (size_t i = 0, s = ddrs_with_seg_len.length (); i < s; ++i)
+    {
+      const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first;
+      const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second;
+      tree segment_length_a = dr_a.seg_len;
+      tree segment_length_b = dr_b.seg_len;
+
+      tree addr_base_a
+	= fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
+      tree addr_base_b
+	= fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
+
       if (dump_enabled_p ())
 	{
 	  dump_printf_loc (MSG_NOTE, vect_location,
-                           "create runtime check for data references ");
-	  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
+			   "create runtime check for data references ");
+	  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
 	  dump_printf (MSG_NOTE, " and ");
-	  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
-          dump_printf (MSG_NOTE, "\n");
+	  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
+	  dump_printf (MSG_NOTE, "\n");
 	}
 
-      seg_a_min = addr_base_a;
-      seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
-      if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
+      tree seg_a_min = addr_base_a;
+      tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
+      if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
 	seg_a_min = seg_a_max, seg_a_max = addr_base_a;
 
-      seg_b_min = addr_base_b;
-      seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
-      if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
+      tree seg_b_min = addr_base_b;
+      tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
+      if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
 	seg_b_min = seg_b_max, seg_b_max = addr_base_b;
 
       part_cond_expr =
@@ -2367,10 +2623,13 @@  vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
 	*cond_expr = part_cond_expr;
     }
 
+  ddrs_with_seg_len.release ();
+
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
 		     "created %u versioning for alias checks.\n",
 		     may_alias_ddrs.length ());
+  return alias_check_num;
 }