Message ID | CY1PR1201MB1098080CE1BB5ABC451551488FCA0@CY1PR1201MB1098.namprd12.prod.outlook.com |
---|---|
State | New |
On Tue, Jan 12, 2016 at 3:51 PM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi
>
> In the code below, it looks like we always call “vect_permute_load_chain” to load non-unit strides whose size is a power of 2.
>
> (---snip---)
>   /* If reassociation width for vector type is 2 or greater target machine can
>      execute 2 or more vector instructions in parallel.  Otherwise try to
>      get chain for loads group using vect_shift_permute_load_chain.  */
>   mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
>
>   if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
>       || exact_log2 (size) != -1
>       || !vect_shift_permute_load_chain (dr_chain, size, stmt,
>                                          gsi, &result_chain))
>     vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
>
> static bool
> vect_shift_permute_load_chain (vec<tree> dr_chain,
>                                unsigned int length,
>                                gimple *stmt,
>                                gimple_stmt_iterator *gsi,
>                                vec<tree> *result_chain)
> {
> …...
> …...
>   if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)   ⇐ This is not used.
>     {
>       unsigned int j, log_length = exact_log2 (length);
>       for (i = 0; i < nelt / 2; ++i)
>         sel[i] = i * 2;
>       for (i = 0; i < nelt / 2; ++i)
>         sel[nelt / 2 + i] = i * 2 + 1;
> (---snip------)
>
>
> Is there any reason to do so?

No idea, benchmarking or history probably (vect_shift_permute_load_chain
not handling size != 3).

> I have not done any benchmarking, but tried simple test cases for -mavx targets with sizes 2, 4 and VF > 4 (short/char types).
> Using vect_shift_permute_load_chain looks better.
>
> Should we change it to something like this?
>
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index d0e20da..b0f0a02 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -5733,9 +5733,9 @@ vect_transform_grouped_load (gimple *stmt, vec<tree> dr_chain, int size,
>       get chain for loads group using vect_shift_permute_load_chain.  */
>    mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
>    if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
> -      || exact_log2 (size) != -1
> -      || !vect_shift_permute_load_chain (dr_chain, size, stmt,
> -                                         gsi, &result_chain))
> +      || (!vect_shift_permute_load_chain (dr_chain, size, stmt,
> +                                          gsi, &result_chain)
> +          && exact_log2 (size) != -1))

If so, then the exact_log2 check should simply be dropped.  It doesn't make
much sense with shift_permute_load_chain supporting power-of-two size.

Of course only benchmarking will tell ;)

Richard.

>      vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
>    vect_record_grouped_load_vectors (stmt, result_chain);
>    result_chain.release ();
>
> regards,
> Venkat.
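The quoted power-of-two branch of vect_shift_permute_load_chain builds a selector that moves even-indexed lanes to the low half of a vector and odd-indexed lanes to the high half. The sketch below reproduces the two selector loops from the snippet and applies the result to a plain array. It assumes the mask is used with both VEC_PERM_EXPR operands set to the same input vector, so every index stays below nelt. `build_even_odd_sel` and `apply_perm` are hypothetical helpers for illustration, not GCC functions.

```c
/* Build the selector from the quoted power-of-two branch:
   even-indexed lanes go to the low half, odd-indexed lanes to the
   high half.  NELT is assumed to be even.  */
void
build_even_odd_sel (unsigned char *sel, unsigned int nelt)
{
  unsigned int i;
  for (i = 0; i < nelt / 2; ++i)
    sel[i] = i * 2;
  for (i = 0; i < nelt / 2; ++i)
    sel[nelt / 2 + i] = i * 2 + 1;
}

/* Model a VEC_PERM_EXPR whose two operands are the same vector V
   (an assumption of this sketch): every selector index is below
   NELT, so out[i] = v[sel[i]].  */
void
apply_perm (const int *v, const unsigned char *sel, int *out,
            unsigned int nelt)
{
  unsigned int i;
  for (i = 0; i < nelt; ++i)
    out[i] = v[sel[i]];
}
```

For nelt = 8 the selector is {0, 2, 4, 6, 1, 3, 5, 7}, so an input of {0, 10, 20, 30, 40, 50, 60, 70} becomes {0, 20, 40, 60, 10, 30, 50, 70}.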
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index d0e20da..b0f0a02 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -5733,9 +5733,9 @@ vect_transform_grouped_load (gimple *stmt, vec<tree> dr_chain, int size,
      get chain for loads group using vect_shift_permute_load_chain.  */
   mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
   if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
-      || exact_log2 (size) != -1
-      || !vect_shift_permute_load_chain (dr_chain, size, stmt,
-                                         gsi, &result_chain))
+      || (!vect_shift_permute_load_chain (dr_chain, size, stmt,
+                                          gsi, &result_chain)
+          && exact_log2 (size) != -1))
     vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
   vect_record_grouped_load_vectors (stmt, result_chain);
   result_chain.release ();
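The three candidate conditions (current code, the proposed patch, and dropping the exact_log2 check as the reviewer suggests) can be compared by modelling only their control flow. In the sketch below, `struct shift_model` is a hypothetical stand-in that records whether vect_shift_permute_load_chain would succeed and whether it was actually called, and `is_pow2` stands in for `exact_log2 (x) != -1`. None of these names are GCC code. Because `||` short-circuits, the current condition never calls vect_shift_permute_load_chain for a power-of-two size, which is why its power-of-two branch goes unused. The model also shows that under the proposed condition, a non-power-of-two size whose shift-permute attempt fails reaches neither routine. Dropping the exact_log2 check instead falls back to vect_permute_load_chain whenever the shift-permute attempt fails.

```c
#include <stdbool.h>

/* Which routine would end up filling result_chain in this model.  */
enum chain_kind { CHAIN_NONE, CHAIN_PERMUTE, CHAIN_SHIFT_PERMUTE };

/* Hypothetical stand-in for vect_shift_permute_load_chain: OK says
   whether it would succeed, CALLED records that it was invoked.  */
struct shift_model { bool ok; bool called; };

static bool
try_shift (struct shift_model *m)
{
  m->called = true;
  return m->ok;
}

/* Hypothetical stand-in for exact_log2 (x) != -1.  */
static bool
is_pow2 (unsigned int x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Current code: a power-of-two SIZE short-circuits the ||, so
   try_shift is never reached for it.  */
enum chain_kind
dispatch_current (int width, unsigned int size, struct shift_model *m)
{
  if (width > 1 || is_pow2 (size) || !try_shift (m))
    return CHAIN_PERMUTE;
  return CHAIN_SHIFT_PERMUTE;
}

/* Proposed patch: try_shift runs first (when WIDTH <= 1); the
   fallback only runs if it fails AND SIZE is a power of two.  */
enum chain_kind
dispatch_proposed (int width, unsigned int size, struct shift_model *m)
{
  if (width > 1 || (!try_shift (m) && is_pow2 (size)))
    return CHAIN_PERMUTE;
  /* try_shift was called here: either it succeeded, or it failed for
     a non-power-of-two SIZE and no routine ran.  */
  return m->ok ? CHAIN_SHIFT_PERMUTE : CHAIN_NONE;
}

/* Reviewer's suggestion: drop the exact_log2 check entirely.  */
enum chain_kind
dispatch_suggested (int width, unsigned int size, struct shift_model *m)
{
  (void) size;  /* SIZE no longer affects the choice.  */
  if (width > 1 || !try_shift (m))
    return CHAIN_PERMUTE;
  return CHAIN_SHIFT_PERMUTE;
}
```

For example, with a reassociation width of 1 and size 4, `dispatch_current` returns `CHAIN_PERMUTE` without touching the shift-permute model, while `dispatch_suggested` calls it and returns `CHAIN_SHIFT_PERMUTE` when it succeeds.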