Message ID: 20230626120429.3403256-1-juzhe.zhong@rivai.ai
State:      New
Series:     [V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Mon, 26 Jun 2023, juzhe.zhong@rivai.ai wrote:

> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
>
> Hi, Richi.  It seems that we use nunits, which is len + bias, to iterate,
> then we can simplify the code.
>
> Also, I fixed the behavior of len_store.
>
> Before this patch:
>   (len - bias) * BITS_PER_UNIT
> After this patch:
>   (len + bias) * BITS_PER_UNIT
>
> gcc/ChangeLog:
>
>         * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE
>         and fix LEN_STORE.
>
> ---
>  gcc/tree-ssa-sccvn.cc | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 11061a374a2..d66e75460ed 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
>            if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
>              return (void *)-1;
>            break;
> +        case IFN_LEN_MASK_STORE:
> +          len = gimple_call_arg (call, 2);
> +          bias = gimple_call_arg (call, 5);
> +          if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> +            return (void *)-1;
> +          mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> +          mask = vn_valueize (mask);
> +          if (TREE_CODE (mask) != VECTOR_CST)
> +            return (void *)-1;
> +          break;
>          default:
>            return (void *)-1;
>          }
> @@ -3344,6 +3354,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
>            tree vectype = TREE_TYPE (def_rhs);
>            unsigned HOST_WIDE_INT elsz
>              = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> +          poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +          if (len)
> +            {
> +              /* Since the following while condition known_lt
> +                 (mask_idx, nunits) will exit the while loop
> +                 when mask_idx > nunits.coeffs[0], we pick the
> +                 MIN (nunits.coeffs[0], len + bias).  */
> +              nunits = MIN (nunits.coeffs[0],
> +                            tree_to_uhwi (len) + tree_to_shwi (bias));

I think you can use ordered_min here?  Alternatively doing ...

> +            }
>            if (mask)
>              {
>                HOST_WIDE_INT start = 0, length = 0;
> @@ -3373,7 +3393,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
>                    length += elsz;
>                    mask_idx++;
>                  }
> -              while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> +              while (known_lt (mask_idx, nunits));

... && mask_idx < len would be possible.

Richard?

Thanks,
Richard.
Hi, Richi.

>> I think you can use ordered_min here?  Alternatively doing ...

I checked the function ordered_min:

  ordered_min (const poly_int_pod<N, Ca> &a, const poly_int_pod<N, Cb> &b)
  {
    if (known_le (a, b))
      return a;
    else
      {
        if (N > 1)
          gcc_checking_assert (known_le (b, a));
        return b;
      }
  }

It seems that the assertion will fail when, for example, nunits = [2,2]
and len + bias = 3: neither value is known_le the other.

I may be wrong.

Thanks.

juzhe.zhong@rivai.ai

From: Richard Biener
Date: 2023-06-26 20:16
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Mon, 26 Jun 2023, juzhe.zhong@rivai.ai wrote:

> Hi, Richi.
>
> >> I think you can use ordered_min here?  Alternatively doing ...
>
> I checked the function ordered_min:
>
>   ordered_min (const poly_int_pod<N, Ca> &a, const poly_int_pod<N, Cb> &b)
>   {
>     if (known_le (a, b))
>       return a;
>     else
>       {
>         if (N > 1)
>           gcc_checking_assert (known_le (b, a));
>         return b;
>       }
>   }
>
> It seems that the assertion will fail when nunits = [2,2] and
> len + bias = 3, for example.

Yes, looks like so.

> I may be wrong.

I guess it would be nice to re-formulate the loop in terms of the encoded
VECTOR_CST elts, but then we need to generate the "extents" for set bits;
not sure how to do that here.  Note in the end we get HOST_WIDE_INT
extents from adding the element size for each mask element we look at.

The question is how, and if, we currently handle the trailing ... correctly
for VL vectors.  It should be a matter of creating a few testcases where we
expect (or expect not) to CSE a [masked] VL vector load with one or
multiple stores.  Like if we have

  *v = 0;
  *(v + vls) = 1;
  ... = *(v + vls/2);

that is, two VL vector stores that are "adjacent" and one load that
half-overlaps both.  That 'vls' would be a poly-int CST then.  It might be
possible to create the above with intrinsics(?), for sure within a loop by
vectorization.

Richard.
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 11061a374a2..d66e75460ed 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
           if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
             return (void *)-1;
           break;
+        case IFN_LEN_MASK_STORE:
+          len = gimple_call_arg (call, 2);
+          bias = gimple_call_arg (call, 5);
+          if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
+            return (void *)-1;
+          mask = gimple_call_arg (call, internal_fn_mask_index (fn));
+          mask = vn_valueize (mask);
+          if (TREE_CODE (mask) != VECTOR_CST)
+            return (void *)-1;
+          break;
         default:
           return (void *)-1;
         }
@@ -3344,6 +3354,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
           tree vectype = TREE_TYPE (def_rhs);
           unsigned HOST_WIDE_INT elsz
             = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
+          poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+          if (len)
+            {
+              /* Since the following while condition known_lt
+                 (mask_idx, nunits) will exit the while loop
+                 when mask_idx > nunits.coeffs[0], we pick the
+                 MIN (nunits.coeffs[0], len + bias).  */
+              nunits = MIN (nunits.coeffs[0],
+                            tree_to_uhwi (len) + tree_to_shwi (bias));
+            }
           if (mask)
             {
               HOST_WIDE_INT start = 0, length = 0;
@@ -3373,7 +3393,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
                   length += elsz;
                   mask_idx++;
                 }
-              while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
+              while (known_lt (mask_idx, nunits));
               if (length != 0)
                 {
                   pd.rhs_off = start;
@@ -3389,7 +3409,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
             {
               pd.offset = offset2i;
               pd.size = (tree_to_uhwi (len)
-                         + -tree_to_shwi (bias)) * BITS_PER_UNIT;
+                         + tree_to_shwi (bias)) * BITS_PER_UNIT;
               if (BYTES_BIG_ENDIAN)
                 pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
               else
From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Hi, Richi.  It seems that we use nunits, which is len + bias, to iterate,
then we can simplify the code.

Also, I fixed the behavior of len_store.

Before this patch:
  (len - bias) * BITS_PER_UNIT
After this patch:
  (len + bias) * BITS_PER_UNIT

gcc/ChangeLog:

        * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE
        and fix LEN_STORE.

---
 gcc/tree-ssa-sccvn.cc | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)