Message ID: DB6PR0802MB2504F8D3210C3BD88374FE89E7950@DB6PR0802MB2504.eurprd08.prod.outlook.com
State: New
Series: [PR85804] Fix wrong code by correcting bump step computation in vector(1) load of single-element group access
On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> As reported in PR85804, bump step is wrongly computed for vector(1) load of
> single-element group access.  This patch fixes the issue by correcting bump
> step computation for the specific VMAT_CONTIGUOUS case.
>
> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

To me it looks like the classification as VMAT_CONTIGUOUS is bogus.
We'd fall into the grouped_load case otherwise, which should handle
the situation correctly?

Richard?

Thanks,
Richard.

> Thanks,
> bin
>
> 2018-05-17  Bin Cheng  <bin.cheng@arm.com>
>
>	PR tree-optimization/85804
>	* tree-vect-stmts.c (vectorizable_load): Compute correct bump step
>	for vector(1) load in single-element group access.
>
> gcc/testsuite
> 2018-05-17  Bin Cheng  <bin.cheng@arm.com>
>
>	PR tree-optimization/85804
>	* gcc.c-torture/execute/pr85804.c: New test.
Richard Biener <richard.guenther@gmail.com> writes:
> To me it looks like the classification as VMAT_CONTIGUOUS is bogus.
> We'd fall into the grouped_load case otherwise which should handle
> the situation correctly?
>
> Richard?

Yeah, I agree.  I mentioned to Bin privately that that was probably
a misstep and that we should instead continue to treat them as
VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector
from the array of loaded vectors, instead of doing an actual permute.

(Note that VMAT_CONTIGUOUS is OK for stores, since we don't allow
gaps there.  But it might be easiest to handle both loads and stores
in the same way.)

Although it still seems weird to "vectorise" stuff to one element.
Why not leave the original scalar code in place, and put the onus on
whatever wants to produce or consume a V1 to do the appropriate
conversion?

Thanks,
Richard
On Tue, May 22, 2018 at 2:11 PM Richard Sandiford <richard.sandiford@linaro.org> wrote:
> Yeah, I agree.  I mentioned to Bin privately that that was probably
> a misstep and that we should instead continue to treat them as
> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector
> from the array of loaded vectors, instead of doing an actual permute.

I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS
should be only for no-gap vectors.  How do we classify single-element
interleaving?  That would be another classification choice.

> (Note that VMAT_CONTIGUOUS is OK for stores, since we don't allow
> gaps there.  But it might be easiest to handle both loads and stores
> in the same way.)
>
> Although it still seems weird to "vectorise" stuff to one element.
> Why not leave the original scalar code in place, and put the onus on
> whatever wants to produce or consume a V1 to do the appropriate
> conversion?

Yeah, V1 is somewhat awkward to deal with but I think the way we're
doing it right now is OK.  Note if we leave the original scalar code in
place we still have to deal with larger unrolling factors, thus we'd have
to duplicate the scalar code and have different IVs anyway.

Richard.
On Wed, May 23, 2018 at 11:19 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS
> should be only for no-gap vectors.  How do we classify single-element
> interleaving?  That would be another classification choice.

Yes, I suggested this offline too, but Richard may have more to say
about this.  One thing worth noting is that classifying it as
VMAT_ELEMENTWISE would disable vectorization in this case because of
a cost model issue, as commented at the end of get_load_store_type.

Thanks,
bin
"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Wed, May 23, 2018 at 11:19 AM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Tue, May 22, 2018 at 2:11 PM Richard Sandiford < >> richard.sandiford@linaro.org> wrote: >> >>> Richard Biener <richard.guenther@gmail.com> writes: >>> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote: >>> > >>> >> Hi, >>> >> As reported in PR85804, bump step is wrongly computed for vector(1) >> load >>> > of >>> >> single-element group access. This patch fixes the issue by correcting >>> > bump >>> >> step computation for the specific VMAT_CONTIGUOUS case. >>> > >>> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK? >>> > >>> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus. >>> > We'd fall into the grouped_load case otherwise which should handle >>> > the situation correctly? >>> > >>> > Richard? >> >>> Yeah, I agree. I mentioned to Bin privately that that was probably >>> a misstep and that we should instead continue to treat them as >>> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector >>> from the array of loaded vectors, instead of doing an actual permute. >> >> I'd classify them as VMAT_ELEMENTWISE instead. CONTIGUOUS >> should be only for no-gap vectors. How do we classify single-element >> interleaving? That would be another classification choice. > Yes, I suggested this offline too, but Richard may have more to say about this. > One thing worth noting is classifying it as VMAT_ELEMENTWISE would > disable vectorization in this case because of cost model issue as > commented at the end of get_load_store_type. Yeah, that's the problem. Using VMAT_ELEMENTWISE also means that we use a scalar load and then insert it into a vector, whereas all we want (and all we currently generate) is a single vector load. 
So if we classify them as VMAT_ELEMENTWISE, they'll be a special case for both costing (vector load rather than scalar load and vector construct) and code-generation. Thanks, Richard
On Wed, May 23, 2018 at 12:01 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote:
> Yeah, that's the problem.  Using VMAT_ELEMENTWISE also means that we
> use a scalar load and then insert it into a vector, whereas all we want
> (and all we currently generate) is a single vector load.
>
> So if we classify them as VMAT_ELEMENTWISE, they'll be a special case
> for both costing (vector load rather than scalar load and vector
> construct) and code-generation.

It looks to me like it will be a special case for VMAT_ELEMENTWISE or
VMAT_CONTIGUOUS* either way; probably VMAT_CONTIGUOUS is the
easiest one?

Thanks,
bin
On Wed, May 23, 2018 at 1:02 PM Richard Sandiford <richard.sandiford@linaro.org> wrote:
> Yeah, that's the problem.  Using VMAT_ELEMENTWISE also means that we
> use a scalar load and then insert it into a vector, whereas all we want
> (and all we currently generate) is a single vector load.
>
> So if we classify them as VMAT_ELEMENTWISE, they'll be a special case
> for both costing (vector load rather than scalar load and vector
> construct) and code-generation.

But V1 elementwise loads already work by loading with the V1 vector
type (Bin fixed that recently).  But yes, the cost model thing should be
fixed anyway.

Richard.

> Thanks,
> Richard
On Wed, May 23, 2018 at 1:10 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
> Looks to me it will be a special case for VMAT_ELEMENTWISE or
> VMAT_CONTIGUOUS* anyway, probably VMAT_CONTIGUOUS is the easiest one?

The question is why we do

  if (memory_access_type == VMAT_GATHER_SCATTER
      || (!slp && memory_access_type == VMAT_CONTIGUOUS))
    grouped_load = false;

I think for the particular testcase removing this would fix the issue as
well?

Richard.

> Thanks,
> bin
On Wed, May 23, 2018 at 12:12 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> The question is why we do
>
>   if (memory_access_type == VMAT_GATHER_SCATTER
>       || (!slp && memory_access_type == VMAT_CONTIGUOUS))
>     grouped_load = false;
>
> I think for the particular testcase removing this would fix the issue as
> well?

No, simply relaxing this check would result in generating redundant
loads (for the gap).  It also leads into vect_transform_grouped_load as
well as vect_permute_load_chain, so we would need to do the special
case handling just like classifying as VMAT_CONTIGUOUS_PERMUTE.

Thanks,
bin
From 502bcd1e445186a56b6ea254a0cd2406fb62f08c Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Fri, 18 May 2018 14:18:14 +0100
Subject: [PATCH] pr85804-20180517

---
 gcc/testsuite/gcc.c-torture/execute/pr85804.c | 22 ++++++++++++++++++++++
 gcc/tree-vect-stmts.c                         | 15 +++++++++++++++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr85804.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr85804.c b/gcc/testsuite/gcc.c-torture/execute/pr85804.c
new file mode 100644
index 0000000..b8929b1
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr85804.c
@@ -0,0 +1,22 @@
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+long d[64] = {0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+};
+
+void abort ();
+void __attribute__((noinline)) foo (int b)
+{
+  if (b)
+    abort ();
+}
+
+int main() {
+  int b = 0;
+  for (int c = 0; c <= 5; c++)
+    b ^= d[c * 5 + 1];
+  foo (b);
+  return 0;
+}
+
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 64a157d..e6b95b3 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7262,6 +7262,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   gphi *phi = NULL;
   vec<tree> dr_chain = vNULL;
   bool grouped_load = false;
+  bool single_element = false;
   gimple *first_stmt;
   gimple *first_stmt_for_drptr = NULL;
   bool inv_p;
@@ -7822,6 +7823,15 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       first_stmt = stmt;
       first_dr = dr;
       group_size = vec_num = 1;
+      /* For single-element vector in a single-element group, record the group
+	 size in order to compute correct bump size.  */
+      if (!slp
+	  && memory_access_type == VMAT_CONTIGUOUS
+	  && STMT_VINFO_GROUPED_ACCESS (stmt_info))
+	{
+	  single_element = true;
+	  group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+	}
       group_gap_adj = 0;
       ref_type = reference_alias_ptr_type (DR_REF (first_dr));
     }
@@ -7992,6 +8002,11 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       else
	aggr_type = vectype;
       bump = vect_get_data_ptr_increment (dr, aggr_type, memory_access_type);
+      /* Multiply bump size by group size for single-element vector in single-
+	 element group.  */
+      if (single_element && group_size > 1)
+	bump = fold_build2 (MULT_EXPR, TREE_TYPE (bump), bump,
+			    build_int_cst (TREE_TYPE (bump), group_size));
     }

   tree vec_mask = NULL_TREE;
-- 
1.9.1
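[The arithmetic the last hunk performs can be sketched in isolation. This is a hypothetical plain-C helper, not GCC's actual API: the base increment corresponds to the vector size in bytes as returned by vect_get_data_ptr_increment, and for a single-element vector load of a single-element group it gets scaled by the group size.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the bump-step fix: for a single-element vector load in a
   single-element group, the data pointer must advance by a whole
   group per iteration, not by one vector.  */
static size_t
bump_step (size_t vector_bytes, bool single_element, unsigned group_size)
{
  size_t bump = vector_bytes;
  if (single_element && group_size > 1)
    bump *= group_size;
  return bump;
}
```

For the PR85804 testcase (V1DI loads of 8-byte longs from groups of five), this yields a 40-byte bump per iteration instead of the 8-byte bump the buggy code used.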