Message ID | mpt1rrkgc6t.fsf@arm.com |
---|---|
State | New |
Headers | show |
Series | predcom: Fix invalid store-store commoning [PR93434] | expand |
Richard Sandiford <richard.sandiford@arm.com> writes: > predcom has the following code to stop one rogue load from > interfering with other store-load opportunities: > > /* If A is read and B write or vice versa and there is unsuitable > dependence, instead of merging both components into a component > that will certainly not pass suitable_component_p, just put the > read into bad component, perhaps at least the write together with > all the other data refs in it's component will be optimizable. */ > > But when store-store commoning was added later, this had the effect > of ignoring loads that occur between two candidate stores. > > There is code further up to handle loads and stores with unknown > dependences: > > /* Don't do store elimination if there is any unknown dependence for > any store data reference. */ > if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb)) > && (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know > || DDR_NUM_DIST_VECTS (ddr) == 0)) > eliminate_store_p = false; > > But the store-load code above skips loads for *known* dependences > if (a) the load has already been marked "bad" or (b) the data-ref > machinery knows the dependence distance, but determine_offsets > can't handle the combination. > > (a) happens to be the problem in the testcase, but a different > sequence could have given (b) instead. We have writes to individual > fields of a structure and reads from the whole structure. Since > determine_offsets requires the types to be the same, it returns false > for each such read/write combination. > > This patch records which components have had loads removed and > prevents store-store commoning for them. It's a bit too pessimistic, > since there shouldn't be a problem if a "bad" load dominates all stores > in a component. But (a) we can't AFAIK use pcom_stmt_dominates_stmt_p > here and (b) the handling for that case would probably need to be > removed again if we handled more exotic cases in future. Forgot to add: Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK for master and eventually for backports? Thanks, Richard > > 2020-01-28 Richard Sandiford <richard.sandiford@arm.com> > > gcc/ > PR tree-optimization/93434 > * tree-predcom.c (split_data_refs_to_components): Record which > components have had aliasing loads removed. Prevent store-store > commoning for all such components. > > gcc/testsuite/ > PR tree-optimization/93434 > * gcc.c-torture/execute/pr93434.c: New test. > --- > gcc/testsuite/gcc.c-torture/execute/pr93434.c | 36 +++++++++++++++++++ > gcc/tree-predcom.c | 24 +++++++++++-- > 2 files changed, 58 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr93434.c > > diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c > index 047c7fa9583..d2dcfe7f42d 100644 > --- a/gcc/tree-predcom.c > +++ b/gcc/tree-predcom.c > @@ -767,6 +767,7 @@ split_data_refs_to_components (class loop *loop, > /* Don't do store elimination if loop has multiple exit edges. */ > bool eliminate_store_p = single_exit (loop) != NULL; > basic_block last_always_executed = last_always_executed_block (loop); > + auto_bitmap no_store_store_comps; > > FOR_EACH_VEC_ELT (datarefs, i, dr) > { > @@ -838,9 +839,13 @@ split_data_refs_to_components (class loop *loop, > else if (DR_IS_READ (dra) && ib != bad) > { > if (ia == bad) > - continue; > + { > + bitmap_set_bit (no_store_store_comps, ib); > + continue; > + } > else if (!determine_offset (dra, drb, &dummy_off)) > { > + bitmap_set_bit (no_store_store_comps, ib); > merge_comps (comp_father, comp_size, bad, ia); > continue; > } > @@ -848,9 +853,13 @@ split_data_refs_to_components (class loop *loop, > else if (DR_IS_READ (drb) && ia != bad) > { > if (ib == bad) > - continue; > + { > + bitmap_set_bit (no_store_store_comps, ia); > + continue; > + } > else if (!determine_offset (dra, drb, &dummy_off)) > { > + bitmap_set_bit (no_store_store_comps, ia); > merge_comps (comp_father, comp_size, bad, ib); > continue; > } > @@ -908,6 +917,17 @@ split_data_refs_to_components (class loop *loop, > comp->refs.quick_push (dataref); > } > > + if (eliminate_store_p) > + { > + bitmap_iterator bi; > + EXECUTE_IF_SET_IN_BITMAP (no_store_store_comps, 0, ia, bi) > + { > + ca = component_of (comp_father, ia); > + if (ca != bad) > + comps[ca]->eliminate_store_p = false; > + } > + } > + > for (i = 0; i < n; i++) > { > comp = comps[i]; > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr93434.c b/gcc/testsuite/gcc.c-torture/execute/pr93434.c > new file mode 100644 > index 00000000000..e786252794b > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/pr93434.c > @@ -0,0 +1,36 @@ > +typedef struct creal_T { > + double re; > + double im; > +} creal_T; > + > +#define N 16 > +int main() { > + int k; > + int i; > + int j; > + creal_T t2[N]; > + double inval; > + > + inval = 1.0; > + for (j = 0; j < N; ++j) { > + t2[j].re = 0; > + t2[j].im = 0; > + } > + > + for (j = 0; j < N/4; j++) { > + i = j * 4; > + t2[i].re = inval; > + t2[i].im = inval; > + k = i + 3; > + t2[k].re = inval; > + t2[k].im = inval; > + t2[i] = t2[k]; > + t2[k].re = inval; > + } > + > + for (i = 0; i < 2; ++i) > + if (t2[i].re != !i || t2[i].im != !i) > + __builtin_abort (); > + > + return 0; > +}
On Tue, Jan 28, 2020 at 10:49 AM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Richard Sandiford <richard.sandiford@arm.com> writes: > > predcom has the following code to stop one rogue load from > > interfering with other store-load opportunities: > > > > /* If A is read and B write or vice versa and there is unsuitable > > dependence, instead of merging both components into a component > > that will certainly not pass suitable_component_p, just put the > > read into bad component, perhaps at least the write together with > > all the other data refs in it's component will be optimizable. */ > > > > But when store-store commoning was added later, this had the effect > > of ignoring loads that occur between two candidate stores. > > > > There is code further up to handle loads and stores with unknown > > dependences: > > > > /* Don't do store elimination if there is any unknown dependence for > > any store data reference. */ > > if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb)) > > && (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know > > || DDR_NUM_DIST_VECTS (ddr) == 0)) > > eliminate_store_p = false; > > > > But the store-load code above skips loads for *known* dependences > > if (a) the load has already been marked "bad" or (b) the data-ref > > machinery knows the dependence distance, but determine_offsets > > can't handle the combination. > > > > (a) happens to be the problem in the testcase, but a different > > sequence could have given (b) instead. We have writes to individual > > fields of a structure and reads from the whole structure. Since > > determine_offsets requires the types to be the same, it returns false > > for each such read/write combination. > > > > This patch records which components have had loads removed and > > prevents store-store commoning for them. It's a bit too pessimistic, > > since there shouldn't be a problem if a "bad" load dominates all stores > > in a component. But (a) we can't AFAIK use pcom_stmt_dominates_stmt_p > > here and (b) the handling for that case would probably need to be > > removed again if we handled more exotic cases in future. > > Forgot to add: > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK for master > and eventually for backports? OK for master and backports. Richard. > Thanks, > Richard > > > > > 2020-01-28 Richard Sandiford <richard.sandiford@arm.com> > > > > gcc/ > > PR tree-optimization/93434 > > * tree-predcom.c (split_data_refs_to_components): Record which > > components have had aliasing loads removed. Prevent store-store > > commoning for all such components. > > > > gcc/testsuite/ > > PR tree-optimization/93434 > > * gcc.c-torture/execute/pr93434.c: New test. > > --- > > gcc/testsuite/gcc.c-torture/execute/pr93434.c | 36 +++++++++++++++++++ > > gcc/tree-predcom.c | 24 +++++++++++-- > > 2 files changed, 58 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr93434.c > > > > diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c > > index 047c7fa9583..d2dcfe7f42d 100644 > > --- a/gcc/tree-predcom.c > > +++ b/gcc/tree-predcom.c > > @@ -767,6 +767,7 @@ split_data_refs_to_components (class loop *loop, > > /* Don't do store elimination if loop has multiple exit edges. */ > > bool eliminate_store_p = single_exit (loop) != NULL; > > basic_block last_always_executed = last_always_executed_block (loop); > > + auto_bitmap no_store_store_comps; > > > > FOR_EACH_VEC_ELT (datarefs, i, dr) > > { > > @@ -838,9 +839,13 @@ split_data_refs_to_components (class loop *loop, > > else if (DR_IS_READ (dra) && ib != bad) > > { > > if (ia == bad) > > - continue; > > + { > > + bitmap_set_bit (no_store_store_comps, ib); > > + continue; > > + } > > else if (!determine_offset (dra, drb, &dummy_off)) > > { > > + bitmap_set_bit (no_store_store_comps, ib); > > merge_comps (comp_father, comp_size, bad, ia); > > continue; > > } > > @@ -848,9 +853,13 @@ split_data_refs_to_components (class loop *loop, > > else if (DR_IS_READ (drb) && ia != bad) > > { > > if (ib == bad) > > - continue; > > + { > > + bitmap_set_bit (no_store_store_comps, ia); > > + continue; > > + } > > else if (!determine_offset (dra, drb, &dummy_off)) > > { > > + bitmap_set_bit (no_store_store_comps, ia); > > merge_comps (comp_father, comp_size, bad, ib); > > continue; > > } > > @@ -908,6 +917,17 @@ split_data_refs_to_components (class loop *loop, > > comp->refs.quick_push (dataref); > > } > > > > + if (eliminate_store_p) > > + { > > + bitmap_iterator bi; > > + EXECUTE_IF_SET_IN_BITMAP (no_store_store_comps, 0, ia, bi) > > + { > > + ca = component_of (comp_father, ia); > > + if (ca != bad) > > + comps[ca]->eliminate_store_p = false; > > + } > > + } > > + > > for (i = 0; i < n; i++) > > { > > comp = comps[i]; > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr93434.c b/gcc/testsuite/gcc.c-torture/execute/pr93434.c > > new file mode 100644 > > index 00000000000..e786252794b > > --- /dev/null > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr93434.c > > @@ -0,0 +1,36 @@ > > +typedef struct creal_T { > > + double re; > > + double im; > > +} creal_T; > > + > > +#define N 16 > > +int main() { > > + int k; > > + int i; > > + int j; > > + creal_T t2[N]; > > + double inval; > > + > > + inval = 1.0; > > + for (j = 0; j < N; ++j) { > > + t2[j].re = 0; > > + t2[j].im = 0; > > + } > > + > > + for (j = 0; j < N/4; j++) { > > + i = j * 4; > > + t2[i].re = inval; > > + t2[i].im = inval; > > + k = i + 3; > > + t2[k].re = inval; > > + t2[k].im = inval; > > + t2[i] = t2[k]; > > + t2[k].re = inval; > > + } > > + > > + for (i = 0; i < 2; ++i) > > + if (t2[i].re != !i || t2[i].im != !i) > > + __builtin_abort (); > > + > > + return 0; > > +}
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index 047c7fa9583..d2dcfe7f42d 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -767,6 +767,7 @@ split_data_refs_to_components (class loop *loop, /* Don't do store elimination if loop has multiple exit edges. */ bool eliminate_store_p = single_exit (loop) != NULL; basic_block last_always_executed = last_always_executed_block (loop); + auto_bitmap no_store_store_comps; FOR_EACH_VEC_ELT (datarefs, i, dr) { @@ -838,9 +839,13 @@ split_data_refs_to_components (class loop *loop, else if (DR_IS_READ (dra) && ib != bad) { if (ia == bad) - continue; + { + bitmap_set_bit (no_store_store_comps, ib); + continue; + } else if (!determine_offset (dra, drb, &dummy_off)) { + bitmap_set_bit (no_store_store_comps, ib); merge_comps (comp_father, comp_size, bad, ia); continue; } @@ -848,9 +853,13 @@ split_data_refs_to_components (class loop *loop, else if (DR_IS_READ (drb) && ia != bad) { if (ib == bad) - continue; + { + bitmap_set_bit (no_store_store_comps, ia); + continue; + } else if (!determine_offset (dra, drb, &dummy_off)) { + bitmap_set_bit (no_store_store_comps, ia); merge_comps (comp_father, comp_size, bad, ib); continue; } @@ -908,6 +917,17 @@ split_data_refs_to_components (class loop *loop, comp->refs.quick_push (dataref); } + if (eliminate_store_p) + { + bitmap_iterator bi; + EXECUTE_IF_SET_IN_BITMAP (no_store_store_comps, 0, ia, bi) + { + ca = component_of (comp_father, ia); + if (ca != bad) + comps[ca]->eliminate_store_p = false; + } + } + for (i = 0; i < n; i++) { comp = comps[i]; diff --git a/gcc/testsuite/gcc.c-torture/execute/pr93434.c b/gcc/testsuite/gcc.c-torture/execute/pr93434.c new file mode 100644 index 00000000000..e786252794b --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr93434.c @@ -0,0 +1,36 @@ +typedef struct creal_T { + double re; + double im; +} creal_T; + +#define N 16 +int main() { + int k; + int i; + int j; + creal_T t2[N]; + double inval; + + inval = 1.0; + for (j = 0; j < N; ++j) { + t2[j].re = 0; + t2[j].im = 0; + } + + for (j = 0; j < N/4; j++) { + i = j * 4; + t2[i].re = inval; + t2[i].im = inval; + k = i + 3; + t2[k].re = inval; + t2[k].im = inval; + t2[i] = t2[k]; + t2[k].re = inval; + } + + for (i = 0; i < 2; ++i) + if (t2[i].re != !i || t2[i].im != !i) + __builtin_abort (); + + return 0; +}