Message ID | 6D39441BF12EF246A7ABCE6654B0235380B5CE8C@hhmail02.hh.imgtec.org |
---|---|
State | New |
Headers | show |
On 02/07/2017 09:08 AM, Matthew Fortune wrote: > Hi, > > This patch is a minimal change to prevent (subreg(mem)) from being > simplified to use the outer mode for WORD_REGISTER_OPERATIONS. There > is high probability of refining and/or re-implementing this for GCC 8 > but such a change would be too invasive. This change at least ensures > correctness but may prevent simplification of some acceptable cases. > > No testcase here but I will try to make one using the RTL frontend > later. This fix is required for mips64el-linux-gnu bootstrap > to succeed (in conjunction with patch 1 of this series). The specific > file affected by this bug is building gcc/predict.c where a bad reload > is created in predict_paths_for_bb. Register 300 in the following > example is spilled to memory in SImode but reloaded as DImode. > > (insn 247 212 389 3 (set (reg:SI 300) > (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0) > (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904 504 {*sne_zero_sisi} > (nil)) > ... > > (insn 250 256 251 40 (set (reg:DI 6 $6) > (subreg:DI (reg:SI 300) 0)) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit} > (nil)) > gcc/ > PR target/78660 > * lra-constraints.c (simplify_operand_subreg): Handle > WORD_REGISTER_OPERATIONS targets. > This patch is OK too. Thanks again.
> This patch is a minimal change to prevent (subreg(mem)) from being > simplified to use the outer mode for WORD_REGISTER_OPERATIONS. There > is high probability of refining and/or re-implementing this for GCC 8 > but such a change would be too invasive. This change at least ensures > correctness but may prevent simplification of some acceptable cases. This one causes: +FAIL: gcc.dg/torture/builtin-complex-1.c -O3 -fomit-frame-pointer -funroll- loops -fpeel-loops -ftracer -finline-functions (test for excess errors) +WARNING: gcc.dg/torture/builtin-complex-1.c -O3 -fomit-frame-pointer - funroll-loops -fpeel-loops -ftracer -finline-functions compilation failed to produce executable +FAIL: gcc.dg/torture/builtin-complex-1.c -O3 -g (test for excess errors) +WARNING: gcc.dg/torture/builtin-complex-1.c -O3 -g compilation failed to produce executable +WARNING: program timed out. +WARNING: program timed out. on SPARC 32-bit, i.e. LRA hangs. Reduced testcase attached, compile at -O3 with a cc1 configured for sparc-sun-solaris2.10. > gcc/ > PR target/78660 > * lra-constraints.c (simplify_operand_subreg): Handle > WORD_REGISTER_OPERATIONS targets. > --- > gcc/lra-constraints.c | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c > index 66ff2bb..484a70d 100644 > --- a/gcc/lra-constraints.c > +++ b/gcc/lra-constraints.c > @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop, machine_mode > reg_mode) subregs as we don't substitute such equiv memory (see processing > equivalences in function lra_constraints) and because for spilled pseudos > we allocate stack memory enough for the biggest > - corresponding paradoxical subreg. */ > - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))) > + corresponding paradoxical subreg. > + > + However, never simplify a (subreg (mem ...)) for > + WORD_REGISTER_OPERATIONS targets as this may lead to loading junk > + data into a register when the inner is narrower than outer or > + missing important data from memory when the inner is wider than > + outer. */ > + if (!WORD_REGISTER_OPERATIONS > + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) > return true; > > *curr_id->operand_loc[nop] = operand; I think that we might need: if (!(GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (innermode) && WORD_REGISTER_OPERATIONS) && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) return true; i.e. we force the reloading only for paradoxical subregs.
Sorry for the slow reply, been away for a few days.... Eric Botcazou <ebotcazou@adacore.com> writes: > > This patch is a minimal change to prevent (subreg(mem)) from being > > simplified to use the outer mode for WORD_REGISTER_OPERATIONS. There > > is high probability of refining and/or re-implementing this for GCC 8 > > but such a change would be too invasive. This change at least ensures > > correctness but may prevent simplification of some acceptable cases. > > This one causes: > > +FAIL: gcc.dg/torture/builtin-complex-1.c -O3 -fomit-frame-pointer - > funroll- > loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > +WARNING: gcc.dg/torture/builtin-complex-1.c -O3 -fomit-frame-pointer > - > funroll-loops -fpeel-loops -ftracer -finline-functions compilation > failed to produce executable > +FAIL: gcc.dg/torture/builtin-complex-1.c -O3 -g (test for excess > errors) > +WARNING: gcc.dg/torture/builtin-complex-1.c -O3 -g compilation > failed to > produce executable > +WARNING: program timed out. > +WARNING: program timed out. > > on SPARC 32-bit, i.e. LRA hangs. Reduced testcase attached, compile at > -O3 with a cc1 configured for sparc-sun-solaris2.10. > > > gcc/ > > PR target/78660 > > * lra-constraints.c (simplify_operand_subreg): Handle > > WORD_REGISTER_OPERATIONS targets. > > --- > > gcc/lra-constraints.c | 17 ++++++++++++----- > > 1 file changed, 12 insertions(+), 5 deletions(-) > > > > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index > > 66ff2bb..484a70d 100644 > > --- a/gcc/lra-constraints.c > > +++ b/gcc/lra-constraints.c > > @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop, machine_mode > > reg_mode) subregs as we don't substitute such equiv memory (see > > processing equivalences in function lra_constraints) and because for > > spilled pseudos we allocate stack memory enough for the biggest > > - corresponding paradoxical subreg. */ > > - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > > - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))) > > + corresponding paradoxical subreg. > > + > > + However, never simplify a (subreg (mem ...)) for > > + WORD_REGISTER_OPERATIONS targets as this may lead to loading > junk > > + data into a register when the inner is narrower than outer or > > + missing important data from memory when the inner is wider > than > > + outer. */ > > + if (!WORD_REGISTER_OPERATIONS > > + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > > + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN > (reg))))) > > return true; > > > > *curr_id->operand_loc[nop] = operand; > > I think that we might need: > > if (!(GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (innermode) > && WORD_REGISTER_OPERATIONS) > && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) > return true; > > i.e. we force the reloading only for paradoxical subregs. Maybe, though I'm not entirely convinced. For WORD_REGISTER_OPERATIONS both paradoxical and normal SUBREGs can be a problem as the inner mode in both cases can be used elsewhere for a reload making the content of the spill slot wrong either in the subreg reload or the ordinary reload elsewhere. However, the condition can be tightened; I should not have made it so simplistic I guess. I.e. both modes must be no wider than a word and must be different precision to force an inner reload. Adding that check would fix this case for SPARC and should be fine for MIPS but I need to wait for a bootstrap build to be sure. I don't really understand why LRA can't reload this for SPARC though as I'm not sure there is any guarantee provided to backends that some SUBREGs will be reloaded using their outer mode. If there is such a guarantee then it would be much easier to reason about this logic but as it stands I suspect we are having to tweak LRA to cope with assumptions made in various targets that happen to have held true (and I have no doubt that MIPS has some of these as well especially in terms of the FP/GP register usage with float and int modes.) All being well we can capture such assumptions and formalise them so we ensure they hold true (or modify backends appropriately I guess). The condition would look like this, What do you think? if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode) && GET_MODE_SIZE (mode) <= UNITS_PER_WORD && GET_MODE_SIZE (innermode) <= UNITS_PER_WORD && WORD_REGISTER_OPERATIONS) && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) return true; Thanks, Matthew
Hi Eric, Any thoughts on this? Thanks, Matthew > Sorry for the slow reply, been away for a few days.... > > Eric Botcazou <ebotcazou@adacore.com> writes: > > > This patch is a minimal change to prevent (subreg(mem)) from being > > > simplified to use the outer mode for WORD_REGISTER_OPERATIONS. > > > There is high probability of refining and/or re-implementing this > > > for GCC 8 but such a change would be too invasive. This change at > > > least ensures correctness but may prevent simplification of some > acceptable cases. > > > > This one causes: > > > > +FAIL: gcc.dg/torture/builtin-complex-1.c -O3 -fomit-frame-pointer - > > funroll- > > loops -fpeel-loops -ftracer -finline-functions (test for excess > > errors) > > +WARNING: gcc.dg/torture/builtin-complex-1.c -O3 -fomit-frame- > pointer > > - > > funroll-loops -fpeel-loops -ftracer -finline-functions compilation > > failed to produce executable > > +FAIL: gcc.dg/torture/builtin-complex-1.c -O3 -g (test for excess > > errors) > > +WARNING: gcc.dg/torture/builtin-complex-1.c -O3 -g compilation > > failed to > > produce executable > > +WARNING: program timed out. > > +WARNING: program timed out. > > > > on SPARC 32-bit, i.e. LRA hangs. Reduced testcase attached, compile > > at > > -O3 with a cc1 configured for sparc-sun-solaris2.10. > > > > > gcc/ > > > PR target/78660 > > > * lra-constraints.c (simplify_operand_subreg): Handle > > > WORD_REGISTER_OPERATIONS targets. > > > --- > > > gcc/lra-constraints.c | 17 ++++++++++++----- > > > 1 file changed, 12 insertions(+), 5 deletions(-) > > > > > > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index > > > 66ff2bb..484a70d 100644 > > > --- a/gcc/lra-constraints.c > > > +++ b/gcc/lra-constraints.c > > > @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop, > > > machine_mode > > > reg_mode) subregs as we don't substitute such equiv memory (see > > > processing equivalences in function lra_constraints) and because for > > > spilled pseudos we allocate stack memory enough for the biggest > > > - corresponding paradoxical subreg. */ > > > - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > > > - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > > - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > > - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))) > > > + corresponding paradoxical subreg. > > > + > > > + However, never simplify a (subreg (mem ...)) for > > > + WORD_REGISTER_OPERATIONS targets as this may lead to loading > > junk > > > + data into a register when the inner is narrower than outer or > > > + missing important data from memory when the inner is wider > > than > > > + outer. */ > > > + if (!WORD_REGISTER_OPERATIONS > > > + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > > > + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > > + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > > + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN > > (reg))))) > > > return true; > > > > > > *curr_id->operand_loc[nop] = operand; > > > > I think that we might need: > > > > if (!(GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (innermode) > > && WORD_REGISTER_OPERATIONS) > > && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > > && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) > > return true; > > > > i.e. we force the reloading only for paradoxical subregs. > > Maybe, though I'm not entirely convinced. For WORD_REGISTER_OPERATIONS > both paradoxical and normal SUBREGs can be a problem as the inner mode > in both cases can be used elsewhere for a reload making the content of > the spill slot wrong either in the subreg reload or the ordinary reload > elsewhere. However, the condition can be tightened; I should not have > made it so simplistic I guess. I.e. both modes must be no wider than a > word and must be different precision to force an inner reload. > > Adding that check would fix this case for SPARC and should be fine for > MIPS but I need to wait for a bootstrap build to be sure. > > I don't really understand why LRA can't reload this for SPARC though as > I'm not sure there is any guarantee provided to backends that some > SUBREGs will be reloaded using their outer mode. If there is such a > guarantee then it would be much easier to reason about this logic but as > it stands I suspect we are having to tweak LRA to cope with assumptions > made in various targets that happen to have held true (and I have no > doubt that MIPS has some of these as well especially in terms of the > FP/GP register usage with float and int modes.) All being well we can > capture such assumptions and formalise them so we ensure they hold true > (or modify backends appropriately I guess). > > The condition would look like this, What do you think? > > if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION > (innermode) > && GET_MODE_SIZE (mode) <= UNITS_PER_WORD > && GET_MODE_SIZE (innermode) <= UNITS_PER_WORD > && WORD_REGISTER_OPERATIONS) > && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN > (reg))))) > return true; > > Thanks, > Matthew
> The condition would look like this, What do you think? > > if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode) > && GET_MODE_SIZE (mode) <= UNITS_PER_WORD > && GET_MODE_SIZE (innermode) <= UNITS_PER_WORD > && WORD_REGISTER_OPERATIONS) > && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN > (reg))))) return true; OK, I'm going to give it a try on SPARC.
> The condition would look like this, What do you think? > > if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode) > && GET_MODE_SIZE (mode) <= UNITS_PER_WORD > && GET_MODE_SIZE (innermode) <= UNITS_PER_WORD > && WORD_REGISTER_OPERATIONS) > && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN > (reg))))) return true; No regressions on SPARC.
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 66ff2bb..484a70d 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop, machine_mode reg_mode) subregs as we don't substitute such equiv memory (see processing equivalences in function lra_constraints) and because for spilled pseudos we allocate stack memory enough for the biggest - corresponding paradoxical subreg. */ - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))) + corresponding paradoxical subreg. + + However, never simplify a (subreg (mem ...)) for + WORD_REGISTER_OPERATIONS targets as this may lead to loading junk + data into a register when the inner is narrower than outer or + missing important data from memory when the inner is wider than + outer. */ + if (!WORD_REGISTER_OPERATIONS + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) return true; *curr_id->operand_loc[nop] = operand;