diff mbox

[3/5] Support WORD_REGISTER_OPERATIONS requirements in simplify_operand_subreg

Message ID 6D39441BF12EF246A7ABCE6654B0235380B5CE8C@hhmail02.hh.imgtec.org
State New
Headers show

Commit Message

Matthew Fortune Feb. 7, 2017, 2:08 p.m. UTC
Hi,

This patch is a minimal change to prevent (subreg(mem)) from being
simplified to use the outer mode for WORD_REGISTER_OPERATIONS.  There
is high probability of refining and/or re-implementing this for GCC 8
but such a change would be too invasive.  This change at least ensures
correctness but may prevent simplification of some acceptable cases.

No testcase here but I will try to make one using the RTL frontend
later.  This fix is required for mips64el-linux-gnu bootstrap
to succeed (in conjunction with patch 1 of this series). The specific
file affected by this bug is building gcc/predict.c where a bad reload
is created in predict_paths_for_bb.  Register 300 in the following
example is spilled to memory in SImode but reloaded as DImode.

(insn 247 212 389 3 (set (reg:SI 300)
        (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
            (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904 504 {*sne_zero_sisi}
     (nil))
...

(insn 250 256 251 40 (set (reg:DI 6 $6)
        (subreg:DI (reg:SI 300) 0)) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
     (nil))

Thanks,
Matthew

gcc/
	PR target/78660
	* lra-constraints.c (simplify_operand_subreg): Handle
	WORD_REGISTER_OPERATIONS targets.
---
 gcc/lra-constraints.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Vladimir Makarov Feb. 7, 2017, 10:08 p.m. UTC | #1
On 02/07/2017 09:08 AM, Matthew Fortune wrote:
> Hi,
>
> This patch is a minimal change to prevent (subreg(mem)) from being
> simplified to use the outer mode for WORD_REGISTER_OPERATIONS.  There
> is high probability of refining and/or re-implementing this for GCC 8
> but such a change would be too invasive.  This change at least ensures
> correctness but may prevent simplification of some acceptable cases.
>
> No testcase here but I will try to make one using the RTL frontend
> later.  This fix is required for mips64el-linux-gnu bootstrap
> to succeed (in conjunction with patch 1 of this series). The specific
> file affected by this bug is building gcc/predict.c where a bad reload
> is created in predict_paths_for_bb.  Register 300 in the following
> example is spilled to memory in SImode but reloaded as DImode.
>
> (insn 247 212 389 3 (set (reg:SI 300)
>          (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
>              (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904 504 {*sne_zero_sisi}
>       (nil))
> ...
>
> (insn 250 256 251 40 (set (reg:DI 6 $6)
>          (subreg:DI (reg:SI 300) 0)) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
>       (nil))

> gcc/
> 	PR target/78660
> 	* lra-constraints.c (simplify_operand_subreg): Handle
> 	WORD_REGISTER_OPERATIONS targets.
>
This patch is OK too.  Thanks again.
Eric Botcazou Feb. 8, 2017, 3:03 p.m. UTC | #2
> This patch is a minimal change to prevent (subreg(mem)) from being
> simplified to use the outer mode for WORD_REGISTER_OPERATIONS.  There
> is high probability of refining and/or re-implementing this for GCC 8
> but such a change would be too invasive.  This change at least ensures
> correctness but may prevent simplification of some acceptable cases.

This one causes:

+FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer -funroll-
loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
+WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer -
funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to 
produce executable
+FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -g  (test for excess errors)
+WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -g  compilation failed to 
produce executable
+WARNING: program timed out.
+WARNING: program timed out.

on SPARC 32-bit, i.e. LRA hangs.  Reduced testcase attached, compile at -O3 
with a cc1 configured for sparc-sun-solaris2.10.

> gcc/
> 	PR target/78660
> 	* lra-constraints.c (simplify_operand_subreg): Handle
> 	WORD_REGISTER_OPERATIONS targets.
> ---
>  gcc/lra-constraints.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 66ff2bb..484a70d 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop, machine_mode
> reg_mode) subregs as we don't substitute such equiv memory (see processing
> equivalences in function lra_constraints) and because for spilled pseudos
> we allocate stack memory enough for the biggest
> -	     corresponding paradoxical subreg.  */
> -	  if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> -		&& SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> -	      || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> -		  && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))
> +	     corresponding paradoxical subreg.
> +
> +	     However, never simplify a (subreg (mem ...)) for
> +	     WORD_REGISTER_OPERATIONS targets as this may lead to loading junk
> +	     data into a register when the inner is narrower than outer or
> +	     missing important data from memory when the inner is wider than
> +	     outer.  */
> +	  if (!WORD_REGISTER_OPERATIONS
> +	      && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> +		    && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> +		  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> +		      && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN 
(reg)))))
>  	    return true;
> 
>  	  *curr_id->operand_loc[nop] = operand;

I think that we might need:

  if (!(GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (innermode)
	&& WORD_REGISTER_OPERATIONS)
      && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
	    && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
	  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
	      && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))))
    return true;

i.e. we force the reloading only for paradoxical subregs.
Matthew Fortune Feb. 14, 2017, 6:53 p.m. UTC | #3
Sorry for the slow reply, been away for a few days....

Eric Botcazou <ebotcazou@adacore.com> writes:
> > This patch is a minimal change to prevent (subreg(mem)) from being
> > simplified to use the outer mode for WORD_REGISTER_OPERATIONS.  There
> > is high probability of refining and/or re-implementing this for GCC 8
> > but such a change would be too invasive.  This change at least ensures
> > correctness but may prevent simplification of some acceptable cases.
> 
> This one causes:
> 
> +FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer -
> funroll-
> loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> +WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer
> -
> funroll-loops -fpeel-loops -ftracer -finline-functions  compilation
> failed to produce executable
> +FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -g  (test for excess
> errors)
> +WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -g  compilation
> failed to
> produce executable
> +WARNING: program timed out.
> +WARNING: program timed out.
> 
> on SPARC 32-bit, i.e. LRA hangs.  Reduced testcase attached, compile at
> -O3 with a cc1 configured for sparc-sun-solaris2.10.
> 
> > gcc/
> > 	PR target/78660
> > 	* lra-constraints.c (simplify_operand_subreg): Handle
> > 	WORD_REGISTER_OPERATIONS targets.
> > ---
> >  gcc/lra-constraints.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index
> > 66ff2bb..484a70d 100644
> > --- a/gcc/lra-constraints.c
> > +++ b/gcc/lra-constraints.c
> > @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop, machine_mode
> > reg_mode) subregs as we don't substitute such equiv memory (see
> > processing equivalences in function lra_constraints) and because for
> > spilled pseudos we allocate stack memory enough for the biggest
> > -	     corresponding paradoxical subreg.  */
> > -	  if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > -		&& SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > -	      || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > -		  && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))
> > +	     corresponding paradoxical subreg.
> > +
> > +	     However, never simplify a (subreg (mem ...)) for
> > +	     WORD_REGISTER_OPERATIONS targets as this may lead to loading
> junk
> > +	     data into a register when the inner is narrower than outer or
> > +	     missing important data from memory when the inner is wider
> than
> > +	     outer.  */
> > +	  if (!WORD_REGISTER_OPERATIONS
> > +	      && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > +		    && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > +		  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > +		      && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN
> (reg)))))
> >  	    return true;
> >
> >  	  *curr_id->operand_loc[nop] = operand;
> 
> I think that we might need:
> 
>   if (!(GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (innermode)
> 	&& WORD_REGISTER_OPERATIONS)
>       && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> 	    && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> 	  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> 	      && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))))
>     return true;
> 
> i.e. we force the reloading only for paradoxical subregs.

Maybe, though I'm not entirely convinced.  For WORD_REGISTER_OPERATIONS
both paradoxical and normal SUBREGs can be a problem as the inner mode
in both cases can be used elsewhere for a reload making the content
of the spill slot wrong either in the subreg reload or the ordinary
reload elsewhere. However, the condition can be tightened; I should
not have made it so simplistic I guess. I.e. both modes must be no
wider than a word and must be different precision to force an inner
reload.

Adding that check would fix this case for SPARC and should be fine
for MIPS but I need to wait for a bootstrap build to be sure.

I don't really understand why LRA can't reload this for SPARC though
as I'm not sure there is any guarantee provided to backends that some
SUBREGs will be reloaded using their outer mode.  If there is such a
guarantee then it would be much easier to reason about this logic but
as it stands I suspect we are having to tweak LRA to cope with
assumptions made in various targets that happen to have held true (and
I have no doubt that MIPS has some of these as well especially in terms
of the FP/GP register usage with float and int modes.)  All being well
we can capture such assumptions and formalise them so we ensure they
hold true (or modify backends appropriately I guess).

The condition would look like this, What do you think?

          if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode)
                && GET_MODE_SIZE (mode) <= UNITS_PER_WORD
                && GET_MODE_SIZE (innermode) <= UNITS_PER_WORD
                && WORD_REGISTER_OPERATIONS)
              && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
                    && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
                  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
                      && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))))
            return true;

Thanks,
Matthew
Matthew Fortune Feb. 20, 2017, 10:02 a.m. UTC | #4
Hi Eric,

Any thoughts on this?

Thanks,
Matthew

> Sorry for the slow reply, been away for a few days....
> 
> Eric Botcazou <ebotcazou@adacore.com> writes:
> > > This patch is a minimal change to prevent (subreg(mem)) from being
> > > simplified to use the outer mode for WORD_REGISTER_OPERATIONS.
> > > There is high probability of refining and/or re-implementing this
> > > for GCC 8 but such a change would be too invasive.  This change at
> > > least ensures correctness but may prevent simplification of some
> acceptable cases.
> >
> > This one causes:
> >
> > +FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer -
> > funroll-
> > loops -fpeel-loops -ftracer -finline-functions  (test for excess
> > errors)
> > +WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-
> pointer
> > -
> > funroll-loops -fpeel-loops -ftracer -finline-functions  compilation
> > failed to produce executable
> > +FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -g  (test for excess
> > errors)
> > +WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -g  compilation
> > failed to
> > produce executable
> > +WARNING: program timed out.
> > +WARNING: program timed out.
> >
> > on SPARC 32-bit, i.e. LRA hangs.  Reduced testcase attached, compile
> > at
> > -O3 with a cc1 configured for sparc-sun-solaris2.10.
> >
> > > gcc/
> > > 	PR target/78660
> > > 	* lra-constraints.c (simplify_operand_subreg): Handle
> > > 	WORD_REGISTER_OPERATIONS targets.
> > > ---
> > >  gcc/lra-constraints.c | 17 ++++++++++++-----
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index
> > > 66ff2bb..484a70d 100644
> > > --- a/gcc/lra-constraints.c
> > > +++ b/gcc/lra-constraints.c
> > > @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop,
> > > machine_mode
> > > reg_mode) subregs as we don't substitute such equiv memory (see
> > > processing equivalences in function lra_constraints) and because for
> > > spilled pseudos we allocate stack memory enough for the biggest
> > > -	     corresponding paradoxical subreg.  */
> > > -	  if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > > -		&& SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > > -	      || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > > -		  && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))
> > > +	     corresponding paradoxical subreg.
> > > +
> > > +	     However, never simplify a (subreg (mem ...)) for
> > > +	     WORD_REGISTER_OPERATIONS targets as this may lead to loading
> > junk
> > > +	     data into a register when the inner is narrower than outer or
> > > +	     missing important data from memory when the inner is wider
> > than
> > > +	     outer.  */
> > > +	  if (!WORD_REGISTER_OPERATIONS
> > > +	      && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > > +		    && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > > +		  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > > +		      && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN
> > (reg)))))
> > >  	    return true;
> > >
> > >  	  *curr_id->operand_loc[nop] = operand;
> >
> > I think that we might need:
> >
> >   if (!(GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (innermode)
> > 	&& WORD_REGISTER_OPERATIONS)
> >       && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > 	    && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > 	  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > 	      && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))))
> >     return true;
> >
> > i.e. we force the reloading only for paradoxical subregs.
> 
> Maybe, though I'm not entirely convinced.  For WORD_REGISTER_OPERATIONS
> both paradoxical and normal SUBREGs can be a problem as the inner mode
> in both cases can be used elsewhere for a reload making the content of
> the spill slot wrong either in the subreg reload or the ordinary reload
> elsewhere. However, the condition can be tightened; I should not have
> made it so simplistic I guess. I.e. both modes must be no wider than a
> word and must be different precision to force an inner reload.
> 
> Adding that check would fix this case for SPARC and should be fine for
> MIPS but I need to wait for a bootstrap build to be sure.
> 
> I don't really understand why LRA can't reload this for SPARC though as
> I'm not sure there is any guarantee provided to backends that some
> SUBREGs will be reloaded using their outer mode.  If there is such a
> guarantee then it would be much easier to reason about this logic but as
> it stands I suspect we are having to tweak LRA to cope with assumptions
> made in various targets that happen to have held true (and I have no
> doubt that MIPS has some of these as well especially in terms of the
> FP/GP register usage with float and int modes.)  All being well we can
> capture such assumptions and formalise them so we ensure they hold true
> (or modify backends appropriately I guess).
> 
> The condition would look like this, What do you think?
> 
>           if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION
> (innermode)
>                 && GET_MODE_SIZE (mode) <= UNITS_PER_WORD
>                 && GET_MODE_SIZE (innermode) <= UNITS_PER_WORD
>                 && WORD_REGISTER_OPERATIONS)
>               && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
>                     && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
>                   || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
>                       && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN
> (reg)))))
>             return true;
> 
> Thanks,
> Matthew
Eric Botcazou Feb. 21, 2017, 8:23 a.m. UTC | #5
> The condition would look like this, What do you think?
> 
>           if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode)
>                 && GET_MODE_SIZE (mode) <= UNITS_PER_WORD
>                 && GET_MODE_SIZE (innermode) <= UNITS_PER_WORD
>                 && WORD_REGISTER_OPERATIONS)
>               && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
>                     && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> 
>                   || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> 
>                       && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN
> (reg))))) return true;

OK, I'm going to give it a try on SPARC.
Eric Botcazou Feb. 21, 2017, 11:49 a.m. UTC | #6
> The condition would look like this, What do you think?
> 
>           if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode)
>                 && GET_MODE_SIZE (mode) <= UNITS_PER_WORD
>                 && GET_MODE_SIZE (innermode) <= UNITS_PER_WORD
>                 && WORD_REGISTER_OPERATIONS)
>               && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
>                     && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> 
>                   || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> 
>                       && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN
> (reg))))) return true;

No regressions on SPARC.
diff mbox

Patch

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 66ff2bb..484a70d 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1541,11 +1541,18 @@  simplify_operand_subreg (int nop, machine_mode reg_mode)
 	     subregs as we don't substitute such equiv memory (see processing
 	     equivalences in function lra_constraints) and because for spilled
 	     pseudos we allocate stack memory enough for the biggest
-	     corresponding paradoxical subreg.  */
-	  if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
-		&& SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
-	      || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
-		  && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))
+	     corresponding paradoxical subreg.
+
+	     However, never simplify a (subreg (mem ...)) for
+	     WORD_REGISTER_OPERATIONS targets as this may lead to loading junk
+	     data into a register when the inner is narrower than outer or
+	     missing important data from memory when the inner is wider than
+	     outer.  */
+	  if (!WORD_REGISTER_OPERATIONS
+	      && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
+		    && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
+		  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
+		      && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))))
 	    return true;
 
 	  *curr_id->operand_loc[nop] = operand;