
combine: Do not combine moves from hard registers

Message ID 68abf72a5400b96b9a100966331d3ad2056648e7.1540237620.git.segher@kernel.crashing.org

Commit Message

Segher Boessenkool Oct. 22, 2018, 8:17 p.m. UTC
On most targets every function starts with moves from the parameter
passing (hard) registers into pseudos.  Similarly, after every call
there is a move from the return register into a pseudo.  These moves
usually combine with later instructions (leaving pretty much the same
instruction, just with a hard reg instead of a pseudo).

This isn't a good idea.  Register allocation can get rid of unnecessary
moves just fine, and moving the parameter passing registers into many
later instructions tends to prevent good register allocation.  This
patch disallows combining moves from a hard (non-fixed) register.

This also avoids the problem mentioned in PR87600 #c3 (combining hard
registers into inline assembler is problematic).

Because the register move can often be combined with other instructions
*itself*, for example for setting some condition code, this patch adds
extra copies via new pseudos after every copy-from-hard-reg.

On some targets this reduces average code size.  On others it increases
it a bit, 0.1% or 0.2% or so.  (I tested this on all *-linux targets).

I'll commit this to trunk now.  If there are problems, please don't
hesitate to let me know!  Thanks.


Segher


2018-10-22  Segher Boessenkool  <segher@kernel.crashing.org>

	PR rtl-optimization/87600
	* combine.c: Add include of expr.h.
	(cant_combine_insn_p): Do not combine moves from any hard non-fixed
	register to a pseudo.
	(make_more_copies): New function, add a copy to a new pseudo after
	the moves from hard registers into pseudos.
	(rest_of_handle_combine): Declare rebuild_jump_labels_after_combine
	later.  Call make_more_copies.

---
 gcc/combine.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 4 deletions(-)

Comments

Jeff Law Oct. 22, 2018, 8:52 p.m. UTC | #1
On 10/22/18 2:17 PM, Segher Boessenkool wrote:
> On most targets every function starts with moves from the parameter
> passing (hard) registers into pseudos.  Similarly, after every call
> there is a move from the return register into a pseudo.  These moves
> usually combine with later instructions (leaving pretty much the same
> instruction, just with a hard reg instead of a pseudo).
> 
> This isn't a good idea.  Register allocation can get rid of unnecessary
> moves just fine, and moving the parameter passing registers into many
> later instructions tends to prevent good register allocation.  This
> patch disallows combining moves from a hard (non-fixed) register.
> 
> This also avoid the problem mentioned in PR87600 #c3 (combining hard
> registers into inline assembler is problematic).
> 
> Because the register move can often be combined with other instructions
> *itself*, for example for setting some condition code, this patch adds
> extra copies via new pseudos after every copy-from-hard-reg.
> 
> On some targets this reduces average code size.  On others it increases
> it a bit, 0.1% or 0.2% or so.  (I tested this on all *-linux targets).
> 
> I'll commit this to trunk now.  If there are problems, please don't
> hesitate to let me know!  Thanks.
> 
> 
> Segher
> 
> 
> 2018-10-22  Segher Boessenkool  <segher@kernel.crashing.org>
> 
> 	PR rtl-optimization/87600
> 	* combine.c: Add include of expr.h.
> 	(cant_combine_insn_p): Do not combine moves from any hard non-fixed
> 	register to a pseudo.
> 	(make_more_copies): New function, add a copy to a new pseudo after
> 	the moves from hard registers into pseudos.
> 	(rest_of_handle_combine): Declare rebuild_jump_labels_after_combine
> 	later.  Call make_more_copies.
I know we've gone back and forth on this stuff through the years,
particularly for targets with likely-to-spilled classes that are used
for register passing/return values.

I'm certainly willing to go with this as general guidance.  I wouldn't
be surprised if we find that things like CSE, fwprop and other passes
need twiddling over time to mimick what you're doing in combine.

jeff
Segher Boessenkool Oct. 22, 2018, 9:06 p.m. UTC | #2
Hi Jeff,

On Mon, Oct 22, 2018 at 02:52:12PM -0600, Jeff Law wrote:
> I know we've gone back and forth on this stuff through the years,
> particularly for targets with likely-to-spilled classes that are used
> for register passing/return values.

Right, there already was code to not copy to a pseudo for such registers,
and this patch extends it to all hard (but not fixed) registers.  That
reduces code quality quite a bit on its own, though, because the move itself
can often be combined.  To compensate, this now adds an extra copy to a
fresh pseudo, and that fixes this.  (cc:ing Alan, who did the previous
attempt I remember.)

> I'm certainly willing to go with this as general guidance.  I wouldn't
> be surprised if we find that things like CSE, fwprop and other passes
> need twiddling over time to mimick what you're doing in combine.

Yes, that could well be.  I think fwprop is fine, but there could well
be other passes.  The asm problem in PR87600 no longer triggers, but it
could be that those passes stay away from asm, or something like that.


Segher
Christophe Lyon Oct. 23, 2018, 10:14 a.m. UTC | #3
On Mon, 22 Oct 2018 at 22:17, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On most targets every function starts with moves from the parameter
> passing (hard) registers into pseudos.  Similarly, after every call
> there is a move from the return register into a pseudo.  These moves
> usually combine with later instructions (leaving pretty much the same
> instruction, just with a hard reg instead of a pseudo).
>
> This isn't a good idea.  Register allocation can get rid of unnecessary
> moves just fine, and moving the parameter passing registers into many
> later instructions tends to prevent good register allocation.  This
> patch disallows combining moves from a hard (non-fixed) register.
>
> This also avoid the problem mentioned in PR87600 #c3 (combining hard
> registers into inline assembler is problematic).
>
> Because the register move can often be combined with other instructions
> *itself*, for example for setting some condition code, this patch adds
> extra copies via new pseudos after every copy-from-hard-reg.
>
> On some targets this reduces average code size.  On others it increases
> it a bit, 0.1% or 0.2% or so.  (I tested this on all *-linux targets).
>
> I'll commit this to trunk now.  If there are problems, please don't
> hesitate to let me know!  Thanks.
>

Hi,

I have noticed many regressions on arm and aarch64 between 265366 and
265408 (this commit is 265398).

I bisected at least one to this commit on aarch64:
FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
live-range of register"
The same test also regresses on arm.

For a whole picture of all the regressions I noticed during these two
commits, have a look at:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html

Christophe



>
> Segher
>
>
> 2018-10-22  Segher Boessenkool  <segher@kernel.crashing.org>
>
>         PR rtl-optimization/87600
>         * combine.c: Add include of expr.h.
>         (cant_combine_insn_p): Do not combine moves from any hard non-fixed
>         register to a pseudo.
>         (make_more_copies): New function, add a copy to a new pseudo after
>         the moves from hard registers into pseudos.
>         (rest_of_handle_combine): Declare rebuild_jump_labels_after_combine
>         later.  Call make_more_copies.
>
> ---
>  gcc/combine.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 46 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 256b5a4..3ff1760 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "explow.h"
>  #include "insn-attr.h"
>  #include "rtlhooks-def.h"
> +#include "expr.h"
>  #include "params.h"
>  #include "tree-pass.h"
>  #include "valtrack.h"
> @@ -2348,8 +2349,7 @@ cant_combine_insn_p (rtx_insn *insn)
>      dest = SUBREG_REG (dest);
>    if (REG_P (src) && REG_P (dest)
>        && ((HARD_REGISTER_P (src)
> -          && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src))
> -          && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (src))))
> +          && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
>           || (HARD_REGISTER_P (dest)
>               && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest))
>               && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dest))))))
> @@ -14969,11 +14969,53 @@ dump_combine_total_stats (FILE *file)
>       total_attempts, total_merges, total_extras, total_successes);
>  }
>
> +/* Make pseudo-to-pseudo copies after every hard-reg-to-pseudo-copy, because
> +   the reg-to-reg copy can usefully combine with later instructions, but we
> +   do not want to combine the hard reg into later instructions, for that
> +   restricts register allocation.  */
> +static void
> +make_more_copies (void)
> +{
> +  basic_block bb;
> +
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      rtx_insn *insn;
> +
> +      FOR_BB_INSNS (bb, insn)
> +        {
> +          if (!NONDEBUG_INSN_P (insn))
> +            continue;
> +
> +         rtx set = single_set (insn);
> +         if (!set)
> +           continue;
> +         rtx src = SET_SRC (set);
> +         rtx dest = SET_DEST (set);
> +         if (GET_CODE (src) == SUBREG)
> +           src = SUBREG_REG (src);
> +         if (!(REG_P (src) && HARD_REGISTER_P (src)))
> +           continue;
> +         if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
> +           continue;
> +
> +         rtx new_reg = gen_reg_rtx (GET_MODE (dest));
> +         rtx_insn *insn1 = gen_move_insn (new_reg, src);
> +         rtx_insn *insn2 = gen_move_insn (dest, new_reg);
> +         emit_insn_after (insn1, insn);
> +         emit_insn_after (insn2, insn1);
> +         delete_insn (insn);
> +
> +         insn = insn2;
> +       }
> +    }
> +}
> +
>  /* Try combining insns through substitution.  */
>  static unsigned int
>  rest_of_handle_combine (void)
>  {
> -  int rebuild_jump_labels_after_combine;
> +  make_more_copies ();
>
>    df_set_flags (DF_LR_RUN_DCE + DF_DEFER_INSN_RESCAN);
>    df_note_add_problem ();
> @@ -14982,7 +15024,7 @@ rest_of_handle_combine (void)
>    regstat_init_n_sets_and_refs ();
>    reg_n_sets_max = max_reg_num ();
>
> -  rebuild_jump_labels_after_combine
> +  int rebuild_jump_labels_after_combine
>      = combine_instructions (get_insns (), max_reg_num ());
>
>    /* Combining insns may have turned an indirect jump into a
> --
> 1.8.3.1
>
Christophe Lyon Oct. 23, 2018, 12:02 p.m. UTC | #4
On Tue, 23 Oct 2018 at 12:14, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> On Mon, 22 Oct 2018 at 22:17, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On most targets every function starts with moves from the parameter
> > passing (hard) registers into pseudos.  Similarly, after every call
> > there is a move from the return register into a pseudo.  These moves
> > usually combine with later instructions (leaving pretty much the same
> > instruction, just with a hard reg instead of a pseudo).
> >
> > This isn't a good idea.  Register allocation can get rid of unnecessary
> > moves just fine, and moving the parameter passing registers into many
> > later instructions tends to prevent good register allocation.  This
> > patch disallows combining moves from a hard (non-fixed) register.
> >
> > This also avoid the problem mentioned in PR87600 #c3 (combining hard
> > registers into inline assembler is problematic).
> >
> > Because the register move can often be combined with other instructions
> > *itself*, for example for setting some condition code, this patch adds
> > extra copies via new pseudos after every copy-from-hard-reg.
> >
> > On some targets this reduces average code size.  On others it increases
> > it a bit, 0.1% or 0.2% or so.  (I tested this on all *-linux targets).
> >
> > I'll commit this to trunk now.  If there are problems, please don't
> > hesitate to let me know!  Thanks.
> >
>
> Hi,
>
> I have noticed many regressions on arm and aarch64 between 265366 and
> 265408 (this commit is 265398).
>
> I bisected at least one to this commit on aarch64:
> FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> live-range of register"
> The same test also regresses on arm.
>
> For a whole picture of all the regressions I noticed during these two
> commits, have a look at:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html
>
> Christophe
>

I also bisected regressions on arm:
gcc.c-torture/execute/920428-2.c
gfortran.dg/actual_array_substr_2.f90
both point to this commit too.



Segher Boessenkool Oct. 23, 2018, 12:28 p.m. UTC | #5
On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:
> I have noticed many regressions on arm and aarch64 between 265366 and
> 265408 (this commit is 265398).
> 
> I bisected at least one to this commit on aarch64:
> FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> live-range of register"
> The same test also regresses on arm.

Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
when random things in the RTL change, apparently.

> For a whole picture of all the regressions I noticed during these two
> commits, have a look at:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html

No thanks.  I am not going to click on 111 links and whatever is behind
those.  Please summarise, like, what was the diff in test_summary, and
then dig down into individual tests if you want.  Or whatever else works
both for you and for me.  This doesn't work for me.


Segher
Segher Boessenkool Oct. 23, 2018, 12:34 p.m. UTC | #6
On Tue, Oct 23, 2018 at 02:02:35PM +0200, Christophe Lyon wrote:
> I also bisected regressions on arm:
> gcc.c-torture/execute/920428-2.c
> gfortran.dg/actual_array_substr_2.f90
> both point to this commit too.

And what are the errors for those?


Segher
Christophe Lyon Oct. 23, 2018, 12:45 p.m. UTC | #7
On Tue, 23 Oct 2018 at 14:34, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Tue, Oct 23, 2018 at 02:02:35PM +0200, Christophe Lyon wrote:
> > I also bisected regressions on arm:
> > gcc.c-torture/execute/920428-2.c
> > gfortran.dg/actual_array_substr_2.f90
> > both point to this commit too.
>
> And what are the errors for those?
>
Both are execution failures.

For the fortran test:
*** Error in `./actual_array_substr_2.exe': munmap_chunk(): invalid
pointer: 0x00023057 ***

For the C test (920428-2.c), the .log file isn't very helpful:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped

Christophe Lyon Oct. 23, 2018, 1:25 p.m. UTC | #8
On Tue, 23 Oct 2018 at 14:29, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:
> > I have noticed many regressions on arm and aarch64 between 265366 and
> > 265408 (this commit is 265398).
> >
> > I bisected at least one to this commit on aarch64:
> > FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> > live-range of register"
> > The same test also regresses on arm.
>
> Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
> when random things in the RTL change, apparently.
>
> > For a whole picture of all the regressions I noticed during these two
> > commits, have a look at:
> > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html
>
> No thanks.  I am not going to click on 111 links and whatever is behind
> those.  Please summarise, like, what was the diff in test_summary, and
> then dig down into individual tests if you want.  Or whatever else works
> both for you and for me.  This doesn't work for me.
>

OK... this is not very practical for me either.  There were 25 commits
between the two validations being compared: 25-28 gcc tests regressed on
aarch64 (depending on the exact target) and 177-206 gcc tests regressed
on arm*, plus 7-29 gfortran regressions on arm*, so I would have to run
many bisects to make sure every regression is caused by the same commit.

Since these are all automated builds with everything discarded after
computing the regressions, it's quite time consuming to re-run the
tests manually on my side (probably at least as much as it is for you).

In this case, the most efficient way would be for me to extract your patch
and have my validation system validate it against the preceding commit;
that would give only the regressions caused by your patch.  I'm going to
do that; it should take 3-5h to run.

I know this doesn't answer your question, but I thought you could run the
aarch64 tests easily, and that it would be more efficient for the project
if you did so directly rather than waiting for me to provide what little
extra information I can.

Maybe this will answer your question better:
List of aarch64-linux-gnu regressions:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt
List of arm-none-linux-gnueabihf regressions:
(gcc) http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
(gfortran) http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gfortran-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
(all these are what you get when you click on one of the REGRESSED
links on the main page)

but as I said, these were caused by commits between 265366 and 265408,
so not all of them may be caused by your commit

Don't get me wrong: I'm not angry at you, and I don't mean to be
offensive; I say this in all humility.

To me it just highlights again that we need a validation system that is
easier to work with when we break something on a target we are not
familiar with.  I run post-commit validations as finely grained as
possible with the CPU resources I have access to; that's not enough, and
I think having a developer-accessible gerrit+jenkins-like system would be
very valuable for testing patches before commit.  We have a prototype in
Linaro, not production-ready.  But I guess that would be worth another
discussion thread :)

Christophe

Andreas Schwab Oct. 23, 2018, 3:16 p.m. UTC | #9
This miscompiles libffi and libgo on ia64.

The following libffi tests fail:

libffi.call/nested_struct.c -W -Wall -Wno-psabi -O2 -fomit-frame-pointer execution test
libffi.call/nested_struct.c -W -Wall -Wno-psabi -O2 execution test
libffi.call/nested_struct.c -W -Wall -Wno-psabi -Os execution test
libffi.call/nested_struct1.c -W -Wall -Wno-psabi -O2 -fomit-frame-pointer execution test
libffi.call/nested_struct1.c -W -Wall -Wno-psabi -O2 execution test
libffi.call/nested_struct1.c -W -Wall -Wno-psabi -Os execution test
libffi.call/stret_large.c -W -Wall -Wno-psabi -Os output pattern test
libffi.call/stret_large2.c -W -Wall -Wno-psabi -Os output pattern test
libffi.call/stret_medium2.c -W -Wall -Wno-psabi -Os output pattern test

And a lot of libgo tests fail.

Andreas.
Segher Boessenkool Oct. 23, 2018, 3:28 p.m. UTC | #10
On Tue, Oct 23, 2018 at 05:16:38PM +0200, Andreas Schwab wrote:
> This miscompiles libffi and libgo on ia64.

Ouch.  I cannot read ia64 machine code without a lot of handholding...
Any hints what is wrong?


Segher
Segher Boessenkool Oct. 23, 2018, 10:26 p.m. UTC | #11
Hi Christophe,

On Tue, Oct 23, 2018 at 03:25:55PM +0200, Christophe Lyon wrote:
> On Tue, 23 Oct 2018 at 14:29, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:
> > > I have noticed many regressions on arm and aarch64 between 265366 and
> > > 265408 (this commit is 265398).
> > >
> > > I bisected at least one to this commit on aarch64:
> > > FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> > > live-range of register"
> > > The same test also regresses on arm.
> >
> > Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
> > when random things in the RTL change, apparently.

This is PR87708 now.

> > > For a whole picture of all the regressions I noticed during these two
> > > commits, have a look at:
> > > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html
> >
> > No thanks.  I am not going to click on 111 links and whatever is behind
> > those.  Please summarise, like, what was the diff in test_summary, and
> > then dig down into individual tests if you want.  Or whatever else works
> > both for you and for me.  This doesn't work for me.
> 
> OK.... this is not very practical for me either. There were 25 commits between
> the two validations being compared,
> 25-28 gcc tests regressed on aarch64, depending on the exact target
> 177-206 gcc tests regressed on arm*, 7-29 gfortran regressions on arm*
> so I could have to run many bisects to make sure every regression is
> caused by the same commit.

So many, ouch!  I didn't realise.

> Since these are all automated builds with everything discarded after
> computing the regressions, it's quite time consuming to re-run the
> tests manually on my side (probably at least as much as it is for you).

Running arm tests is very painful for me.  But you say this is on aarch64
as well, which I didn't realise either; aarch64 should be easy to test, as
we have many reasonable aarch64 machines in the cfarm.

> I know this doesn't answer your question, but I thought you could run aarch64
> tests easily and that would be more efficient for the project that you
> do it directly
> without waiting for me to provide hardly little more information.

Well, I'm not too familiar with aarch64, so if you can say "this Z is a
pretty simple test that should do X but now does Y" that would be a huge
help :-)

> Maybe this will answer your question better:
> List of aarch64-linux-gnu regressions:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt
> List of arm-none-linux-gnueabihf regressions:
> (gcc) http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
> (gfortran) http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gfortran-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt

That may help yes, thanks!

> To me it just highlights again that we need a validation system easier to
> work with when we break something on a target we are not familiar with.

OTOH a patch like this is likely to break many target-specific tests, and
that should not prevent committing it imnsho.  If it actively breaks things,
then of course it shouldn't go in as-is, or if it breaks bootstrap, etc.

> I run post-commit validations as finely grained as possible with the CPU
> resources I have access to, that's not enough and I think having a
> developer-accessible gerrit+jenkins-like system would be very valuable
> to test patches before commit. We have a prototype in Linaro, not
> production-ready. But I guess that would be worth another
> discussion thread :)

Yeah...  One when people have time for it ;-)


Segher
Paul Hua Oct. 24, 2018, 7:31 a.m. UTC | #12
Hi:
>
> I have noticed many regressions on arm and aarch64 between 265366 and
> 265408 (this commit is 265398).
>

There are many regressions on mips64el between 265378 and 265420.

r265378 testresults:
https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02935.html
r265420 testresults:
https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg03065.html

 I bisected at least one to this commit on mips64el:
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -O1
(internal compiler error)
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -O2
(internal compiler error)
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -O2
-flto -fno-use-linker-plugin -flto-partition=none  (internal compiler
error)
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -O2
-flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler
error)
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -O3
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions  (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -O3 -g
 (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -Og -g
 (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -Os
(internal compiler error)

I filed a bug at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87720.

Paul Hua

Christophe Lyon Oct. 24, 2018, 8:23 a.m. UTC | #13
On Wed, 24 Oct 2018 at 00:26, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi Christophe,
>
> On Tue, Oct 23, 2018 at 03:25:55PM +0200, Christophe Lyon wrote:
> > On Tue, 23 Oct 2018 at 14:29, Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > > On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:
> > > > I have noticed many regressions on arm and aarch64 between 265366 and
> > > > 265408 (this commit is 265398).
> > > >
> > > > I bisected at least one to this commit on aarch64:
> > > > FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> > > > live-range of register"
> > > > The same test also regresses on arm.
> > >
> > > Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
> > > when random things in the RTL change, apparently.
>
> This is PR87708 now.
>
> > > > For a whole picture of all the regressions I noticed during these two
> > > > commits, have a look at:
> > > > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html
> > >
> > > No thanks.  I am not going to click on 111 links and whatever is behind
> > > those.  Please summarise, like, what was the diff in test_summary, and
> > > then dig down into individual tests if you want.  Or whatever else works
> > > both for you and for me.  This doesn't work for me.
> >
> > OK.... this is not very practical for me either. There were 25 commits between
> > the two validations being compared,
> > 25-28 gcc tests regressed on aarch64, depending on the exact target
> > 177-206 gcc tests regressed on arm*, 7-29 gfortran regressions on arm*
> > so I could have to run many bisects to make sure every regression is
> > caused by the same commit.
>
> So many, ouch!  I didn't realise.
>
I've now got the results of validating your patch only, compared to the
previous revision, and it does cause all the regressions I noticed earlier.

Andreas Schwab Oct. 24, 2018, 10:38 a.m. UTC | #14
It's the test case that is miscompiled.

$ cat nested_struct.c
#include <stdio.h>

typedef struct cls_struct_16byte1 {
  double a;
  float b;
  int c;
} cls_struct_16byte1;

typedef struct cls_struct_16byte2 {
  int ii;
  double dd;
  float ff;
} cls_struct_16byte2;

typedef struct cls_struct_combined {
  cls_struct_16byte1 d;
  cls_struct_16byte2 e;
} cls_struct_combined;

void
cls_struct_combined_fn(struct cls_struct_16byte1 b0,
		       struct cls_struct_16byte2 b1,
		       struct cls_struct_combined b2)
{
  struct cls_struct_combined result;

  printf("%g %g %d %d %g %g %g %g %d %d %g %g\n",
	 b0.a, b0.b, b0.c,
	 b1.ii, b1.dd, b1.ff,
	 b2.d.a, b2.d.b, b2.d.c,
	 b2.e.ii, b2.e.dd, b2.e.ff);
}
$ cat main.c
typedef struct cls_struct_16byte1 {
  double a;
  float b;
  int c;
} cls_struct_16byte1;

typedef struct cls_struct_16byte2 {
  int ii;
  double dd;
  float ff;
} cls_struct_16byte2;

typedef struct cls_struct_combined {
  cls_struct_16byte1 d;
  cls_struct_16byte2 e;
} cls_struct_combined;

void cls_struct_combined_fn (struct cls_struct_16byte1,
			     struct cls_struct_16byte2,
			     struct cls_struct_combined);

int
main (void)
{
  struct cls_struct_16byte1 e_dbl = { 9.0, 2.0, 6};
  struct cls_struct_16byte2 f_dbl = { 1, 2.0, 3.0};
  struct cls_struct_combined g_dbl = {{4.0, 5.0, 6},
				      {3, 1.0, 8.0}};

  cls_struct_combined_fn (e_dbl, f_dbl, g_dbl);
}
$ bad/gcc/xgcc -Bbad/gcc/ -O2 -c nested_struct.c
$ good/gcc/xgcc -Bgood/gcc/ -O2 main.c nested_struct.o
$ ./a.out 
9 2 6 1 2 3 1.32677e-313 5 6 3 1 8

--- good/nested_struct.s
+++ bad/nested_struct.s
@@ -25,78 +25,69 @@ cls_struct_combined_fn:
 	movl r43 = @gprel(.LC0)
 	;;
 	.mmi
-	adds r16 = 120, r12
-	adds r17 = 72, r12
-	adds r18 = 96, r12
+	adds r14 = 120, r12
+	adds r16 = 72, r12
+	adds r15 = 96, r12
 	.mmi
-	adds r19 = 152, r12
-	adds r15 = 88, r12
 	adds r21 = 48, r12
-	;;
-	.mmi
-	mov r14 = r16
-	ldfs f9 = [r19]
-	adds r19 = 144, r12
-	.mmi
-	st8 [r17] = r33
-	st8 [r18] = r36
 	adds r20 = 40, r12
+	adds r18 = 32, r12
 	;;
-	.mmb
+	.mmi
 	st8 [r14] = r37, 8
-	ldfs f8 = [r18]
-	nop 0
-	.mii
-	adds r18 = 24, r12
+	st8 [r16] = r33
 	.save rp, r40
 	mov r40 = b0
 	.body
+	.mmi
+	adds r17 = 24, r12
+	adds r22 = 136, r12
 	add r43 = r1, r43
 	;;
+	.mmb
+	ldfs f8 = [r16]
+	adds r16 = 152, r12
+	nop 0
 	.mmi
-	ldfs f7 = [r17]
 	st8 [r14] = r38
+	st8 [r15] = r36
 	extr.u r38 = r38, 32, 32
-	.mmi
-	st8 [r15] = r35
-	ldfd f10 = [r19]
-	adds r17 = 64, r12
 	;;
 	.mmi
 	ldfs f6 = [r14]
-	st8 [r17] = r32
-	adds r14 = 136, r12
+	st8 [r22] = r39
+	nop 0
 	.mmi
-	adds r19 = 32, r12
-	ld8 r50 = [r16]
-	adds r17 = 16, r12
+	ldfs f9 = [r15]
+	ldfs f7 = [r16]
+	adds r16 = 144, r12
 	;;
 	.mmi
-	st8 [r14] = r39
-	nop 0
-	adds r14 = 76, r12
+	ldfd f10 = [r16]
+	adds r16 = 64, r12
+	adds r15 = 88, r12
 	.mmi
-	ld8 r48 = [r15]
-	st4 [r19] = r39
-	nop 0
+	ld8 r50 = [r14]
+	st4 [r18] = r39
+	adds r14 = 76, r12
 	;;
 	.mmi
-	nop 0
+	st8 [r16] = r32
 	ld4 r46 = [r14]
-	adds r14 = 64, r12
+	adds r16 = 16, r12
 	.mmi
-	nop 0
-	st4 [r18] = r38
+	adds r14 = 64, r12
+	st8 [r15] = r35
 	nop 0
 	;;
-	.mfi
+	.mmi
 	nop 0
-	fnorm.d f9 = f9
+	ld8 r48 = [r15]
 	nop 0
 	.mmi
+	st4 [r17] = r38
 	ld8 r44 = [r14]
 	nop 0
-	nop 0
 	;;
 	.mmf
 	nop 0
@@ -105,31 +96,30 @@ cls_struct_combined_fn:
 	;;
 	.mfi
 	nop 0
-	fnorm.d f7 = f7
+	fnorm.d f9 = f9
 	nop 0
-	.mmi
-	stfd [r20] = f10
+	.mfi
 	nop 0
+	fnorm.d f7 = f7
 	nop 0
 	;;
-	.mmf
-	nop 0
+	.mfi
 	nop 0
 	fnorm.d f6 = f6
-	;;
+	nop 0
 	.mmi
-	stfd [r21] = f9
+	stfd [r20] = f10
 	;;
-	getf.d r49 = f8
+	getf.d r45 = f8
 	nop 0
 	;;
 	.mmi
-	nop 0
-	getf.d r45 = f7
+	getf.d r49 = f9
+	stfd [r21] = f7
 	nop 0
 	;;
 	.mib
-	stfd [r17] = f6
+	stfd [r16] = f6
 	nop 0
 	br.call.sptk.many b0 = printf#
 	;;

Andreas.
Steve Ellcey Oct. 26, 2018, 4:39 p.m. UTC | #15
What is the status of this patch?  I see PR 87708, which is for the
regression to ira-shrinkwrap-prep-[12].c but what about all the
other regressions?  I see 27 of them on my aarch64 build and when
I looked at one of them (gcc.target/aarch64/cvtf_1.c) the code looks
worse than before, generating an extra instruction in each of the
routines.  Here is an example from one function where there is an
extra fmov that was not there before.  The test runs at -O1 but
the extra instruction appears at all optimization levels.  Should
I submit a new PR for this?

Steve Ellcey


void cvt_int32_t_to_float (int a, float b)
{ float c; c = (float) a;
  if ( (c - b) > 0.00001) abort();
}


This used to generate:

cvt_int32_t_to_float:
.LFB0:
	.cfi_startproc
	scvtf	s1, w0
	fsub	s0, s1, s0
	fcvt	d0, s0
	adrp	x0, .LC0
	ldr	d1, [x0, #:lo12:.LC0]
	fcmpe	d0, d1
	bgt	.L9
	ret
.L9:
	stp	x29, x30, [sp, -16]!
	.cfi_def_cfa_offset 16
	.cfi_offset 29, -16
	.cfi_offset 30, -8
	mov	x29, sp
	bl	abort
	.cfi_endproc

Now generates:

cvt_int32_t_to_float:
.LFB0:
	.cfi_startproc
	fmov	s1, w0
	scvtf	s1, s1
	fsub	s1, s1, s0
	fcvt	d1, s1
	adrp	x0, .LC0
	ldr	d0, [x0, #:lo12:.LC0]
	fcmpe	d1, d0
	bgt	.L9
	ret
.L9:
	stp	x29, x30, [sp, -16]!
	.cfi_def_cfa_offset 16
	.cfi_offset 29, -16
	.cfi_offset 30, -8
	mov	x29, sp
	bl	abort
	.cfi_endproc
Segher Boessenkool Oct. 26, 2018, 4:48 p.m. UTC | #16
On Fri, Oct 26, 2018 at 04:39:06PM +0000, Steve Ellcey wrote:
> What is the status of this patch?  I see PR 87708, which is for the
> regression to ira-shrinkwrap-prep-[12].c but what about all the
> other regressions?  I see 27 of them on my aarch64 build and when
> I looked at one of them (gcc.target/aarch64/cvtf_1.c) the code looks
> worse than before, generating an extra instruction in each of the
> routines.  Here is an example from one function where there is an
> extra fmov that was not there before.  The test runs at -O1 but
> the extra instruction appears at all optimization levels.  Should
> I submit a new PR for this?

Yes, please open a PR.  It's hard to keep track of things without one.

Status: I have figured out what I am doing wrong; I hope to have a patch
soon.  This will not fix all the register allocation problems, of course.
It's a pity no PRs are opened for the code improvements either though ;-)


Segher
Renlin Li Nov. 2, 2018, 10:19 p.m. UTC | #17
Hi Segher,

I find a problem with your change to add make_more_copies.
I am investigating those regressions; a large number of them are wrong-code regressions.

One problem is that make_more_copies splits the assignment of fp to sfp.

From:
(insn 48 26 28 5 (set (reg/f:SI 102 sfp)
         (reg/f:SI 11 fp)) -1
To:
(insn 51 32 26 5 (set (reg:SI 117)
         (reg/f:SI 11 fp)) 646 {*arm_movsi_vfp}
      (expr_list:REG_EQUIV (reg/f:SI 11 fp)
         (nil)))
(insn 48 26 28 5 (set (reg/f:SI 102 sfp)
         (reg:SI 117)) 646 {*arm_movsi_vfp}
      (expr_list:REG_DEAD (reg:SI 117)
         (nil)))

The original rtx is generated by expand_builtin_setjmp_receiver to adjust the frame pointer.

Later, in LRA, it will try to eliminate the frame pointer in favor of the
hard frame pointer, as defined in ELIMINABLE_REGS.

Your change splits the insn into two.
As a result, the insn no longer matches the "from" and "to" regs defined
in ELIMINABLE_REGS: the if statement that generates the adjustment insn
is skipped, and the original instruction is simply deleted!



Perhaps we should not split a move rtx whose registers correspond to entries defined in ELIMINABLE_REGS?


Regards,
Renlin

On 10/24/2018 09:23 AM, Christophe Lyon wrote:
> On Wed, 24 Oct 2018 at 00:26, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>>
>> Hi Christophe,
>>
>> On Tue, Oct 23, 2018 at 03:25:55PM +0200, Christophe Lyon wrote:
>>> On Tue, 23 Oct 2018 at 14:29, Segher Boessenkool
>>> <segher@kernel.crashing.org> wrote:
>>>> On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:
>>>>> I have noticed many regressions on arm and aarch64 between 265366 and
>>>>> 265408 (this commit is 265398).
>>>>>
>>>>> I bisected at least one to this commit on aarch64:
>>>>> FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
>>>>> live-range of register"
>>>>> The same test also regresses on arm.
>>>>
>>>> Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
>>>> when random things in the RTL change, apparently.
>>
>> This is PR87708 now.
>>
>>>>> For a whole picture of all the regressions I noticed during these two
>>>>> commits, have a look at:
>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html
>>>>
>>>> No thanks.  I am not going to click on 111 links and whatever is behind
>>>> those.  Please summarise, like, what was the diff in test_summary, and
>>>> then dig down into individual tests if you want.  Or whatever else works
>>>> both for you and for me.  This doesn't work for me.
>>>
>>> OK.... this is not very practical for me either. There were 25 commits between
>>> the two validations being compared,
>>> 25-28 gcc tests regressed on aarch64, depending on the exact target
>>> 177-206 gcc tests regressed on arm*, 7-29 gfortran regressions on arm*
>>> so I could have to run many bisects to make sure every regression is
>>> caused by the same commit.
>>
>> So many, ouch!  I didn't realise.
>>
> I've now got the results of validating your patch only, compared to the
> previous revision, and it does cause all the regressions I noticed earlier.
> 
>>> Since these are all automated builds with everything discarded after
>>> computing the regressions, it's quite time consuming to re-run the
>>> tests manually on my side (probably at least as much as it is for you).
>>
>> Running arm tests is very painful for me.  But you say this is on aarch64
>> as well, I didn't realise that either; aarch64 should be easy to test,
>> we have many reasonable aarch64 machines in the cfarm.
>>
>>> I know this doesn't answer your question, but I thought you could run aarch64
>>> tests easily and that would be more efficient for the project that you
>>> do it directly
>>> without waiting for me to provide hardly little more information.
>>
>> Well, I'm not too familiar with aarch64, so if you can say "this Z is a
>> pretty simple test that should do X but now does Y" that would be a huge
>> help :-)
>>
>>> Maybe this will answer your question better:
>>> List of aarch64-linux-gnu regressions:
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt
>>> List of arm-none-linux-gnueabihf regressions:
>>> (gcc) http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
>>> (gfortran) http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gfortran-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
>>
>> That may help yes, thanks!
>>
>>> To me it just highlights again that we need a validation system easier to
>>> work with when we break something on a target we are not familiar with.
>>
>> OTOH a patch like this is likely to break many target-specific tests, and
>> that should not prevent commiting it imnsho.  If it actively breaks things,
>> then of course it shouldn't go in as-is, or if it breaks bootstrap, etc.
>>
>>> I run post-commit validations as finely grained as possible with the CPU
>>> resources I have access to, that's not enough and I think having a
>>> developer-accessible gerrit+jenkins-like system would be very valuable
>>> to test patches before commit. We have a prototype in Linaro, not
>>> production-ready. But I guess that would be worth another
>>> discussion thread :)
>>
>> Yeah...  One when people have time for it ;-)
>>
>>
>> Segher
Segher Boessenkool Nov. 2, 2018, 11:03 p.m. UTC | #18
Hi!

On Fri, Nov 02, 2018 at 10:19:01PM +0000, Renlin Li wrote:
> I find a problem with your change to add make_more_copies.
> I am investigating those regressions, a big amount of them are wrong code 
> generation.
> 
> One problem is that, make_more_copies will split the assignment of fp to 
> sfp.
> 
> From:
> (insn 48 26 28 5 (set (reg/f:SI 102 sfp)
>         (reg/f:SI 11 fp)) -1
> To:
> (insn 51 32 26 5 (set (reg:SI 117)
>         (reg/f:SI 11 fp)) 646 {*arm_movsi_vfp}
>      (expr_list:REG_EQUIV (reg/f:SI 11 fp)
>         (nil)))
> (insn 48 26 28 5 (set (reg/f:SI 102 sfp)
>         (reg:SI 117)) 646 {*arm_movsi_vfp}
>      (expr_list:REG_DEAD (reg:SI 117)
>         (nil)))

I was looking at this just now :-)  (PR87871)

fp is a hard reg, but not a fixed reg, so make_more_copies thinks it is
fine to copy it to some pseudo, before copying it to the final dest.  And
that is just fine as far as I can see.

That final dest is sfp, and that final move is moved over the clobber of
fp, and yes eventually deleted as you say below.

> The original rtx is generated by expand_builtin_setjmp_receiver to adjust 
> the frame pointer.
> 
> And later in LRA, it will try to eliminate frame_pointer with hard frame 
> pointer which is
> defined the ELIMINABLE_REGS.
> 
> Your change split the insn into two.
> This makes it doesn't match the "from" and "to" regs defined in 
> ELIMINABLE_REGS.
> The if statement to generate the adjustment insn is been skipt.
> And the original instruction is just been deleted!

I don't follow why, or what should have prevented it from being deleted.

> Probably, we don't want to split the move rtx if they are related to 
> entries defined in ELIMINABLE_REGS?

One thing I can easily do is not making an intermediate pseudo when copying
*to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
looking at (setjmp-4.c).


Segher
Segher Boessenkool Nov. 2, 2018, 11:54 p.m. UTC | #19
On Fri, Nov 02, 2018 at 06:03:20PM -0500, Segher Boessenkool wrote:
> > The original rtx is generated by expand_builtin_setjmp_receiver to adjust 
> > the frame pointer.
> > 
> > And later in LRA, it will try to eliminate frame_pointer with hard frame 
> > pointer which is
> > defined the ELIMINABLE_REGS.
> > 
> > Your change split the insn into two.
> > This makes it doesn't match the "from" and "to" regs defined in 
> > ELIMINABLE_REGS.
> > The if statement to generate the adjustment insn is been skipt.
> > And the original instruction is just been deleted!
> 
> I don't follow why, or what should have prevented it from being deleted.
> 
> > Probably, we don't want to split the move rtx if they are related to 
> > entries defined in ELIMINABLE_REGS?
> 
> One thing I can easily do is not making an intermediate pseudo when copying
> *to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
> looking at (setjmp-4.c).

This indeed helps, see patch below.  Could you try that on the whole
testsuite?

Thanks,


Segher


p.s. It still is a problem in the arm backend, but this won't hurt combine,
so why not.


From 814ca23ce05384d017b3c2bff41ab61cf5446e46 Mon Sep 17 00:00:00 2001
Message-Id: <814ca23ce05384d017b3c2bff41ab61cf5446e46.1541202704.git.segher@kernel.crashing.org>
From: Segher Boessenkool <segher@kernel.crashing.org>
Date: Fri, 2 Nov 2018 23:33:32 +0000
Subject: [PATCH] combine: Don't break up copy from hard to fixed reg

---
 gcc/combine.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index dfb0b44..15e941a 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14998,6 +14998,8 @@ make_more_copies (void)
 	    continue;
 	  if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
 	    continue;
+	  if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))
+	    continue;
 
 	  rtx new_reg = gen_reg_rtx (GET_MODE (dest));
 	  rtx_insn *new_insn = gen_move_insn (new_reg, src);
Jeff Law Nov. 3, 2018, 2:34 a.m. UTC | #20
On 11/2/18 5:54 PM, Segher Boessenkool wrote:
> On Fri, Nov 02, 2018 at 06:03:20PM -0500, Segher Boessenkool wrote:
>>> The original rtx is generated by expand_builtin_setjmp_receiver to adjust 
>>> the frame pointer.
>>>
>>> And later in LRA, it will try to eliminate frame_pointer with hard frame 
>>> pointer which is
>>> defined the ELIMINABLE_REGS.
>>>
>>> Your change split the insn into two.
>>> This makes it doesn't match the "from" and "to" regs defined in 
>>> ELIMINABLE_REGS.
>>> The if statement to generate the adjustment insn is been skipt.
>>> And the original instruction is just been deleted!
>> I don't follow why, or what should have prevented it from being deleted.
>>
>>> Probably, we don't want to split the move rtx if they are related to 
>>> entries defined in ELIMINABLE_REGS?
>> One thing I can easily do is not making an intermediate pseudo when copying
>> *to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
>> looking at (setjmp-4.c).
> This indeed helps, see patch below.  Could you try that on the whole
> testsuite?
> 
> Thanks,
> 
> 
> Segher
> 
> 
> p.s. It still is a problem in the arm backend, but this won't hurt combine,
> so why not.
> 
> 
> From 814ca23ce05384d017b3c2bff41ab61cf5446e46 Mon Sep 17 00:00:00 2001
> Message-Id: <814ca23ce05384d017b3c2bff41ab61cf5446e46.1541202704.git.segher@kernel.crashing.org>
> From: Segher Boessenkool <segher@kernel.crashing.org>
> Date: Fri, 2 Nov 2018 23:33:32 +0000
> Subject: [PATCH] combine: Don't break up copy from hard to fixed reg
> 
> ---
>  gcc/combine.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index dfb0b44..15e941a 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -14998,6 +14998,8 @@ make_more_copies (void)
>  	    continue;
>  	  if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
>  	    continue;
> +	  if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))
> +	    continue;
>  
>  	  rtx new_reg = gen_reg_rtx (GET_MODE (dest));
>  	  rtx_insn *new_insn = gen_move_insn (new_reg, src);
> -- 1.8.3.1
It certainly helps the armeb test results.

Jeff
Renlin Li Nov. 5, 2018, 12:35 p.m. UTC | #21
Hi Segher,

On 11/03/2018 02:34 AM, Jeff Law wrote:
> On 11/2/18 5:54 PM, Segher Boessenkool wrote:
>> On Fri, Nov 02, 2018 at 06:03:20PM -0500, Segher Boessenkool wrote:
>>>> The original rtx is generated by expand_builtin_setjmp_receiver to adjust
>>>> the frame pointer.
>>>>
>>>> And later in LRA, it will try to eliminate frame_pointer with hard frame
>>>> pointer which is
>>>> defined the ELIMINABLE_REGS.
>>>>
>>>> Your change split the insn into two.
>>>> This makes it doesn't match the "from" and "to" regs defined in
>>>> ELIMINABLE_REGS.
>>>> The if statement to generate the adjustment insn is been skipt.
>>>> And the original instruction is just been deleted!
>>> I don't follow why, or what should have prevented it from being deleted.
>>>
>>>> Probably, we don't want to split the move rtx if they are related to
>>>> entries defined in ELIMINABLE_REGS?
>>> One thing I can easily do is not making an intermediate pseudo when copying
>>> *to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
>>> looking at (setjmp-4.c).
>> This indeed helps, see patch below.  Could you try that on the whole
>> testsuite?
>>
>> Thanks,
>>
>>
>> Segher
>>
>>
>> p.s. It still is a problem in the arm backend, but this won't hurt combine,
>> so why not.
>>
>>
>>  From 814ca23ce05384d017b3c2bff41ab61cf5446e46 Mon Sep 17 00:00:00 2001
>> Message-Id: <814ca23ce05384d017b3c2bff41ab61cf5446e46.1541202704.git.segher@kernel.crashing.org>
>> From: Segher Boessenkool <segher@kernel.crashing.org>
>> Date: Fri, 2 Nov 2018 23:33:32 +0000
>> Subject: [PATCH] combine: Don't break up copy from hard to fixed reg
>>
>> ---
>>   gcc/combine.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/gcc/combine.c b/gcc/combine.c
>> index dfb0b44..15e941a 100644
>> --- a/gcc/combine.c
>> +++ b/gcc/combine.c
>> @@ -14998,6 +14998,8 @@ make_more_copies (void)
>>   	    continue;
>>   	  if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
>>   	    continue;
>> +	  if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))
>> +	    continue;
>>   
>>   	  rtx new_reg = gen_reg_rtx (GET_MODE (dest));
>>   	  rtx_insn *new_insn = gen_move_insn (new_reg, src);
>> -- 1.8.3.1
> It certainly helps the armeb test results.

Yes, I can also see it helps a lot with the regression test.
Thanks for working on it!


Besides the correctness issue, there are performance regressions, as other people have also reported.

I analysed a case, gcc.c-torture/execute/builtins/memcpy-chk.c.
In this case, two additional register moves and callee saves are emitted.

The problem is that make_more_copies splits a move into two.  Ideally, the RA
could still figure out the best register allocation.  In reality, however, the
scheduler will in some cases reorder the instructions, which changes the live
ranges of registers and thus the interference graph of the pseudo registers.

This can force the RA to choose a different register, making the move
instruction non-redundant, or at least impossible for the RA to eliminate.

For example,

set r102, r1

After combine:
insn x: set r103, r1
insn x+1: set r102, r103

After scheduler:
insn x: set r103, r1
...
...
...
insn x+1: set r102, r103

After IRA, r1 could be assigned to operands used in instructions in between
insn x and x+1, so r103 conflicts with r1.  LRA has to assign r103 a
different hard register.
This causes one additional move, and probably one more callee save/restore.

Nothing is obviously wrong here. But...

One simple case that is probably not beneficial is splitting a hard register
store.  According to your comment on make_more_copies, perhaps you want to
apply the transformation only to hard-reg-to-pseudo copies?
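In terms of the loop in the patch quoted earlier in this thread, such a
restriction might look roughly like this (an untested sketch, not an actual
patch; HARD_REGISTER_P is the existing rtl.h predicate):

```c
/* Sketch only: restrict make_more_copies to hard-reg-to-pseudo copies,
   skipping stores to memory and copies into hard registers.  */
if (!REG_P (dest) || HARD_REGISTER_P (dest))
  continue;
```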

Regards,
Renlin




> 
> Jeff
>
Renlin Li Nov. 5, 2018, 6:16 p.m. UTC | #22
On 11/05/2018 12:35 PM, Renlin Li wrote:
> Hi Segher,
> 
> On 11/03/2018 02:34 AM, Jeff Law wrote:
>> On 11/2/18 5:54 PM, Segher Boessenkool wrote:
>>> On Fri, Nov 02, 2018 at 06:03:20PM -0500, Segher Boessenkool wrote:
>>>>> The original rtx is generated by expand_builtin_setjmp_receiver to adjust
>>>>> the frame pointer.
>>>>>
>>>>> And later in LRA, it will try to eliminate frame_pointer with hard frame
>>>>> pointer which is
>>>>> defined the ELIMINABLE_REGS.
>>>>>
>>>>> Your change split the insn into two.
>>>>> This makes it doesn't match the "from" and "to" regs defined in
>>>>> ELIMINABLE_REGS.
>>>>> The if statement to generate the adjustment insn is been skipt.
>>>>> And the original instruction is just been deleted!
>>>> I don't follow why, or what should have prevented it from being deleted.
>>>>
>>>>> Probably, we don't want to split the move rtx if they are related to
>>>>> entries defined in ELIMINABLE_REGS?
>>>> One thing I can easily do is not making an intermediate pseudo when copying
>>>> *to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
>>>> looking at (setjmp-4.c).
>>> This indeed helps, see patch below.  Could you try that on the whole
>>> testsuite?
>>>
>>> Thanks,
>>>
>>>
>>> Segher
>>>
>>>
>>> p.s. It still is a problem in the arm backend, but this won't hurt combine,
>>> so why not.
>>>
>>>
>>>  From 814ca23ce05384d017b3c2bff41ab61cf5446e46 Mon Sep 17 00:00:00 2001
>>> Message-Id: <814ca23ce05384d017b3c2bff41ab61cf5446e46.1541202704.git.segher@kernel.crashing.org>
>>> From: Segher Boessenkool <segher@kernel.crashing.org>
>>> Date: Fri, 2 Nov 2018 23:33:32 +0000
>>> Subject: [PATCH] combine: Don't break up copy from hard to fixed reg
>>>
>>> ---
>>>   gcc/combine.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/gcc/combine.c b/gcc/combine.c
>>> index dfb0b44..15e941a 100644
>>> --- a/gcc/combine.c
>>> +++ b/gcc/combine.c
>>> @@ -14998,6 +14998,8 @@ make_more_copies (void)
>>>           continue;
>>>         if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
>>>           continue;
>>> +      if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))
>>> +        continue;
>>>         rtx new_reg = gen_reg_rtx (GET_MODE (dest));
>>>         rtx_insn *new_insn = gen_move_insn (new_reg, src);
>>> -- 1.8.3.1
>> It certainly helps the armeb test results.
> 
> Yes, I can also see it helps a lot with the regression test.
> Thanks for working on it!
> 
> 
> Beside the correctness issue, there are performance regression issues as other people also reported.
> 
> I analysised a case, which is gcc.c-torture/execute/builtins/memcpy-chk.c
> In this case, two additional register moves and callee saves are emitted.
> 
> The problem is that, make_more_moves split a move into two. Ideally, the RA could figure out and
> make the best register allocation. However, in reality, scheduler in some cases will reschedule
> the instructions, and which changes the live-range of registers. And thus change the interference graph
> of pseudo registers.
> 
> This will force the RA to choose a different register for it, and make the move instruction not redundant,
> at least, not possible for RA to eliminate it.
> 
> For example,
> 
> set r102, r1
> 
> After combine:
> insn x: set r103, r1
> insn x+1: set r22, r103
> 
> After scheduler:
> insn x: set r103, r1
> ...
> ...
> ...
> insn x+1: set r102, r103
> 
> After IRA, r1 could be assigned to operands used in instructions in between insn x and x+1.
> so r23 is conflicting with r1. LRA has to assign r23 a different hard register.

Sorry, this is not correct. Instructions scheduled between x and x+1 directly use hard register r1.
It is not IRA/LRA assigning r1 to the operands.


To reproduce this particular case, you could use:
cc1  -O3 -marm -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp gcc.c-torture/execute/builtins/memcpy-chk.c

This insn has been split.

(insn 152 150 154 11 (set (mem/c:QI (plus:SI (reg/f:SI 266)
                 (const_int 24 [0x18])) [0 MEM[(void *)&p + 20B]+4 S1 A32])
         (reg:QI 1 r1)) "memcpy-chk-reduce.c":48:3 189 {*arm_movqi_insn}
      (expr_list:REG_DEAD (reg:QI 1 r1)
         (nil)))


Regards,
Renlin


> This cause one additional move, and probably one more callee save/restore.
> 
> Nothing is obviously wrong here. But...
> 
> One simple case probably not beneficial is to split hard register store.
> According to your comment on make_more_moves, you might want to apply the transformation only
> on hard-reg-to-pseudo-copy?
> 
> Regards,
> Renlin
> 
> 
> 
> 
>>
>> Jeff
>>
Segher Boessenkool Nov. 5, 2018, 10:50 p.m. UTC | #23
Hi!

On Mon, Nov 05, 2018 at 12:35:24PM +0000, Renlin Li wrote:
> >>--- a/gcc/combine.c
> >>+++ b/gcc/combine.c
> >>@@ -14998,6 +14998,8 @@ make_more_copies (void)
> >>  	    continue;
> >>  	  if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
> >>  	    continue;
> >>+	  if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO 
> >>(dest)))
> >>+	    continue;
> >>  
> >>  	  rtx new_reg = gen_reg_rtx (GET_MODE (dest));
> >>  	  rtx_insn *new_insn = gen_move_insn (new_reg, src);
> >>-- 1.8.3.1
> >It certainly helps the armeb test results.
> 
> Yes, I can also see it helps a lot with the regression test.
> Thanks for working on it!

I committed a variant that does this only for frame_pointer_rtx, as r265821.

> Beside the correctness issue, there are performance regression issues as 
> other people also reported.
> 
> I analysised a case, which is gcc.c-torture/execute/builtins/memcpy-chk.c
> In this case, two additional register moves and callee saves are emitted.
> 
> The problem is that, make_more_moves split a move into two. Ideally, the RA 
> could figure out and
> make the best register allocation. However, in reality, scheduler in some 
> cases will reschedule
> the instructions, and which changes the live-range of registers. And thus 
> change the interference graph
> of pseudo registers.
> 
> This will force the RA to choose a different register for it, and make the 
> move instruction not redundant,
> at least, not possible for RA to eliminate it.
> 
> For example,
> 
> set r102, r1
> 
> After combine:
> insn x: set r103, r1
> insn x+1: set r22, r103
> 
> After scheduler:
> insn x: set r103, r1
> ...
> ...
> ...
> insn x+1: set r102, r103
> 
> After IRA, r1 could be assigned to operands used in instructions in between 
> insn x and x+1.
> so r23 is conflicting with r1. LRA has to assign r23 a different hard 
> register.
> This cause one additional move, and probably one more callee save/restore.
> 
> Nothing is obviously wrong here. But...
> 
> One simple case probably not beneficial is to split hard register store.

You mean a store from a hard reg directly to memory?  Leaving that
constrains scheduling.

> According to your comment on make_more_moves, you might want to apply the 
> transformation only
> on hard-reg-to-pseudo-copy?

hard-reg-to-anything really.  Actually making it do this only for pseudos
caused a lot of degradation for some targets iirc.  I can do more tests.

Almost all reported degradations are cases where RA does not make the
best decision.  There are other cases where this combine change makes
better code than before.  (If this was not true, a greedy RA would work
well.)


Segher
Segher Boessenkool Nov. 7, 2018, 11:02 p.m. UTC | #24
Hi!

On Mon, Nov 05, 2018 at 06:16:16PM +0000, Renlin Li wrote:
> Sorry, this is not correct. Instructions scheduled between x and x+1 
> directly use hard register r1.
> It is not IRA/LRA assigning r1 to the operands.
> 
> 
> To reproduce this particular case, you could use:
> cc1  -O3 -marm -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp 
> gcc.c-torture/execute/builtins/memcpy-chk.c
> 
> This insn is been splitted.
> 
> (insn 152 150 154 11 (set (mem/c:QI (plus:SI (reg/f:SI 266)
>                 (const_int 24 [0x18])) [0 MEM[(void *)&p + 20B]+4 S1 A32])
>         (reg:QI 1 r1)) "memcpy-chk-reduce.c":48:3 189 {*arm_movqi_insn}
>      (expr_list:REG_DEAD (reg:QI 1 r1)
>         (nil)))

Okay, I see what is going on.  The arm port often expands movmem to use
hard registers (so that it can use load/store multiple?).  This does not
work well with scheduling and RA, and this combine feature makes it worse.

I don't know what to do about it in combine.


Segher
Segher Boessenkool Nov. 8, 2018, 8:34 p.m. UTC | #25
On Thu, Nov 08, 2018 at 03:44:44PM +0000, Sam Tebbs wrote:
> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
> looking at the issue as well and don't want to do any overlapping work.

I don't know.  Well, there are no incorrect code issues I know of at all
now; but you mean that it is taking an instruction more than you would
like to see, I suppose?


Segher
Richard Earnshaw (lists) Nov. 9, 2018, 10:20 a.m. UTC | #26
On 07/11/2018 23:02, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Nov 05, 2018 at 06:16:16PM +0000, Renlin Li wrote:
>> Sorry, this is not correct. Instructions scheduled between x and x+1 
>> directly use hard register r1.
>> It is not IRA/LRA assigning r1 to the operands.
>>
>>
>> To reproduce this particular case, you could use:
>> cc1  -O3 -marm -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp 
>> gcc.c-torture/execute/builtins/memcpy-chk.c
>>
>> This insn has been split.
>>
>> (insn 152 150 154 11 (set (mem/c:QI (plus:SI (reg/f:SI 266)
>>                 (const_int 24 [0x18])) [0 MEM[(void *)&p + 20B]+4 S1 A32])
>>         (reg:QI 1 r1)) "memcpy-chk-reduce.c":48:3 189 {*arm_movqi_insn}
>>      (expr_list:REG_DEAD (reg:QI 1 r1)
>>         (nil)))
> 
> Okay, I see what is going on.  The arm port often expands movmem to use
> hard registers (so that it can use load/store multiple?).  This does not
> work well with scheduling and RA, and this combine feature makes it worse.
> 

Yes, GCC doesn't have any way of describing the constraint "I need <n>
registers in ascending order of register number".  So the only way
around this is to nail down which registers are used.  In practice this
has worked pretty well for the last 20 years or so.
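
To sketch the constraint in question (illustrative arm assembly, register
choices hypothetical): the register list of a load/store-multiple is encoded
as a bitmask, so the registers are always transferred in ascending
register-number order and cannot be picked freely by the allocator.

```
ldmia   r4!, {r0, r1, r2, r3}   @ loads into r0..r3, in that order
stmia   r5!, {r0, r1, r2, r3}   @ stores r0..r3, in that order
@ There is no constraint syntax for "any four registers in ascending
@ register-number order", so the expander has to pick concrete hard
@ registers up front.
```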

R.

> I don't know what to do about it in combine.
> 
> 
> Segher
>
Jeff Law Nov. 9, 2018, 9:12 p.m. UTC | #27
On 11/8/18 1:34 PM, Segher Boessenkool wrote:
> On Thu, Nov 08, 2018 at 03:44:44PM +0000, Sam Tebbs wrote:
>> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
>> looking at the issue as well and don't want to do any overlapping work.
> 
> I don't know.  Well, there are no incorrect code issues I know of at all
> now; but you mean that it is taking an instruction more than you would
> like to see, I suppose?
Which is ultimately similar to the 3 regressions HJ has reported on x86_64.

jeff
Segher Boessenkool Nov. 10, 2018, 4:50 a.m. UTC | #28
On Fri, Nov 09, 2018 at 02:12:22PM -0700, Jeff Law wrote:
> On 11/8/18 1:34 PM, Segher Boessenkool wrote:
> > On Thu, Nov 08, 2018 at 03:44:44PM +0000, Sam Tebbs wrote:
> >> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
> >> looking at the issue as well and don't want to do any overlapping work.
> > 
> > I don't know.  Well, there are no incorrect code issues I know of at all
> > now; but you mean that it is taking an instruction more than you would
> > like to see, I suppose?
> Which is ultimately similar to the 3 regressions HJ has reported on x86_64.

I just wish those bug reports would mature from "oh hey this patch changed
some stuff" to "oh yeah this made some existing problems (in the backend,
or common code, hey let's blame RA because it's most obvious here, why not!)
more obvious".

All those PRs show situations where we could generate better code.  None
of those PRs show situations where the combine patches were at fault.

Blaming others is not going to improve your code one bit.


(And yes, I'll commit one further improvement patch _right now_).


Segher
Jeff Law Nov. 10, 2018, 4:54 p.m. UTC | #29
On 11/9/18 9:50 PM, Segher Boessenkool wrote:
> On Fri, Nov 09, 2018 at 02:12:22PM -0700, Jeff Law wrote:
>> On 11/8/18 1:34 PM, Segher Boessenkool wrote:
>>> On Thu, Nov 08, 2018 at 03:44:44PM +0000, Sam Tebbs wrote:
>>>> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
>>>> looking at the issue as well and don't want to do any overlapping work.
>>>
>>> I don't know.  Well, there are no incorrect code issues I know of at all
>>> now; but you mean that it is taking an instruction more than you would
>>> like to see, I suppose?
>> Which is ultimately similar to the 3 regressions HJ has reported on x86_64.
> 
> I just wish those bug reports would mature from "oh hey this patch changed
> some stuff" to "oh yeah this made some existing problems (in the backend,
> or common code, hey let's blame RA because it's most obvious here, why not!)
> more obvious".
> 
> All those PRs show situations where we could generate better code.  None
> of those PRs show situations where the combine patches were at fault.
> 
> Blaming others is not going to improve your code one bit.
> 
> 
> (And yes, I'll commit one further improvement patch _right now_).
I agree to a degree, but it's also the case that these are clearly code
quality regressions triggered by your changes.  Your help in resolving
them would be greatly appreciated.

Jeff
Sam Tebbs Nov. 12, 2018, 11:57 a.m. UTC | #30
On 11/08/2018 08:34 PM, Segher Boessenkool wrote:
> On Thu, Nov 08, 2018 at 03:44:44PM +0000, Sam Tebbs wrote:
>> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
>> looking at the issue as well and don't want to do any overlapping work.
> I don't know.  Well, there are no incorrect code issues I know of at all
> now; but you mean that it is taking an instruction more than you would
> like to see, I suppose?
>
>
> Segher

Yes, I am referring to the extra instruction generated, which is incorrect
generation in my opinion, since it shouldn't be generated.
Segher Boessenkool Nov. 12, 2018, 12:53 p.m. UTC | #31
On Mon, Nov 12, 2018 at 11:54:37AM +0000, Sam Tebbs wrote:
> On 11/08/2018 08:34 PM, Segher Boessenkool wrote:
> 
> > On Thu, Nov 08, 2018 at 03:44:44PM +0000, Sam Tebbs wrote:
> >> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
> >> looking at the issue as well and don't want to do any overlapping work.
> > I don't know.  Well, there are no incorrect code issues I know of at all
> > now; but you mean that it is taking an instruction more than you would
> > like to see, I suppose?
> 
> Yes, I am referring to the extra instruction being generated. In my
> opinion it is incorrect code generation, since based on the relevant code
> and patterns the intention is that it shouldn't be generated.

That is not what incorrect code means; it is a "missed optimisation",
instead.

Anyway, does it now do what you want?

(PR87763 btw)


Segher

Patch
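
The effect of make_more_copies in the patch below can be sketched in RTL
terms (register numbers illustrative): each single-set copy from a non-fixed
hard register into a pseudo gets a fresh intermediate pseudo, so combine can
still merge the second, pseudo-to-pseudo copy into later instructions
without ever propagating the hard register itself.

```
;; Before (hard reg r3 copied straight into pseudo 100):
(insn (set (reg:SI 100) (reg:SI 3 r3)))

;; After make_more_copies (pseudo 200 is a fresh intermediate):
(insn (set (reg:SI 200) (reg:SI 3 r3)))  ;; cant_combine_insn_p blocks this
(insn (set (reg:SI 100) (reg:SI 200)))   ;; this copy may still combine
```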

diff --git a/gcc/combine.c b/gcc/combine.c
index 256b5a4..3ff1760 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -99,6 +99,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "explow.h"
 #include "insn-attr.h"
 #include "rtlhooks-def.h"
+#include "expr.h"
 #include "params.h"
 #include "tree-pass.h"
 #include "valtrack.h"
@@ -2348,8 +2349,7 @@  cant_combine_insn_p (rtx_insn *insn)
     dest = SUBREG_REG (dest);
   if (REG_P (src) && REG_P (dest)
       && ((HARD_REGISTER_P (src)
-	   && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src))
-	   && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (src))))
+	   && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
 	  || (HARD_REGISTER_P (dest)
 	      && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest))
 	      && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dest))))))
@@ -14969,11 +14969,53 @@  dump_combine_total_stats (FILE *file)
      total_attempts, total_merges, total_extras, total_successes);
 }
 
+/* Make pseudo-to-pseudo copies after every hard-reg-to-pseudo-copy, because
+   the reg-to-reg copy can usefully combine with later instructions, but we
+   do not want to combine the hard reg into later instructions, for that
+   restricts register allocation.  */
+static void
+make_more_copies (void)
+{
+  basic_block bb;
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      rtx_insn *insn;
+
+      FOR_BB_INSNS (bb, insn)
+        {
+          if (!NONDEBUG_INSN_P (insn))
+            continue;
+
+	  rtx set = single_set (insn);
+	  if (!set)
+	    continue;
+	  rtx src = SET_SRC (set);
+	  rtx dest = SET_DEST (set);
+	  if (GET_CODE (src) == SUBREG)
+	    src = SUBREG_REG (src);
+	  if (!(REG_P (src) && HARD_REGISTER_P (src)))
+	    continue;
+	  if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
+	    continue;
+
+	  rtx new_reg = gen_reg_rtx (GET_MODE (dest));
+	  rtx_insn *insn1 = gen_move_insn (new_reg, src);
+	  rtx_insn *insn2 = gen_move_insn (dest, new_reg);
+	  emit_insn_after (insn1, insn);
+	  emit_insn_after (insn2, insn1);
+	  delete_insn (insn);
+
+	  insn = insn2;
+	}
+    }
+}
+
 /* Try combining insns through substitution.  */
 static unsigned int
 rest_of_handle_combine (void)
 {
-  int rebuild_jump_labels_after_combine;
+  make_more_copies ();
 
   df_set_flags (DF_LR_RUN_DCE + DF_DEFER_INSN_RESCAN);
   df_note_add_problem ();
@@ -14982,7 +15024,7 @@  rest_of_handle_combine (void)
   regstat_init_n_sets_and_refs ();
   reg_n_sets_max = max_reg_num ();
 
-  rebuild_jump_labels_after_combine
+  int rebuild_jump_labels_after_combine
     = combine_instructions (get_insns (), max_reg_num ());
 
   /* Combining insns may have turned an indirect jump into a