[0/3] Fix PR78120, in ifcvt/rtlanal/i386.

Message ID	2bde94e0-b470-8aad-6a9c-91e45dc8b687@redhat.com
State	New
Headers	show Return-Path: <gcc-patches-return-442434-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:from:message-id:date:mime-version :in-reply-to:content-type; q=dns; s=default; b=Dwsf6HkDiK4ApTCm8 MoiE0pfChxQXRY89gpzcdquyThXRZTR4Xgd0cURHLeZIS5GKDi+oRqobM87oHw8Q 1iBUvmyjoopjTCuR06zgCqpgCqtCl+dX+HMLR0nejL1vFwL+aBwPvFI2m1s6Ossd jXCMrmSt3BktIj0OMKmmGYV4zA= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org Subject: Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386. To: GCC Patches <gcc-patches@gcc.gnu.org> References: <e6d06b7a-831b-299a-5c5a-507e267238a4@redhat.com> From: Bernd Schmidt <bschmidt@redhat.com> Message-ID: <2bde94e0-b470-8aad-6a9c-91e45dc8b687@redhat.com> Date: Wed, 23 Nov 2016 20:00:30 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <e6d06b7a-831b-299a-5c5a-507e267238a4@redhat.com> Content-Type: multipart/mixed; boundary="------------D0255FEB1BC676B9CEC10BE7"

Bernd Schmidt Nov. 23, 2016, 7 p.m. UTC

Note that I misspelled the PR number in the 0/3 message :-/

On 11/23/2016 07:57 PM, Bernd Schmidt wrote:
> 1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to
> be invalid. There seems to be no good reason that insn_rtx_cost
> shouldn't use the latter. It also makes the numbers comparable to the
> ones you get from seq_cost.


Bernd

Jeff Law Nov. 23, 2016, 7:30 p.m. UTC | #1

On 11/23/2016 12:00 PM, Bernd Schmidt wrote:
> Note that I misspelled the PR number in the 0/3 message :-/
>
> On 11/23/2016 07:57 PM, Bernd Schmidt wrote:
>> 1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to
>> be invalid. There seems to be no good reason that insn_rtx_cost
>> shouldn't use the latter. It also makes the numbers comparable to the
>> ones you get from seq_cost.
>
>
> Bernd
>
> 71280-1.diff
>
>
> 	PR rtl-optimization/78120
> 	* rtlanal.c (insn_rtx_cost): Use set_rtx_cost.
LGTM.   As a principle, if we have the full set, we ought to use 
set_rtx_cost, and only use set_src_cost if we don't have the full set 
expression.

Jeff

Bernd Schmidt Nov. 23, 2016, 7:31 p.m. UTC | #2

On 11/23/2016 08:30 PM, Jeff Law wrote:
> On 11/23/2016 12:00 PM, Bernd Schmidt wrote:
>> Note that I misspelled the PR number in the 0/3 message :-/
>>
>> On 11/23/2016 07:57 PM, Bernd Schmidt wrote:
>>> 1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to
>>> be invalid. There seems to be no good reason that insn_rtx_cost
>>> shouldn't use the latter. It also makes the numbers comparable to the
>>> ones you get from seq_cost.
>>
>>
>> Bernd
>>
>> 71280-1.diff
>>
>>
>>     PR rtl-optimization/78120
>>     * rtlanal.c (insn_rtx_cost): Use set_rtx_cost.
> LGTM.   As a principle, if we have the full set, we ought to use
> set_rtx_cost, and only use set_src_cost if we don't have the full set
> expression.

Note that this cannot really be applied on its own, it needs patch 2/3 
so as to not make the x86 port do strange things. So I'll be holding off 
on this one until we have a consensus on the whole set.


Bernd

Segher Boessenkool Nov. 24, 2016, 2:21 p.m. UTC | #3

[ Only your 0/3 and 3/3 messages arrived -- or is this 1/3? ]

On Wed, Nov 23, 2016 at 08:00:30PM +0100, Bernd Schmidt wrote:
> Note that I misspelled the PR number in the 0/3 message :-/
> 
> On 11/23/2016 07:57 PM, Bernd Schmidt wrote:
> >1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to
> >be invalid. There seems to be no good reason that insn_rtx_cost
> >shouldn't use the latter. It also makes the numbers comparable to the
> >ones you get from seq_cost.
> 
> 
> Bernd

> 	PR rtl-optimization/78120
> 	* rtlanal.c (insn_rtx_cost): Use set_rtx_cost.
> 
> Index: gcc/rtlanal.c
> ===================================================================
> --- gcc/rtlanal.c	(revision 242038)
> +++ gcc/rtlanal.c	(working copy)
> @@ -5211,7 +5211,7 @@ insn_rtx_cost (rtx pat, bool speed)
>    else
>      return 0;
>  
> -  cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
> +  cost = set_rtx_cost (set, speed);
>    return cost > 0 ? cost : COSTS_N_INSNS (1);
>  }
>  

Combine uses insn_rtx_cost extensively.  I have tried to change it to use
the full rtx cost, not just the source cost, a few times before, and it
always only regressed.  Part of it is that most ports' cost calculations
are, erm, not so great -- every target we care about needs fixes.

Let's please not try this in stage 3.


Segher

Bernd Schmidt Nov. 24, 2016, 2:26 p.m. UTC | #4

On 11/24/2016 03:21 PM, Segher Boessenkool wrote:

> Combine uses insn_rtx_cost extensively.  I have tried to change it to use
> the full rtx cost, not just the source cost, a few times before, and it
> always only regressed.  Part of it is that most ports' cost calculations
> are, erm, not so great -- every target we care about needs fixes.
>
> Let's please not try this in stage 3.

It got approved and committed. Do you want me to revert it now or wait 
for obvious signs of fallout?

Bernd

Segher Boessenkool Nov. 24, 2016, 2:36 p.m. UTC | #5

On Thu, Nov 24, 2016 at 03:26:45PM +0100, Bernd Schmidt wrote:
> On 11/24/2016 03:21 PM, Segher Boessenkool wrote:
> 
> >Combine uses insn_rtx_cost extensively.  I have tried to change it to use
> >the full rtx cost, not just the source cost, a few times before, and it
> >always only regressed.  Part of it is that most ports' cost calculations
> >are, erm, not so great -- every target we care about needs fixes.
> >
> >Let's please not try this in stage 3.
> 
> It got approved and committed. Do you want me to revert it now or wait 
> for obvious signs of fallout?

In my opinion it is an early stage 1 thing, not something suitable for
stage 3.  I can do some simple tests on various targets if you want.


Segher

Bernd Schmidt Nov. 24, 2016, 2:38 p.m. UTC | #6

On 11/24/2016 03:36 PM, Segher Boessenkool wrote:
> On Thu, Nov 24, 2016 at 03:26:45PM +0100, Bernd Schmidt wrote:
>> On 11/24/2016 03:21 PM, Segher Boessenkool wrote:
>>
>>> Combine uses insn_rtx_cost extensively.  I have tried to change it to use
>>> the full rtx cost, not just the source cost, a few times before, and it
>>> always only regressed.  Part of it is that most ports' cost calculations
>>> are, erm, not so great -- every target we care about needs fixes.
>>>
>>> Let's please not try this in stage 3.
>>
>> It got approved and committed. Do you want me to revert it now or wait
>> for obvious signs of fallout?
>
> In my opinion it is an early stage 1 thing, not something suitable for
> stage 3.  I can do some simple tests on various targets if you want.

Sure.

I'll make the argument that stage 3 is when we fix stuff, including 
performance regressions, and this patch is very clearly a fix. When we 
have very obvious distortions like a case where costs from insn_rtx_cost 
and seq_cost aren't comparable, it's impossible to arrive at sane solutions.


Bernd

Eric Botcazou Nov. 24, 2016, 2:44 p.m. UTC | #7

> I'll make the argument that stage 3 is when we fix stuff, including
> performance regressions, and this patch is very clearly a fix. When we
> have very obvious distortions like a case where costs from insn_rtx_cost
> and seq_cost aren't comparable, it's impossible to arrive at sane solutions.

It would help to make a pass over the main architecture back-ends and evaluate 
the potential fallout and required adjustments, if any.

Segher Boessenkool Nov. 24, 2016, 2:53 p.m. UTC | #8

On Thu, Nov 24, 2016 at 03:38:55PM +0100, Bernd Schmidt wrote:
> On 11/24/2016 03:36 PM, Segher Boessenkool wrote:
> >On Thu, Nov 24, 2016 at 03:26:45PM +0100, Bernd Schmidt wrote:
> >>On 11/24/2016 03:21 PM, Segher Boessenkool wrote:
> >>
> >>>Combine uses insn_rtx_cost extensively.  I have tried to change it to use
> >>>the full rtx cost, not just the source cost, a few times before, and it
> >>>always only regressed.  Part of it is that most ports' cost calculations
> >>>are, erm, not so great -- every target we care about needs fixes.
> >>>
> >>>Let's please not try this in stage 3.
> >>
> >>It got approved and committed. Do you want me to revert it now or wait
> >>for obvious signs of fallout?
> >
> >In my opinion it is an early stage 1 thing, not something suitable for
> >stage 3.  I can do some simple tests on various targets if you want.
> 
> Sure.
> 
> I'll make the argument that stage 3 is when we fix stuff, including 
> performance regressions, and this patch is very clearly a fix. When we 
> have very obvious distortions like a case where costs from insn_rtx_cost 
> and seq_cost aren't comparable, it's impossible to arrive at sane solutions.

Your own 2/3 shows my point: you needed fixes to the i386 port for it
to behave sanely after this 1/3; what makes you think other ports are
not in the same boat?

IMHO switching insn_rtx_cost to be based on not just set_src_cost is
a good idea, but will require re-tuning of all targets, so it is not
stage 3 material.

That we compare different kinds of costs (which really has no meaning at
all, it's a heuristic at best) in various places is a known problem, not
a regression.


Segher

Richard Biener Nov. 24, 2016, 3:16 p.m. UTC | #9

On Thu, Nov 24, 2016 at 3:53 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Thu, Nov 24, 2016 at 03:38:55PM +0100, Bernd Schmidt wrote:
>> On 11/24/2016 03:36 PM, Segher Boessenkool wrote:
>> >On Thu, Nov 24, 2016 at 03:26:45PM +0100, Bernd Schmidt wrote:
>> >>On 11/24/2016 03:21 PM, Segher Boessenkool wrote:
>> >>
>> >>>Combine uses insn_rtx_cost extensively.  I have tried to change it to use
>> >>>the full rtx cost, not just the source cost, a few times before, and it
>> >>>always only regressed.  Part of it is that most ports' cost calculations
>> >>>are, erm, not so great -- every target we care about needs fixes.
>> >>>
>> >>>Let's please not try this in stage 3.
>> >>
>> >>It got approved and committed. Do you want me to revert it now or wait
>> >>for obvious signs of fallout?
>> >
>> >In my opinion it is an early stage 1 thing, not something suitable for
>> >stage 3.  I can do some simple tests on various targets if you want.
>>
>> Sure.
>>
>> I'll make the argument that stage 3 is when we fix stuff, including
>> performance regressions, and this patch is very clearly a fix. When we
>> have very obvious distortions like a case where costs from insn_rtx_cost
>> and seq_cost aren't comparable, it's impossible to arrive at sane solutions.
>
> Your own 2/3 shows my point: you needed fixes to the i386 port for it
> to behave sanely after this 1/3; what makes you think other ports are
> not in the same boat?
>
> IMHO switching insn_rtx_cost to be based on not just set_src_cost is
> a good idea, but will require re-tuning of all targets, so it is not
> stage 3 material.

Agreed.

> That we compare different kinds of costs (which really has no meaning at
> all, it's a heuristic at best) in various places is a known problem, not
> a regression.

But technically stage 3 is for general bugfixing, not only regression fixing.

I'd say be prepared to revert but wait to see who screams first.

Thanks,
Richard.

>
> Segher

Bernd Schmidt Nov. 24, 2016, 3:34 p.m. UTC | #10

On 11/24/2016 03:53 PM, Segher Boessenkool wrote:
>
> Your own 2/3 shows my point: you needed fixes to the i386 port for it
> to behave sanely after this 1/3; what makes you think other ports are
> not in the same boat?

When I realized i386 was broken I had a look at aarch64 and it looked 
sane, and that was the basis of the idea for patch 2/3.

It's possible other targets may need to handle SETs as well. In theory, 
something like the block of code I added for i386 could just work when 
copied to other backends if they have no "case SET" yet. If you do run 
into problems please try at least that very simple fix.

> That we compare different kinds of costs (which really has no meaning at
> all, it's a heuristic at best) in various places is a known problem, not
> a regression.

It leads to observable regressions however as PR78120 shows, when code 
like ifcvt tries to use cost calculations in apparently-sensible ways. 
And, as Richard mentioned, we're not in a regressions-only phase.

Bernd

Jeff Law Nov. 24, 2016, 3:46 p.m. UTC | #11

On 11/24/2016 08:16 AM, Richard Biener wrote:
>>
>> IMHO switching insn_rtx_cost to be based on not just set_src_cost is
>> a good idea, but will require re-tuning of all targets, so it is not
>> stage 3 material.
>
> Agreed.
>
>> That we compare different kinds of costs (which really has no meaning at
>> all, it's a heuristic at best) in various places is a known problem, not
>> a regression.
>
> But technically stage 3 is for general bugfixing, not only regression fixing.
>
> I'd say be prepared to revert but wait to see who screams first.
Right.  And I would claim that we're early enough in stage3 that 
attempting to address this BZ is a good thing.  The BZ also happens to 
be a 6/7 regression.

So I'd say let's go with the patch, but be aware that there may be a 
need to twiddle other ports.  If we find a bunch of ports are 
problematical than we might need to think about reversion.

jeff

Jeff Law Nov. 24, 2016, 3:48 p.m. UTC | #12

On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
>
> That we compare different kinds of costs (which really has no meaning at
> all, it's a heuristic at best) in various places is a known problem, not
> a regression.
But the problems with the costing system exhibit themselves as a code 
quality regression.  In the end that's what the end-users see -- a 
regression in the quality of the code GCC generates.

jeff

Segher Boessenkool Nov. 24, 2016, 4:14 p.m. UTC | #13

On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
> On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
> >
> >That we compare different kinds of costs (which really has no meaning at
> >all, it's a heuristic at best) in various places is a known problem, not
> >a regression.
> But the problems with the costing system exhibit themselves as a code 
> quality regression.  In the end that's what the end-users see -- a 
> regression in the quality of the code GCC generates.

Yes, exactly -- and I fear this all-encompassing change will cause just
such a regression for many users.  Tests are running, will know more
later today (or tomorrow).

The PR is about a very specific problem; the patch is not.  The patch
is not a bug fix.  If we allow anything that "makes things better" in
stage 3, what make it different from stage 1 then?

Segher

Segher Boessenkool Nov. 24, 2016, 10:32 p.m. UTC | #14

On Thu, Nov 24, 2016 at 10:14:24AM -0600, Segher Boessenkool wrote:
> On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
> > On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
> > >
> > >That we compare different kinds of costs (which really has no meaning at
> > >all, it's a heuristic at best) in various places is a known problem, not
> > >a regression.
> > But the problems with the costing system exhibit themselves as a code 
> > quality regression.  In the end that's what the end-users see -- a 
> > regression in the quality of the code GCC generates.
> 
> Yes, exactly -- and I fear this all-encompassing change will cause just
> such a regression for many users.  Tests are running, will know more
> later today (or tomorrow).
> 
> The PR is about a very specific problem; the patch is not.  The patch
> is not a bug fix.  If we allow anything that "makes things better" in
> stage 3, what make it different from stage 1 then?

Here are results of testing with trunk right before the three patches,
compared with with the three patches.  This lists the sizes of the vmlinux
of a Linux kernel build for that arch.


better:
    blackfin   1973931   1973867
         frv   3638192   3637792
       h8300   1060172   1059976
        i386   9742984   9742463
        ia64  15402035  15396171
        mips   4286748   4286692
     mn10300   2360025   2358201
       nios2   3185625   3176693
      x86_64  10360418  10359588

worse:
       alpha   5439003   5455979
         c6x   2107939   2108931
        cris   2189380   2193836
        m32r   3427409   3427453
        m68k   3228408   3230978
      mips64   5564819   5565291
      parisc   8278881   8289573
    parisc64   7234619   7249139
     powerpc   8438949   8440005
   powerpc64  14499969  14508689
        s390  12778748  12779220
     shnommu   1369868   1371020
     sparc64   5921556   5922172
      tilegx  12297581  12307461
     tilepro  11215603  11227339
      xtensa   1776196   1779152

does not build:
         arc         0         0
         arm         0         0
       arm64         0         0
  microblaze         0         0
          sh         0         0
       sparc         0         0


Segher

Richard Biener Nov. 25, 2016, 9:15 a.m. UTC | #15

On Thu, Nov 24, 2016 at 5:14 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
>> On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
>> >
>> >That we compare different kinds of costs (which really has no meaning at
>> >all, it's a heuristic at best) in various places is a known problem, not
>> >a regression.
>> But the problems with the costing system exhibit themselves as a code
>> quality regression.  In the end that's what the end-users see -- a
>> regression in the quality of the code GCC generates.
>
> Yes, exactly -- and I fear this all-encompassing change will cause just
> such a regression for many users.  Tests are running, will know more
> later today (or tomorrow).
>
> The PR is about a very specific problem; the patch is not.  The patch
> is not a bug fix.  If we allow anything that "makes things better" in
> stage 3, what make it different from stage 1 then?

That's a good question ;)  The stage 3 definition has a loophole via
"go file a bug about feature X, then it's a bugfix!".

I'm all open for a more sensible definition, like constraining the kind
of non-regression fixes that we want to allow, but I fear the most
sensible option would be to simply ditch the notion of different
"stages" and make it "general development" and "regression fixing".
(though if you try hard enough and go back in time you'll find that
almost all non-enhancement bugs are regressions in some sense)

And yes, current stage3 still feels too much like stage1 ;)

Richard.

Jeff Law Nov. 25, 2016, 3:34 p.m. UTC | #16

On 11/25/2016 02:15 AM, Richard Biener wrote:
>
> That's a good question ;)  The stage 3 definition has a loophole via
> "go file a bug about feature X, then it's a bugfix!".
Right.  That loophole has existed since we've moved to the current model 
-- we extend a level of trust to our developers not to abuse the 
loophole.  I think that level of trust is warranted and hasn't been 
significantly violated.

> I'm all open for a more sensible definition, like constraining the kind
> of non-regression fixes that we want to allow, but I fear the most
> sensible option would be to simply ditch the notion of different
> "stages" and make it "general development" and "regression fixing".
> (though if you try hard enough and go back in time you'll find that
> almost all non-enhancement bugs are regressions in some sense)
Similarly, I'm always open for improvements.  My worry is if we went to 
development/regression bugfixing cycle, then non-regression bugs would 
largely be ignored.

General bugfixing is, IMHO, a good period -- it gets a larger portion of 
our developers fixing bugs and gives folks with a heavy review load a 
chance to flush out their queues of stuff that came in right at the end 
of stage1.

I'm not really pushing to open a "development cycle" discussion right 
now, but I do sense that our cycles could use some refinement.

>
> And yes, current stage3 still feels too much like stage1 ;)
Hasn't seemed that way to me, but obviously experiences will differ.  My 
biggest worry about this cycle is the higher than typical (compared to 
the last few years) regression bug counts.

That worry is somewhat mitigated by the belief that we're marking 
regressions much much more consistently this year, so we're a lot less 
likely to get a big jump in marked regressions like we saw last year.

*If* that is the case (and my light poking around seems to indicate 
that's true), then we're likely ahead of the gcc-6 cycle, but behind the 
gcc-5 cycle.

jeff

Segher Boessenkool Nov. 25, 2016, 3:55 p.m. UTC | #17

On Fri, Nov 25, 2016 at 10:15:25AM +0100, Richard Biener wrote:
> On Thu, Nov 24, 2016 at 5:14 PM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
> >> On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
> >> >
> >> >That we compare different kinds of costs (which really has no meaning at
> >> >all, it's a heuristic at best) in various places is a known problem, not
> >> >a regression.
> >> But the problems with the costing system exhibit themselves as a code
> >> quality regression.  In the end that's what the end-users see -- a
> >> regression in the quality of the code GCC generates.
> >
> > Yes, exactly -- and I fear this all-encompassing change will cause just
> > such a regression for many users.  Tests are running, will know more
> > later today (or tomorrow).
> >
> > The PR is about a very specific problem; the patch is not.  The patch
> > is not a bug fix.  If we allow anything that "makes things better" in
> > stage 3, what make it different from stage 1 then?
> 
> That's a good question ;)  The stage 3 definition has a loophole via
> "go file a bug about feature X, then it's a bugfix!".
> 
> I'm all open for a more sensible definition, like constraining the kind
> of non-regression fixes that we want to allow, but I fear the most
> sensible option would be to simply ditch the notion of different
> "stages" and make it "general development" and "regression fixing".
> (though if you try hard enough and go back in time you'll find that
> almost all non-enhancement bugs are regressions in some sense)

The scale goes: early stage 1, anything goes; ...; until stage 4, only
very narrow regression fixes are allowed.

Let's try to keep that spirit, and not behave like politicians following
the "rules" (or not).

> And yes, current stage3 still feels too much like stage1 ;)

Yes, very much so.  Well, at least trunk bootstraps on more targets now.

--

So IMNSHO this rtx costing change belongs in early stage 1, and should
be reverted.  If ifcvt should use full rtx cost instead of rtx_src_cost,
fix *that*, that is a much more local change.  And even then, test on
more targets please.


Segher

Jeff Law Nov. 26, 2016, 10:44 a.m. UTC | #18

On 11/24/2016 03:32 PM, Segher Boessenkool wrote:
> On Thu, Nov 24, 2016 at 10:14:24AM -0600, Segher Boessenkool wrote:
>> On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
>>> On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
>>>>
>>>> That we compare different kinds of costs (which really has no meaning at
>>>> all, it's a heuristic at best) in various places is a known problem, not
>>>> a regression.
>>> But the problems with the costing system exhibit themselves as a code
>>> quality regression.  In the end that's what the end-users see -- a
>>> regression in the quality of the code GCC generates.
>>
>> Yes, exactly -- and I fear this all-encompassing change will cause just
>> such a regression for many users.  Tests are running, will know more
>> later today (or tomorrow).
>>
>> The PR is about a very specific problem; the patch is not.  The patch
>> is not a bug fix.  If we allow anything that "makes things better" in
>> stage 3, what make it different from stage 1 then?
>
> Here are results of testing with trunk right before the three patches,
> compared with with the three patches.  This lists the sizes of the vmlinux
> of a Linux kernel build for that arch.
Thanks.  While I question how much emphasis we should put on code sizes 
as a way to measure this change, it can still point out interesting 
effects, positive and negative.

 From my investigations on the m68k, the effects on the IL are minimal 
with a slight bias towards better code (by suppressing if-conversions of 
some now more costly blocks).  *But* the size of the resulting code was 
all over the place -- sometimes it was better, others worse.  From 
looking at the assembly we seemingly are copying blocks that aren't 
strictly necessary.

Enter bb-reorder and the STC algorithm.  It is copying blocks *very* 
aggressively, like absurdly aggressively on the m68k.  Of course it 
doesn't help that the m68k doesn't define a length attribute and as a 
result STC thinks every insn has size 0 and thus block copying is zero cost.

I want to verify the #s, so take this with a slight grain of salt.  The 
net changes to newlib's .o's for Bernd's work -- +30 bytes.  The effect 
of the STC issue above -- +1115586 bytes.  Or to put it another way, 
Bernd's changes, +.0003% change.  STC, +13.8%.

jeff

Eric Botcazou Nov. 26, 2016, 11:11 a.m. UTC | #19

>  From my investigations on the m68k, the effects on the IL are minimal
> with a slight bias towards better code (by suppressing if-conversions of
> some now more costly blocks).  *But* the size of the resulting code was
> all over the place -- sometimes it was better, others worse.  From
> looking at the assembly we seemingly are copying blocks that aren't
> strictly necessary.

I'm seeing essentially the same thing on SPARC, probably because of the ifcvt 
change; the rtlanal change seems to be neutral for the architecture.

Jeff Law Nov. 26, 2016, 4:15 p.m. UTC | #20

On 11/26/2016 04:11 AM, Eric Botcazou wrote:
>>  From my investigations on the m68k, the effects on the IL are minimal
>> with a slight bias towards better code (by suppressing if-conversions of
>> some now more costly blocks).  *But* the size of the resulting code was
>> all over the place -- sometimes it was better, others worse.  From
>> looking at the assembly we seemingly are copying blocks that aren't
>> strictly necessary.
>
> I'm seeing essentially the same thing on SPARC, probably because of the ifcvt
> change; the rtlanal change seems to be neutral for the architecture.
Just to be clear, I was only testing the rtlanal change, not the ifcvt 
change.

I repeated my test on the GCC runtime libraries for m68k-elf.  Bernd's 
rtlanal change +.03%, the goof in STC, +9.4%.  So the STC goof still 
dwarfs the impact to Bernd's change, but not as badly as I saw in the 
newlib codebase.


Jeff

Segher Boessenkool Nov. 26, 2016, 6:08 p.m. UTC | #21

On Sat, Nov 26, 2016 at 03:44:22AM -0700, Jeff Law wrote:
> On 11/24/2016 03:32 PM, Segher Boessenkool wrote:
> >On Thu, Nov 24, 2016 at 10:14:24AM -0600, Segher Boessenkool wrote:
> >>On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
> >>>On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
> >>>>
> >>>>That we compare different kinds of costs (which really has no meaning at
> >>>>all, it's a heuristic at best) in various places is a known problem, not
> >>>>a regression.
> >>>But the problems with the costing system exhibit themselves as a code
> >>>quality regression.  In the end that's what the end-users see -- a
> >>>regression in the quality of the code GCC generates.
> >>
> >>Yes, exactly -- and I fear this all-encompassing change will cause just
> >>such a regression for many users.  Tests are running, will know more
> >>later today (or tomorrow).
> >>
> >>The PR is about a very specific problem; the patch is not.  The patch
> >>is not a bug fix.  If we allow anything that "makes things better" in
> >>stage 3, what make it different from stage 1 then?
> >
> >Here are results of testing with trunk right before the three patches,
> >compared with with the three patches.  This lists the sizes of the vmlinux
> >of a Linux kernel build for that arch.
> Thanks.  While I question how much emphasis we should put on code sizes 
> as a way to measure this change, it can still point out interesting 
> effects, positive and negative.

Code size I can test "easily" for many archs (it still takes almost a
full day), and it does correlate well with local optimisations on most
archs.  I have looked at the actual differences on some archs (which
takes a lot more time still), and the differences are all over the place.
Which suggests changing the costs is a big change for most of those
archs; and they all have been tuned for the *old* situation, so this
makes things worse in the short run, whether the new costs are better
or not.

Not a change for stage 3, and not something *I* should need to analyse
anyway; this analysis needs to be done *before* the patch goes in.

> From my investigations on the m68k, the effects on the IL are minimal 
> with a slight bias towards better code (by suppressing if-conversions of 
> some now more costly blocks).  *But* the size of the resulting code was 
> all over the place -- sometimes it was better, others worse.  From 
> looking at the assembly we seemingly are copying blocks that aren't 
> strictly necessary.
> 
> Enter bb-reorder and the STC algorithm.  It is copying blocks *very* 
> aggressively, like absurdly aggressively on the m68k.  Of course it 
> doesn't help that the m68k doesn't define a length attribute and as a 
> result STC thinks every insn has size 0 and thus block copying is zero cost.
> 
> I want to verify the #s, so take this with a slight grain of salt.  The 
> net changes to newlib's .o's for Bernd's work -- +30 bytes.  The effect 
> of the STC issue above -- +1115586 bytes.  Or to put it another way, 
> Bernd's changes, +.0003% change.  STC, +13.8%.

STC wasn't changed in the patch.  Maybe interactions with STC is what
causes all the problems, but that is an argument *against* doing this
after stage 1.


Segher

Segher Boessenkool Nov. 26, 2016, 10:03 p.m. UTC | #22

On Sat, Nov 26, 2016 at 09:15:48AM -0700, Jeff Law wrote:
> On 11/26/2016 04:11 AM, Eric Botcazou wrote:
> >> From my investigations on the m68k, the effects on the IL are minimal
> >>with a slight bias towards better code (by suppressing if-conversions of
> >>some now more costly blocks).  *But* the size of the resulting code was
> >>all over the place -- sometimes it was better, others worse.  From
> >>looking at the assembly we seemingly are copying blocks that aren't
> >>strictly necessary.
> >
> >I'm seeing essentially the same thing on SPARC, probably because of the 
> >ifcvt
> >change; the rtlanal change seems to be neutral for the architecture.
> Just to be clear, I was only testing the rtlanal change, not the ifcvt 
> change.
> 
> I repeated my test on the GCC runtime libraries for m68k-elf.  Bernd's 
> rtlanal change +.03%, the goof in STC, +9.4%.  So the STC goof still 
> dwarfs the impact to Bernd's change, but not as badly as I saw in the 
> newlib codebase.

orig, i386+rtlanal, i386+rtlanal+ifcvt:

worse:
       alpha   5439003   5455979   5455979
         c6x   2107939   2108931   2108931
        cris   2189380   2193836   2193836
        m32r   3427409   3427541   3427453
        m68k   3228408   3230978   3230978
        mips   4286748   4286964   4286692
      mips64   5564819   5565643   5565291
      parisc   8278881   8289977   8289573
    parisc64   7234619   7249187   7249139
     powerpc   8438949   8440005   8440005
   powerpc64  14499969  14508689  14508689
        s390  12778748  12779228  12779220
     shnommu   1369868   1371020   1371020
     sparc64   5921556   5922172   5922172
      tilegx  12297581  12307461  12307461
     tilepro  11215603  11227339  11227339
      xtensa   1776196   1779152   1779152

better:
    blackfin   1973931   1973867   1973867
         frv   3638192   3637792   3637792
       h8300   1060172   1059976   1059976
        i386   9742984   9742463   9742463
        ia64  15402035  15396171  15396171
     mn10300   2360025   2358201   2358201
       nios2   3185625   3176693   3176693
      x86_64  10360418  10359588  10359588

did not build:
         arc         0         0         0
         arm         0         0         0
       arm64         0         0         0
  microblaze         0         0         0
          sh         0         0         0
       sparc         0         0         0


tl;dr: The ifcvt change doesn't do much, but the cost change does.


Segher

Bernd Schmidt Nov. 28, 2016, 8:59 a.m. UTC | #23

On 11/25/2016 04:55 PM, Segher Boessenkool wrote:

> So IMNSHO this rtx costing change belongs in early stage 1, and should
> be reverted.

Done.


Bernd

Bernd Schmidt Nov. 28, 2016, 9:05 a.m. UTC | #24

On 11/25/2016 04:55 PM, Segher Boessenkool wrote:
> So IMNSHO this rtx costing change belongs in early stage 1, and should
> be reverted.

Done.


Bernd

Jeff Law Nov. 28, 2016, 6:50 p.m. UTC | #25

On 11/24/2016 09:14 AM, Segher Boessenkool wrote:
> On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
>> On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
>>>
>>> That we compare different kinds of costs (which really has no meaning at
>>> all, it's a heuristic at best) in various places is a known problem, not
>>> a regression.
>> But the problems with the costing system exhibit themselves as a code
>> quality regression.  In the end that's what the end-users see -- a
>> regression in the quality of the code GCC generates.
>
> Yes, exactly -- and I fear this all-encompassing change will cause just
> such a regression for many users.  Tests are running, will know more
> later today (or tomorrow).
>
> The PR is about a very specific problem; the patch is not.  The patch
> is not a bug fix.  If we allow anything that "makes things better" in
> stage 3, what make it different from stage 1 then?
So how would you suggest this be fixed right now?  I'd really like to 
get the regression addressed.

I would claim that Bernd's patch is right from a design and 
implementation standpoint -- the issues are fallout from backend issues 
and none looked terrible to me.

jeff

Bernd Schmidt Nov. 28, 2016, 6:52 p.m. UTC | #26

On 11/28/2016 07:50 PM, Jeff Law wrote:
> So how would you suggest this be fixed right now?  I'd really like to
> get the regression addressed.

The regression is still fixed. That wasn't the case at all stages while 
I was working on it, but the i386 patch seems to suffice now.

> I would claim that Bernd's patch is right from a design and
> implementation standpoint -- the issues are fallout from backend issues
> and none looked terrible to me.

Agree but I reverted it anyway.

Bernd

[0/3] Fix PR78120, in ifcvt/rtlanal/i386.

Commit Message

Comments

Patch