diff mbox

[PARCH,1/2,x86,PR63534] Fix darwin bootstrap

Message ID CAOvf_xwDUU=gsQqHgybovAFHbn1+OVHYNBw+=xG0jau1wG2HDg@mail.gmail.com
State New
Headers show

Commit Message

Evgeny Stupachenko Oct. 17, 2014, 2:08 p.m. UTC
Hi,

The patch fixes 1st fail in darwin bootstarp.
When PIC register is pseudo we don't need to init it after setjmp or
non local goto.

Is it ok?

ChangeLog:

2014-10-17  Evgeny Stupachenko  <evstupac@gmail.com>

        PR target/63534
        * config/i386/i386.c (builtin_setjmp_receiver): Delete.
        (nonlocal_goto_receiver): Ditto.

Comments

Mike Stump Oct. 17, 2014, 6:32 p.m. UTC | #1
On Oct 17, 2014, at 7:08 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The patch fixes 1st fail in darwin bootstarp.
> When PIC register is pseudo we don't need to init it after setjmp or
> non local goto.
> 
> Is it ok?

So, I don’t see commentary in the PR that all fallout and all bugs introduced are fixed by the patch.  :-(

Given how central pic is to code-gen, I don’t favor any patch, until all bugs and regressions are fixed.  If they can’t be, then I favor reversion of the patch that broke everything.

Additionally, if given the types of changes to codegen, I’d like to see the before and after changes to try and ensure that code-gen quality isn’t regressed.  For example, is the pic register saved and restored?  If restored, is the code to save it actually better than simply appearing it out of thin air as did previously?  If the patch that did all the pic work was in one patch, it would be easier for me to see the change in its entirety.
Evgeny Stupachenko Oct. 22, 2014, 5:24 p.m. UTC | #2
>For example, is the pic register saved and restored?
Yes, if RA decided that it is better to save it.

>If restored, is the code to save it actually better than simply appearing it out of thin air as did previously?
"enabling ebx" gives performance mostly to loops. There still could be
some performance issues, however we got pretty good results on a set
of benchmarks:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00892.html

To get performance diff now you can compare r216153 svn revision and
r216304 with the patch in 63534#c33

There was quite long discussion on enabling starting here:
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02186.html

On Fri, Oct 17, 2014 at 10:32 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Oct 17, 2014, at 7:08 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> The patch fixes 1st fail in darwin bootstarp.
>> When PIC register is pseudo we don't need to init it after setjmp or
>> non local goto.
>>
>> Is it ok?
>
> So, I don’t see commentary in the PR that all fallout and all bugs introduced are fixed by the patch.  :-(
>
> Given how central pic is to code-gen, I don’t favor any patch, until all bugs and regressions are fixed.  If they can’t be, then I favor reversion of the patch that broke everything.
>
> Additionally, if given the types of changes to codegen, I’d like to see the before and after changes to try and ensure that code-gen quality isn’t regressed.  For example, is the pic register saved and restored?  If restored, is the code to save it actually better than simply appearing it out of thin air as did previously?  If the patch that did all the pic work was in one patch, it would be easier for me to see the change in its entirety.
Jeff Law Oct. 31, 2014, 8:14 p.m. UTC | #3
On 10/17/14 08:08, Evgeny Stupachenko wrote:
> Hi,
>
> The patch fixes 1st fail in darwin bootstarp.
> When PIC register is pseudo we don't need to init it after setjmp or
> non local goto.
>
> Is it ok?
>
> ChangeLog:
>
> 2014-10-17  Evgeny Stupachenko  <evstupac@gmail.com>
>
>          PR target/63534
>          * config/i386/i386.c (builtin_setjmp_receiver): Delete.
>          (nonlocal_goto_receiver): Ditto.

Why do you think they're not needed?  The builtin setjmp/longjmp 
implementation do not behave like what you're used to in the C library. 
  Specifically, they do not save/restore register state beyond SP, FP 
and possibly ARGP.

So let's take one step back -- what precisely about these patterns was 
causing a problem?    My initial inclination is that we must set the PIC 
register here in the same manner as it's set in the prologue.



Jeff
Evgeny Stupachenko Nov. 1, 2014, 12:39 p.m. UTC | #4
When PIC register is pseudo there is nothing special about it's value
that setjmp can hurt. So if the pseudo register lives across
setjmp_receiver RA should care about correct allocation (in case it is
not saved/restored, it should go on stack). gcc.dg tests and specs
I've tested behave like this.

The initial problem comes from non-local goto as it tries to emit
pseudo PIC register after reload.

On Fri, Oct 31, 2014 at 11:14 PM, Jeff Law <law@redhat.com> wrote:
> On 10/17/14 08:08, Evgeny Stupachenko wrote:
>>
>> Hi,
>>
>> The patch fixes 1st fail in darwin bootstarp.
>> When PIC register is pseudo we don't need to init it after setjmp or
>> non local goto.
>>
>> Is it ok?
>>
>> ChangeLog:
>>
>> 2014-10-17  Evgeny Stupachenko  <evstupac@gmail.com>
>>
>>          PR target/63534
>>          * config/i386/i386.c (builtin_setjmp_receiver): Delete.
>>          (nonlocal_goto_receiver): Ditto.
>
>
> Why do you think they're not needed?  The builtin setjmp/longjmp
> implementation do not behave like what you're used to in the C library.
> Specifically, they do not save/restore register state beyond SP, FP and
> possibly ARGP.
>
> So let's take one step back -- what precisely about these patterns was
> causing a problem?    My initial inclination is that we must set the PIC
> register here in the same manner as it's set in the prologue.
>
>
>
> Jeff
Mike Stump Nov. 1, 2014, 5:33 p.m. UTC | #5
On Nov 1, 2014, at 5:39 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> When PIC register is pseudo there is nothing special about it's value
> that setjmp can hurt. So if the pseudo register lives across
> setjmp_receiver RA should care about correct allocation (in case it is
> not saved/restored, it should go on stack).

So, why is consuming more stack space beneficial?
Jeff Law Nov. 3, 2014, 10:40 p.m. UTC | #6
On 11/01/14 06:39, Evgeny Stupachenko wrote:
> When PIC register is pseudo there is nothing special about it's value
> that setjmp can hurt. So if the pseudo register lives across
> setjmp_receiver RA should care about correct allocation (in case it is
> not saved/restored, it should go on stack). gcc.dg tests and specs
> I've tested behave like this.
If the allocator picked a call-clobbered register for the PIC register, 
then we're obviously OK since the setjmp has to be expected to clobber 
the PIC register.

But if the PIC register is in a call-saved register, then it's going to 
be assumed to not be clobbered across calls and I don't believe that is 
guaranteed for builtin setjmp/longjmp.  Those restore SP, FP and an 
ARGP, but not anything else by default.

So the callee might have clobbered the call saved hard register, 
expecting to restore its value in its epilogue.  But due to the longjmp, 
that epilogue never gets called and thus the call-saved register won't 
have the right value in the receiver.

Now if your argument is that IRA/LRA handle this, that's fine, a pointer 
to that code would be appreciated so that it can be quickly audited. 
Certainly the old local-alloc/global-alloc had magic for setjmp/longjmp 
and maybe IRA/LRA does too, but it's better to be sure than just assume.

>
> The initial problem comes from non-local goto as it tries to emit
> pseudo PIC register after reload.
?  You mean it emits a reference to the pseudo into RTL?  That would 
indicate that the allocators never put the pseudo into a hard 
register?!?  RTL dumps with a few pointers to key insns would help here.

jeff
Eric Botcazou Nov. 5, 2014, 11:54 a.m. UTC | #7
> Now if your argument is that IRA/LRA handle this, that's fine, a pointer
> to that code would be appreciated so that it can be quickly audited.
> Certainly the old local-alloc/global-alloc had magic for setjmp/longjmp
> and maybe IRA/LRA does too, but it's better to be sure than just assume.

See ira-lives.c:1217 and below.
Evgeny Stupachenko Nov. 5, 2014, 12:59 p.m. UTC | #8
On Tue, Nov 4, 2014 at 1:40 AM, Jeff Law <law@redhat.com> wrote:
> On 11/01/14 06:39, Evgeny Stupachenko wrote:
>>
>> When PIC register is pseudo there is nothing special about it's value
>> that setjmp can hurt. So if the pseudo register lives across
>> setjmp_receiver RA should care about correct allocation (in case it is
>> not saved/restored, it should go on stack). gcc.dg tests and specs
>> I've tested behave like this.
>
> If the allocator picked a call-clobbered register for the PIC register, then
> we're obviously OK since the setjmp has to be expected to clobber the PIC
> register.
>
> But if the PIC register is in a call-saved register, then it's going to be
> assumed to not be clobbered across calls and I don't believe that is
> guaranteed for builtin setjmp/longjmp.  Those restore SP, FP and an ARGP,
> but not anything else by default.

I still don't see what is special for PIC register here. PIC pseudo
now behave as every other pseudo register.
If we assume that setjmp can change a pseudo register value we need
IRA/LRA "magic" for each pseudo register.

I believe that when we had EBX fixed, IRA/LRA don't save/restore it anywhere.
Therefore we had to care about EBX value in special cases like
setjmp/non-local goto.
Now RA cares about PIC pseudo as well as about correct allocation for
any pseudo register.

>
> So the callee might have clobbered the call saved hard register, expecting
> to restore its value in its epilogue.  But due to the longjmp, that epilogue
> never gets called and thus the call-saved register won't have the right
> value in the receiver.
>
> Now if your argument is that IRA/LRA handle this, that's fine, a pointer to
> that code would be appreciated so that it can be quickly audited. Certainly
> the old local-alloc/global-alloc had magic for setjmp/longjmp and maybe
> IRA/LRA does too, but it's better to be sure than just assume.
>
>>
>> The initial problem comes from non-local goto as it tries to emit
>> pseudo PIC register after reload.
>
> ?  You mean it emits a reference to the pseudo into RTL?  That would
> indicate that the allocators never put the pseudo into a hard register?!?
> RTL dumps with a few pointers to key insns would help here.

Correct, that is why Darwin crashes with ICE on non-local goto.
We still have:

(define_insn_and_split "nonlocal_goto_receiver"
  [(unspec_volatile [(const_int 0)] UNSPECV_NLGR)]
  "TARGET_MACHO && !TARGET_64BIT && flag_pic"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  if (crtl->uses_pic_offset_table)
    {
      rtx xops[3];
      rtx label_rtx = gen_label_rtx ();
      rtx tmp;

      /* Get a new pic base.  */
      emit_insn (gen_set_got_labelled (pic_offset_table_rtx, label_rtx));
.....
Here for MAC only we are trying to use pseudo PIC:
pic_offset_table_rtx when reload_completed.

>
> jeff
>
>
Evgeny Stupachenko Nov. 5, 2014, 1:04 p.m. UTC | #9
We don't emit extra SET_GOT. That is beneficial.
As for stack usage, that is RA to decide which register is more
beneficial to put on stack.

On Sat, Nov 1, 2014 at 8:33 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Nov 1, 2014, at 5:39 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> When PIC register is pseudo there is nothing special about it's value
>> that setjmp can hurt. So if the pseudo register lives across
>> setjmp_receiver RA should care about correct allocation (in case it is
>> not saved/restored, it should go on stack).
>
> So, why is consuming more stack space beneficial?
Jeff Law Nov. 7, 2014, 8:06 p.m. UTC | #10
On 11/05/14 04:54, Eric Botcazou wrote:
>> Now if your argument is that IRA/LRA handle this, that's fine, a pointer
>> to that code would be appreciated so that it can be quickly audited.
>> Certainly the old local-alloc/global-alloc had magic for setjmp/longjmp
>> and maybe IRA/LRA does too, but it's better to be sure than just assume.
>
> See ira-lives.c:1217 and below.
Thanks.

So that code creates a set of conflicts which, if I'm reading correctly, 
will prevent the PIC value from living in a register at all.  Which 
ought to result in it being dumped into the stack and being reloaded for 
each use.  Which ought to be safe (modulo the liveness bug Vlad is 
working on right now).

Does that sound right to either of you?

jeff
Jeff Law Nov. 7, 2014, 8:18 p.m. UTC | #11
On 11/05/14 05:59, Evgeny Stupachenko wrote:
> On Tue, Nov 4, 2014 at 1:40 AM, Jeff Law <law@redhat.com> wrote:
>> On 11/01/14 06:39, Evgeny Stupachenko wrote:
>>>
>>> When PIC register is pseudo there is nothing special about it's
>>> value that setjmp can hurt. So if the pseudo register lives
>>> across setjmp_receiver RA should care about correct allocation
>>> (in case it is not saved/restored, it should go on stack). gcc.dg
>>> tests and specs I've tested behave like this.
>>
>> If the allocator picked a call-clobbered register for the PIC
>> register, then we're obviously OK since the setjmp has to be
>> expected to clobber the PIC register.
>>
>> But if the PIC register is in a call-saved register, then it's
>> going to be assumed to not be clobbered across calls and I don't
>> believe that is guaranteed for builtin setjmp/longjmp.  Those
>> restore SP, FP and an ARGP, but not anything else by default.
>
> I still don't see what is special for PIC register here. PIC pseudo
> now behave as every other pseudo register. If we assume that setjmp
> can change a pseudo register value we need IRA/LRA "magic" for each
> pseudo register.
And that's precisely what we have as Eric pointed out.  Any allocnos 
that are live across calls are not allocated into hard registers if we 
call setjmp or can receive a nonlocal goto.


>
> I believe that when we had EBX fixed, IRA/LRA don't save/restore it
> anywhere. Therefore we had to care about EBX value in special cases
> like setjmp/non-local goto. Now RA cares about PIC pseudo as well as
> about correct allocation for any pseudo register.
Right.  But what was missing was the explanation why this change is 
correct.  With the knowledge about how IRA handles objects under these 
circumstances that Eric pointed to, it becomes much easier to see that 
the special handling is no longer desirable.

>> ?  You mean it emits a reference to the pseudo into RTL?  That
>> would indicate that the allocators never put the pseudo into a hard
>> register?!? RTL dumps with a few pointers to key insns would help
>> here.
>
> Correct, that is why Darwin crashes with ICE on non-local goto. We
> still have:
I was referring to the generated RTL to confirm what I thought you 
stated, namely that we ended up with a reference to the pseudo being 
emitted into the insn stream after reload.

The only way I could see that happening would be if the pseudo wasn't 
allocated a hard register at all.  Which we now know makes sense after 
Eric pointed us the magic in ira-lives.c.

In the end it all comes down to what is the behaviour of the allocator 
for a value that is live across calls in these kinds of functions.

I think I've got enough background to properly review now ;-)

Jeff
Eric Botcazou Nov. 8, 2014, 9:24 a.m. UTC | #12
> So that code creates a set of conflicts which, if I'm reading correctly,
> will prevent the PIC value from living in a register at all.  Which
> ought to result in it being dumped into the stack and being reloaded for
> each use.  Which ought to be safe (modulo the liveness bug Vlad is
> working on right now).
> 
> Does that sound right to either of you?

Yes, that's also my understanding of the code and the behavior required by 
builtin setjmp/longjmp.  I can add that the Ada compiler would fall apart if 
this didn't work correctly in the compiler, and especially in the RA.
diff mbox

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 624a1c1..fc3776f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16927,57 +16927,6 @@ 
   "* return output_probe_stack_range (operands[0], operands[2]);"
   [(set_attr "type" "multi")])

-(define_expand "builtin_setjmp_receiver"
-  [(label_ref (match_operand 0))]
-  "!TARGET_64BIT && flag_pic"
-{
-#if TARGET_MACHO
-  if (TARGET_MACHO)
-    {
-      rtx xops[3];
-      rtx picreg = gen_rtx_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);
-      rtx_code_label *label_rtx = gen_label_rtx ();
-      emit_insn (gen_set_got_labelled (pic_offset_table_rtx, label_rtx));
-      xops[0] = xops[1] = picreg;
-      xops[2] = machopic_gen_offset (gen_rtx_LABEL_REF (SImode, label_rtx));
-      ix86_expand_binary_operator (MINUS, SImode, xops);
-    }
-  else
-#endif
-    emit_insn (gen_set_got (pic_offset_table_rtx));
-  DONE;
-})
-
-(define_insn_and_split "nonlocal_goto_receiver"
-  [(unspec_volatile [(const_int 0)] UNSPECV_NLGR)]
-  "TARGET_MACHO && !TARGET_64BIT && flag_pic"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  if (crtl->uses_pic_offset_table)
-    {
-      rtx xops[3];
-      rtx label_rtx = gen_label_rtx ();
-      rtx tmp;
-
-      /* Get a new pic base.  */
-      emit_insn (gen_set_got_labelled (pic_offset_table_rtx, label_rtx));
-      /* Correct this with the offset from the new to the old.  */
-      xops[0] = xops[1] = pic_offset_table_rtx;
-      label_rtx = gen_rtx_LABEL_REF (SImode, label_rtx);
-      tmp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, label_rtx),
-                           UNSPEC_MACHOPIC_OFFSET);
-      xops[2] = gen_rtx_CONST (Pmode, tmp);
-      ix86_expand_binary_operator (MINUS, SImode, xops);
-    }
-  else
-    /* No pic reg restore needed.  */
-    emit_note (NOTE_INSN_DELETED);
-
-  DONE;
-})
-
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
 ;; Do not split instructions with mask registers.
 (define_split