diff mbox

[i386] : Sibcall tail-call improvement and partial fix PR/60104

Message ID AD1FD69F-389F-47A9-83F5-7584696E2677@comcast.net
State New
Headers show

Commit Message

Mike Stump Sept. 14, 2014, 9:38 p.m. UTC
On May 22, 2014, at 2:01 PM, Kai Tietz <ktietz@redhat.com> wrote:
> This patch adds a small improvement about sibling tail-calls.

So, I was hoping that you would weigh or fix the damage (PR61387) this does on darwin.

Here is a patch that fixes it.



Ok?

Comments

Segher Boessenkool Sept. 15, 2014, 12:43 a.m. UTC | #1
On Sun, Sep 14, 2014 at 02:38:45PM -0700, Mike Stump wrote:
> +	  SIBLING_CALL_P (tmp) = 1;
> +	  SIBLING_CALL_P (tmp) = 1;

The second time is to make sure?  :-)


Segher
Mike Stump Sept. 15, 2014, 7:33 a.m. UTC | #2
On Sep 14, 2014, at 5:43 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> On Sun, Sep 14, 2014 at 02:38:45PM -0700, Mike Stump wrote:
>> +	  SIBLING_CALL_P (tmp) = 1;
>> +	  SIBLING_CALL_P (tmp) = 1;
> 
> The second time is to make sure?  :-)

No, just a last minute cut and paste…  I’ll remove it.
Iain Sandoe Sept. 15, 2014, 10:57 a.m. UTC | #3
Hi Mike,

On 15 Sep 2014, at 08:33, Mike Stump wrote:

> On Sep 14, 2014, at 5:43 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> On Sun, Sep 14, 2014 at 02:38:45PM -0700, Mike Stump wrote:
>>> +	  SIBLING_CALL_P (tmp) = 1;
>>> +	  SIBLING_CALL_P (tmp) = 1;
>> 
>> The second time is to make sure?  :-)
> 
> No, just a last minute cut and paste…  I’ll remove it.

While the patch fixes the fallout from Kai's patch, I am concerned that:

1. It would be good to see how this [original] code path was tested on any other platform than Darwin (where it breaks).
 - I.E. a non-Mach-O test case that exercises that path of the original patch's code.
 - AFAICT this (exercise) does NOT happen for bootstrap and reg-test on x86-64-linux (so how was the original patch tested?).

2. The comment above the new code fragment has not been adjusted to reflect the new/changed functionality.

==

This has been ~ 3 months and the same questions / observations above have been raised on the PR thread and on @patches list.  This does not seem to me to be a "darwin-only" issue, and just assuming that it's some unspecified fault with Darwin's address legalisation seems like an unwarranted leap (especially for x86-64, where Darwin shares a substantial part of its ABI with Linux).

Perhaps it would be safer simply to revert that hunk of the original patch unless/until (1) and (2) above are addressed?

0.02GBP, as usual,
Iain
FX Sept. 15, 2014, 11:25 a.m. UTC | #4
> Perhaps it would be safer simply to revert that hunk of the original patch unless/until (1) and (2) above are addressed?

Given that the original patch addresses “only” a missed-optimization (and causes ice-on-valid), it makes sense to me.

FX
Jeff Law Sept. 15, 2014, 3:42 p.m. UTC | #5
On 09/15/14 05:25, FX wrote:
>> Perhaps it would be safer simply to revert that hunk of the original patch unless/until (1) and (2) above are addressed?
>
> Given that the original patch addresses “only” a missed-optimization (and causes ice-on-valid), it makes sense to me.
What I think we need is folks with an understanding of those systems to 
chime in with the information Kai needs to fix the problem.  I don't 
recall seeing that, so if I missed it, feel free to point me to it.

I'd rather not start going backwards and reverting because we simply 
haven't done the digging to really understand the issues on other other 
ports.

jeff
Iain Sandoe Sept. 15, 2014, 4:01 p.m. UTC | #6
Hi Jeff,

On 15 Sep 2014, at 16:42, Jeff Law wrote:

> On 09/15/14 05:25, FX wrote:
>>> Perhaps it would be safer simply to revert that hunk of the original patch unless/until (1) and (2) above are addressed?
>> 
>> Given that the original patch addresses “only” a missed-optimization (and causes ice-on-valid), it makes sense to me.
> What I think we need is folks with an understanding of those systems to chime in with the information Kai needs to fix the problem.  I don't recall seeing that, so if I missed it, feel free to point me to it.

The information to date is in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387

> I'd rather not start going backwards and reverting because we simply haven't done the digging to really understand the issues on other other ports.

Well, I'm not in the habit of suggesting reverting parts of patches normally, however...

1) 

The first problem I am having is finding *any* platform test (other than Darwin) that exercises the code in question (and it fails on Darwin).

Ergo, the situation is not about "other ports" - but where is the testcase that exercises this new code on some reference port?

FWIW: When the problem first occurred, I placed a gcc_abort in that position and ran bootstrap and check on x86-64-linux without the abort being triggered (so I don't think that saying the patch "passed regression testing on x86-64-linux" is helping much here).

If Darwin has a bug, fine - then we will go hunting it - but first we need something solid to base the investigation on.

2)

Regardless of port-related issues, the code in question has been significantly altered - but the comment describing it has not been adjusted - at the least the comment should be amended to state the new intent of the code.

thanks
Iain
FX Sept. 18, 2014, 9:13 p.m. UTC | #7
> What I think we need is folks with an understanding of those systems to chime in with the information Kai needs to fix the problem.  I don't recall seeing that, so if I missed it, feel free to point me to it.
> 
> I'd rather not start going backwards and reverting because we simply haven't done the digging to really understand the issues on other other ports.

Iain has argued rather convincingly (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01201.html) that 1. the intent of the code introduced by the patch is unclear and not well document, 2. Kai has not clarified it following Iain’s request, 3. darwin maintainers have no idea how to hunt that bug, because of #1 & 2.

Given that it breaks severely a secondary platform, can we please revert it, while Iain and Kai (and others) can work on getting it better, on list or in bugzilla?

FX


PS: And yes, I know it sucks to revert a patch, and I’ve had one of mine reversed once or twice. But here Kai is not following up on this, or helping out understand what the issue is. It is not even clear that the problem is in darwin-specific code, and not in the patch itself (and only darwin exercices that code path, as Iain said)!
Kai Tietz Sept. 18, 2014, 9:26 p.m. UTC | #8
Hi,

it isn't true that I didn't replied to Iant.  I did this on IRC.  As I
explained there already, this hunk about thunks is more consolidation
of code-paths in that function, and not really part of a feature.  As
this code-path isn't prominent mark being Darwin-code - and please
don't take me wrong, but it seems to be until now the only target
reporting this issues - and therefore I strongly see the issue to be
solved for Darwin.   I don't see that this changes needs an additional
testcase demonstration on a already regression-tested target that it
doesn't break ... This is somehow like asking for gcc-testcase
demostration that gcc's darwin target isn't responsible for earth's
warming ...

Nevertheless I provided in the past already a patch which fixes the
issue well.  As underlying issue seems to be that gotpcrel relocations
aren't suitable for call-instruction's address on Darwin's target. So
disallowing for this the use via the operand-check-predicate is the
right thing to do. As this prevents in all cases that for this target
such a construct might be generated even for none-thunk case, which
seems btw not being problematic at all.

I don't agree to revert that patch.  Please provide a testcase, why my
suggested fix isn't suitable.

Regards,
Kai
FX Sept. 18, 2014, 9:35 p.m. UTC | #9
Dear Kai,

> it isn't true that I didn't replied to Iant.  I did this on IRC.

Good. I simply did not see any recent comment from you on the list, or bugzilla.

> As
> this code-path isn't prominent mark being Darwin-code - and please
> don't take me wrong, but it seems to be until now the only target
> reporting this issues

Sure, no problem. There are many code-paths in the compiler that are only taken on a subset of targets, so noone is implying that you should have tested it on all targets before committing.


> - and therefore I strongly see the issue to be
> solved for Darwin.   I don't see that this changes needs an additional
> testcase demonstration on a already regression-tested target that it
> doesn't break …

I’m afraid I don’t understand what you mean by that. I was only saying that if this part of the patch is only exercised on darwin, and it fails there, we might want to change it.

> Nevertheless I provided in the past already a patch which fixes the
> issue well.

Could you give a link to the patch? I’m not finding it. Has it been tested on darwin? If not, I can do it.


> I don't agree to revert that patch.  Please provide a testcase, why my
> suggested fix isn't suitable.

If there is a patch submitted that fixes the issue, of course reversion is bad. I was unaware of that.

FX
Kai Tietz Sept. 18, 2014, 9:56 p.m. UTC | #10
2014-09-18 23:35 GMT+02:00 FX <fxcoudert@gmail.com>:
> Dear Kai,
>
>> it isn't true that I didn't replied to Iant.  I did this on IRC.
>
> Good. I simply did not see any recent comment from you on the list, or bugzilla.
>
>> As
>> this code-path isn't prominent mark being Darwin-code - and please
>> don't take me wrong, but it seems to be until now the only target
>> reporting this issues
>
> Sure, no problem. There are many code-paths in the compiler that are only taken on a subset of targets, so noone is implying that you should have tested it on all targets before committing.
>
>
>> - and therefore I strongly see the issue to be
>> solved for Darwin.   I don't see that this changes needs an additional
>> testcase demonstration on a already regression-tested target that it
>> doesn't break ...
>
> I'm afraid I don't understand what you mean by that. I was only saying that if this part of the patch is only exercised on darwin, and it fails there, we might want to change it.
>
>> Nevertheless I provided in the past already a patch which fixes the
>> issue well.
>
> Could you give a link to the patch? I'm not finding it. Has it been tested on darwin? If not, I can do it.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387 comment #9.  It
doesn't apply anymore due current version of predicate was altered.
Nevertheless I could easily update patch for current trunk version.

>
>> I don't agree to revert that patch.  Please provide a testcase, why my
>> suggested fix isn't suitable.
>
> If there is a patch submitted that fixes the issue, of course reversion is bad. I was unaware of that.
>
> FX

Kai
FX Sept. 18, 2014, 10:03 p.m. UTC | #11
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387 comment #9.  It
> doesn't apply anymore due current version of predicate was altered.
> Nevertheless I could easily update patch for current trunk version.

Sure, please send that and I’ll test it on darwin.

Thanks,
FX
Iain Sandoe Sept. 18, 2014, 10:20 p.m. UTC | #12
Hello Kai, all,

I am not concerned solely about Darwin here, certainly we can make a patch to "fix" the problem there.

My concern is for the general state of this code:

On 18 Sep 2014, at 22:26, Kai Tietz wrote:

> it isn't true that I didn't replied to Iant.

( iains ;) )

>  I did this on IRC.

=== a summary of the points I picked up from this irc conversation (and my interpretation).

1. There has been a change made "to make the upper path like the lower path" (as you said on IRC).
  - apparently (from our conversation) you don't expect this to be a general optimisation improvement.
  - but it doesn't seem to be either "obvious" or "a cleanup" to me
  - if you are asserting that it is a cleanup, then some explanation is in order.

2. Apparently you don't think it is necessary to have any testcase to demonstrate that the new code is working?
  - perhaps you are asserting that the code is "correct by inspection"?

3. You don't seem to think it necessary to amend the comments in the code to reflect the new functionality?

4. I find it difficult to accept that one can change a random part of the code, without any expectation of general improvement, that causes a breakage in some target - and then say "it's a target problem".

...

I really think that these points need addressing - if you honestly believe that this code "only affects darwin" then surely you won't object to us changing it back to the previous working case.

If you believe that the new version is benefiting any target, then please show me some (any) target that correctly exercises it.

Iain.


>  As I
> explained there already, this hunk about thunks is more consolidation
> of code-paths in that function, and not really part of a feature.  As
> this code-path isn't prominent mark being Darwin-code - and please
> don't take me wrong, but it seems to be until now the only target
> reporting this issues - and therefore I strongly see the issue to be
> solved for Darwin.   I don't see that this changes needs an additional
> testcase demonstration on a already regression-tested target that it
> doesn't break ... This is somehow like asking for gcc-testcase
> demostration that gcc's darwin target isn't responsible for earth's
> warming ...
> 
> Nevertheless I provided in the past already a patch which fixes the
> issue well.  As underlying issue seems to be that gotpcrel relocations
> aren't suitable for call-instruction's address on Darwin's target. So
> disallowing for this the use via the operand-check-predicate is the
> right thing to do. As this prevents in all cases that for this target
> such a construct might be generated even for none-thunk case, which
> seems btw not being problematic at all.
> 
> I don't agree to revert that patch.  Please provide a testcase, why my
> suggested fix isn't suitable.
Jeff Law Sept. 19, 2014, 4:14 a.m. UTC | #13
On 09/18/14 16:20, Iain Sandoe wrote:
>
> 1. There has been a change made "to make the upper path like the lower path" (as you said on IRC).
>    - apparently (from our conversation) you don't expect this to be a general optimisation improvement.
>    - but it doesn't seem to be either "obvious" or "a cleanup" to me
>    - if you are asserting that it is a cleanup, then some explanation is in order.
Seems reasonable.

>
> 2. Apparently you don't think it is necessary to have any testcase to demonstrate that the new code is working?
>    - perhaps you are asserting that the code is "correct by inspection"?
A testcase would be nice to add and I'd strongly prefer to have one in 
the suite -- even if it's Darwin specific.

My understanding is the problem is Darwin's linker (or dynamic loader?) 
can't handle the code we end up generating.  While there may be other 
systems where one could show the problem due to limitations of the 
linker/loader on thos systems, probably the most common will be Darwin. 
  So we probably need a Darwin specific test.


However, I don't think Kai has a Darwin box to do the necessary testing. 
  If you, or someone, could build a test and verify it fails without 
Kai's patch, then passes with Kai's patch, that'd be quite helpful.

>
> 3. You don't seem to think it necessary to amend the comments in the code to reflect the new functionality?
Kai, can you update the comments, please?

Jeff
Jeff Law Sept. 19, 2014, 4:21 a.m. UTC | #14
On 09/18/14 15:35, FX wrote:
>
> Could you give a link to the patch? I’m not finding it. Has it been tested on darwin? If not, I can do it.
That would be greatly appreciated.


Jeff
Jeff Law Sept. 19, 2014, 6:04 a.m. UTC | #15
On 09/18/14 15:26, Kai Tietz wrote:
> Hi,
>
> it isn't true that I didn't replied to Iant.  I did this on IRC.  As I
> explained there already, this hunk about thunks is more consolidation
> of code-paths in that function, and not really part of a feature.  As
> this code-path isn't prominent mark being Darwin-code - and please
> don't take me wrong, but it seems to be until now the only target
> reporting this issues - and therefore I strongly see the issue to be
> solved for Darwin.   I don't see that this changes needs an additional
> testcase demonstration on a already regression-tested target that it
> doesn't break ... This is somehow like asking for gcc-testcase
> demostration that gcc's darwin target isn't responsible for earth's
> warming ...
I found this a bit difficult to parse, so I'm going to try and 
summarize, please tell me if I've got it right or wrong.

The code in question is not explicitly marked as being Darwin specific; 
however, to date we've only managed to exercise it on Darwin. 
Therefore, any fix is likely to be fairly specific to Darwin's unique 
characteristics.

Furthermore, Kai believes that any new test would be redundant with the 
existing tests that are currently failing on Darwin.

Is that a correct summary?

Jeff
Iain Sandoe Sept. 19, 2014, 7:39 a.m. UTC | #16
Hello Jeff, all,

On 19 Sep 2014, at 05:14, Jeff Law wrote:

> On 09/18/14 16:20, Iain Sandoe wrote:
>> 
>> 1. There has been a change made "to make the upper path like the lower path" (as you said on IRC).
>>   - apparently (from our conversation) you don't expect this to be a general optimisation improvement.
>>   - but it doesn't seem to be either "obvious" or "a cleanup" to me
>>   - if you are asserting that it is a cleanup, then some explanation is in order.
> Seems reasonable.
> 
>> 
>> 2. Apparently you don't think it is necessary to have any testcase to demonstrate that the new code is working?
>>   - perhaps you are asserting that the code is "correct by inspection"?
> A testcase would be nice to add and I'd strongly prefer to have one in the suite -- even if it's Darwin specific.

I have not succeeded in getting this code-path to trigger on Linux** 
- on irc, IIUC, Kai was saying that he didn't expect it would trigger on either Windows or Linux?

* if it's really intended to be mach-o specific then:
 a) it should have an  "if (TARGET_MACHO && ...)"  so that it's DCE'd for other targets.
 b) I'd like a clear explanation of what it's supposed to do so that we can examine why it doesn't do that..
 c) ..and, until we fix it it, it should be disabled or left out.

* If it's not darwin-specific then surely a change in code-gen must be accompanied by some test that demonstrates it DTRT on at least one target?

--

** I put a gcc_abort() in the path and ran "bootstrap" and "make check" on x86-64-Linux without seeing any aborts from this (so currently i have no non-darwin data).

> My understanding is the problem is Darwin's linker (or dynamic loader?) can't handle the code we end up generating.  While there may be other systems where one could show the problem due to limitations of the linker/loader on thos systems, probably the most common will be Darwin.  So we probably need a Darwin specific test.

This is not a ld64 or dyld issue.
The fail is a compiler ICE (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387)

Kai's speculation is that there's some unspecified fault with Darwin's address legalisation, although I have pointed out that for x86_64, the darwin and linux ABIs are the same, so there's a lot of shared code (the code is triggered by a different symbol visibility - which might, in itself, be a bug but that's separate from the matter at hand). 

---

IMO, we need to put Darwin to one side, and have a clear explanation of what the change is supposed to achieve on a NON-Darwin system.

> However, I don't think Kai has a Darwin box to do the necessary testing.  If you, or someone, could build a test and verify it fails without Kai's patch, then passes with Kai's patch, that'd be quite helpful.

If there's a genuine Darwin problem, then we will try to fix it, of course;  and Darwin folks will be happy to test prospective patches.

 - but I remain concerned that this is just papering over an underlying issue with the change, that is being thrown up by Darwin (by luck that it happens to trigger the code path).

>> 3. You don't seem to think it necessary to amend the comments in the code to reflect the new functionality?
> Kai, can you update the comments, please?

thanks for reviewing,
Iain

> 
> Jeff
FX Sept. 19, 2014, 7:43 a.m. UTC | #17
> I found this a bit difficult to parse, so I'm going to try and summarize, please tell me if I've got it right or wrong.
> 
> The code in question is not explicitly marked as being Darwin specific; however, to date we've only managed to exercise it on Darwin. Therefore, any fix is likely to be fairly specific to Darwin's unique characteristics.
> 
> Furthermore, Kai believes that any new test would be redundant with the existing tests that are currently failing on Darwin.
> 
> Is that a correct summary?

That seems correct, yes. Something in Darwin’s handling of visibility triggers it.
One more point, unanswered in what I’ve seen, is this from Iain:

> b) I'd like a clear explanation of what it's supposed to do so that we can examine why it doesn't do that..
> c) ..and, until we fix it it, it should be disabled or left out.

FX
diff mbox

Patch

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 215252)
+++ config/i386/i386.c	(working copy)
@@ -38968,9 +38968,12 @@  x86_output_mi_thunk (FILE *file, tree, H
     {
       if (sibcall_insn_operand (fnaddr, word_mode))
 	{
-	  tmp = gen_rtx_CALL (VOIDmode, fnaddr, const0_rtx);
-          tmp = emit_call_insn (tmp);
-          SIBLING_CALL_P (tmp) = 1;
+	  fnaddr = XEXP (DECL_RTL (function), 0);
+	  tmp = gen_rtx_MEM (QImode, fnaddr);
+	  tmp = gen_rtx_CALL (VOIDmode, tmp, const0_rtx);
+	  tmp = emit_call_insn (tmp);
+	  SIBLING_CALL_P (tmp) = 1;
+	  SIBLING_CALL_P (tmp) = 1;
 	}
       else
 	emit_jump_insn (gen_indirect_jump (fnaddr));