
[RFA] Stack clash protection 06/08 - V4

Message ID af150425-213a-eb58-046d-556a044bd6da@redhat.com

Commit Message

Jeff Law Sept. 22, 2017, 5:49 p.m. UTC
Enough changed since V3 that I think another review round on the ppc
bits is warranted.

wrap_frame_mem is gone.  Allocation, attribute setting and
notes/scheduler barriers are now in rs6000_emit_allocate_stack_1.

handle_residual is gone, it's folded into rs6000_emit_allocate_stack_1.

rs6000_emit_probe_stack_range_stack_clash now returns an insn with the
first stack adjustment or just before the stack adjustment loop.  This
seems more correct given how its return value (sp_adjust) is used.

The go test suite has been run with and without this patch.  I've found
os/signal seems to flip between pass and fail without rhyme or reason.
It may be related to system load.  TestCgoCallbackGC now passes for
reasons unknown.  I haven't run it enough to know if it is sensitive to
load or other timing factors.

I also flipped things so that clash protection is enabled by default and
re-ran the tests.  The idea being to see if I could exercise the path
that uses SP_ADJUST a bit more.  But that gave me the same results.
While I think the change in the return value in
rs6000_emit_probe_stack_range_stack_clash is more correct, I don't have
a good way to test it.

There were other minor improvements pointed out by Segher that were
fixed as well.

I kept the separation of the case where there's no stack space
allocated.   I did look at pushing that into
emit_probe_stack_range_stack_clash, but doing so actually complicates
things.  It's cleaner to leave it as-is.


OK for the trunk?

Jeff
commit b25ea3141981f833f3d1211f50209f349c2eae3e
Author: Jeff Law <law@redhat.com>
Date:   Fri Sep 22 11:47:26 2017 -0600

            * config/rs6000/rs6000-protos.h (output_probe_stack_range): Update
            prototype for new argument.
            * config/rs6000/rs6000.c (rs6000_emit_allocate_stack_1): New function,
            mostly extracted from rs6000_emit_allocate_stack.
            (rs6000_emit_probe_stack_range_stack_clash): New function.
            (rs6000_emit_allocate_stack): Call
            rs6000_emit_probe_stack_range_stack_clash as needed.
            (rs6000_emit_probe_stack_range): Add additional argument
            to call to gen_probe_stack_range{si,di}.
            (output_probe_stack_range): New.
            (output_probe_stack_range_1): Renamed from output_probe_stack_range.
            (output_probe_stack_range_stack_clash): New.
            (rs6000_emit_prologue): Emit notes into dump file as requested.
            * rs6000.md (allocate_stack): Handle -fstack-clash-protection.
            (probe_stack_range<P:mode>): Operand 0 is now early-clobbered.
            Add additional operand and pass it to output_probe_stack_range.
    
            * lib/target-supports.exp
            (check_effective_target_supports_stack_clash_protection): Enable for
            rs6000 and powerpc targets.

Comments

Segher Boessenkool Sept. 25, 2017, 10:52 a.m. UTC | #1
Hi Jeff,

On Fri, Sep 22, 2017 at 11:49:04AM -0600, Jeff Law wrote:
> The go test suite has been run with and without this patch.  I've found
> os/signal seems to flip between pass and fail without rhyme or reason.
> It may be related to system load.  TestCgoCallbackGC now passes for
> reasons unknown.  I haven't run it enough to know if it is sensitive to
> load or other timing factors.

Many Go tests flip-flop.  I don't know why.

> I also flipped things so that clash protection is enabled by default and
> re-ran the tests.  The idea being to see if I could exercise the path
> that uses SP_ADJUST a bit more.  But that gave me the same results.
> While I think the change in the return value in
> rs6000_emit_probe_stack_range_stack_clash is more correct, I don't have
> a good way to test it.

Did you also test Ada?  It needs testing.  I wanted to try it myself,
but your patch doesn't apply (you included the changelog bits in the
patch), and I cannot easily manually apply it either because you sent
it as base64 instead of as text.

(... Okay, I think I have it working; testing now).

Some comments, mostly trivial comment stuff:


> +/* Allocate SIZE_INT bytes on the stack using a store with update style insn
> +   and set the appropriate attributes for the generated insn.  Return the
> +   generated insn.

"Return the first generated insn"?

> +   SIZE_INT is used to create the CFI note for the allocation.
> +
> +   SIZE_RTX is an rtx containing the size of the adjustment.  Note that
> +   since stacks grow to lower addresses its runtime value is -SIZE_INT.

The size_rtx doesn't always correspond to size_int so this is a bit
misleading.

(These comments were in the original code already, oh well).

> +   COPY_REG, if non-null, should contain a copy of the original
> +   stack pointer at exit from this function.

"Return a copy of the original stack pointer in COPY_REG if that is
non-null"?  It wasn't clear to me that it is this function that should
set it :-)

> +static rtx_insn *
> +rs6000_emit_probe_stack_range_stack_clash (HOST_WIDE_INT orig_size,
> +					   rtx copy_reg)
> +{
> +  rtx orig_sp = copy_reg;
> +
> +  HOST_WIDE_INT probe_interval
> +    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);

HOST_WIDE_INT_1U << ...

> +  /* If explicitly requested,
> +       or the rounded size is not the same as the original size
> +       or the the rounded size is greater than a page,
> +     then we will need a copy of the original stack pointer.  */
> +  if (rounded_size != orig_size
> +      || rounded_size > probe_interval
> +      || copy_reg)
> +    {
> +      /* If the caller requested a copy of the incoming stack pointer,
> +	 then ORIG_SP == COPY_REG and will not be NULL.
> +
> +	 If no copy was requested, then we use r0 to hold the copy.  */
> +      if (orig_sp == NULL_RTX)
> +	orig_sp = gen_rtx_REG (Pmode, 0);
> +      emit_move_insn (orig_sp, stack_pointer_rtx);

Maybe just write the "if" as "if (!copy_reg)"?  You can lose the first
half of the comment then (since it is obvious then).

> +      for (int i = 0; i < rounded_size; i += probe_interval)
> +	{
> +	  rtx_insn *insn
> +	    = rs6000_emit_allocate_stack_1 (probe_interval,
> +					    probe_int, orig_sp);
> +	  if (!retval)
> +	    retval = insn;
> +	}

Maybe "if (i == 0)" is clearer?

> @@ -25509,6 +25703,23 @@ rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg, int copy_off)
>  	warning (0, "stack limit expression is not supported");
>      }
>  
> +  if (flag_stack_clash_protection)
> +    {
> +      if (size < (1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE)))

HOST_WIDE_INT_1U again.

> +/* Probe a range of stack addresses from REG1 to REG3 inclusive.  These are
> +   absolute addresses.  REG2 contains the backchain that must be stored into
> +   *sp at each allocation.

I would just remove "These are absolute addresses.", or write something
like "These are addresses, not offsets", but that is kind of obvious
isn't it ;-)

> +static const char *
> +output_probe_stack_range_stack_clash (rtx reg1, rtx reg2, rtx reg3)
> +{
> +  static int labelno = 0;
> +  char loop_lab[32];
> +  rtx xops[3];
> +
> +  HOST_WIDE_INT probe_interval
> +    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);

Once more :-)  Maybe a helper function is in order?  Would avoid the
huge length names at least.


Bootstrap+testsuite finished on BE, but I forgot to enable stack-clash
protection by default, whoops.  Will have results later today (also LE).


Segher
Segher Boessenkool Sept. 25, 2017, 12:41 p.m. UTC | #2
On Mon, Sep 25, 2017 at 05:52:27AM -0500, Segher Boessenkool wrote:
> Bootstrap+testsuite finished on BE, but I forgot to enable stack-clash
> protection by default, whoops.  Will have results later today (also LE).

Some new failures show up:

+FAIL: c-c++-common/ubsan/vla-1.c   -O0  execution test

/home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:18:7: runtime error: variable length array bound evaluates to non-positive value -1
/home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:24:7: runtime error: variable length array bound evaluates to non-positive value -1
/home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:24:7: runtime error: variable length array bound evaluates to non-positive value -1
/home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
/home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
/home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
/home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:36:7: runtime error: variable length array bound evaluates to non-positive value -5

(both gcc and g++, both -m32 and -m64).

This is BE; LE is still running.


Segher
Segher Boessenkool Sept. 25, 2017, 2:14 p.m. UTC | #3
On Mon, Sep 25, 2017 at 07:41:18AM -0500, Segher Boessenkool wrote:
> On Mon, Sep 25, 2017 at 05:52:27AM -0500, Segher Boessenkool wrote:
> > Bootstrap+testsuite finished on BE, but I forgot to enable stack-clash
> > protection by default, whoops.  Will have results later today (also LE).
> 
> Some new failures show up:
> 
> +FAIL: c-c++-common/ubsan/vla-1.c   -O0  execution test
> 
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:18:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:24:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:24:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:36:7: runtime error: variable length array bound evaluates to non-positive value -5
> 
> (both gcc and g++, both -m32 and -m64).
> 
> This is BE; LE is still running.

LE shows the same, but also

                === acats tests ===
+FAIL:  c52103x
+FAIL:  c52104x
+FAIL:  c52104y
+FAIL:  cb1010a

These are

---- C52103X CHECK THAT IN ARRAY ASSIGNMENTS AND IN SLICE ASSIGNMENTS,
                THE LENGTHS MUST MATCH; ALSO CHECK WHETHER
                CONSTRAINT_ERROR OR STORAGE_ERROR ARE RAISED FOR LARGE
                ARRAYS.
   - C52103X NO CONSTRAINT_ERROR FOR TYPE WITH 'LENGTH = INTEGER'LAST + 
                3.

raised STORAGE_ERROR : stack overflow or erroneous memory access
FAIL:	c52103x

(twice)

---- C52104Y CHECK THAT IN ARRAY ASSIGNMENTS AND IN SLICE ASSIGNMENTS,
                THE LENGTHS MUST MATCH.
   - C52104Y NO CONSTRAINT_ERROR FOR NON-NULL ARRAY SUBTYPE WHEN ONE
                DIMENSION HAS INTEGER'LAST + 3 COMPONENTS.

raised STORAGE_ERROR : stack overflow or erroneous memory access
FAIL:   c52104y

and,

---- CB1010A CHECK THAT STORAGE_ERROR IS RAISED WHEN STORAGE ALLOCATED
                TO A TASK IS EXCEEDED.
   - CB1010A CHECK TASKS THAT DO NOT HANDLE STORAGE_ERROR PRIOR TO
                RENDEZVOUS.
FAIL:   cb1010a


Segher
Jeff Law Sept. 25, 2017, 3:29 p.m. UTC | #4
On 09/25/2017 06:41 AM, Segher Boessenkool wrote:
> On Mon, Sep 25, 2017 at 05:52:27AM -0500, Segher Boessenkool wrote:
>> Bootstrap+testsuite finished on BE, but I forgot to enable stack-clash
>> protection by default, whoops.  Will have results later today (also LE).
> 
> Some new failures show up:
> 
> +FAIL: c-c++-common/ubsan/vla-1.c   -O0  execution test
> 
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:18:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:24:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:24:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:36:7: runtime error: variable length array bound evaluates to non-positive value -5
Yes.  I've known about this.

What happens is ubsan detects the error, but allows the code to continue
to run and try to allocate huge stacks.  The stack clash code comes
along and tries to probe the just allocated space which fails in the
expected manner.  It's really a testsuite issue and not an issue with
either ubsan or stack clash protection -- that's why I didn't call it out.

We could ask the sanitizers to abort on detecting UB, but then the test
itself needs to be split up (and that's the right thing to do IMHO).

There are other tests which are going to fail -- things like mixing
-fstack-check and -fstack-clash and an assortment of guality things.

Jeff
Jeff Law Sept. 25, 2017, 4 p.m. UTC | #5
On 09/25/2017 04:52 AM, Segher Boessenkool wrote:
> 
>> I also flipped things so that clash protection is enabled by default and
>> re-ran the tests.  The idea being to see if I could exercise the path
>> that uses SP_ADJUST a bit more.  But that gave me the same results.
>> While I think the change in the return value in
>> rs6000_emit_probe_stack_range_stack_clash is more correct, I don't have
>> a good way to test it.
> 
> Did you also test Ada?  It needs testing.  I wanted to try it myself,
> but your patch doesn't apply (you included the changelog bits in the
> patch), and I cannot easily manually apply it either because you sent
> it as base64 instead of as text.
I didn't test Ada with -fstack-clash-protection on by default.  I did
test it as part of the normal bootstrap & regression test cycle with no
changes in the Ada testsuites.

Testing it with stack clash protection on by default is easy to do :-)
I wouldn't be surprised to see failures since some tests are known to
test/use -fstack-check= which explicitly conflicts with stack clash
protection.

> 
> (... Okay, I think I have it working; testing now).
> 
> Some comments, mostly trivial comment stuff:

> 
> 
>> +/* Allocate SIZE_INT bytes on the stack using a store with update style insn
>> +   and set the appropriate attributes for the generated insn.  Return the
>> +   generated insn.
> 
> "Return the first generated insn"?
Fixed.

>> +   SIZE_INT is used to create the CFI note for the allocation.
>> +
>> +   SIZE_RTX is an rtx containing the size of the adjustment.  Note that
>> +   since stacks grow to lower addresses its runtime value is -SIZE_INT.
> 
> The size_rtx doesn't always correspond to size_int so this a bit
> misleading.
The value at runtime of SIZE_RTX is always -SIZE_INT.   It might be held
in a register, but that register's value is always -SIZE_INT.


> 
> (These comments were in the original code already, oh well).
Happy to adjust if you've got a suggestion on how to make it clearer.


> 
>> +   COPY_REG, if non-null, should contain a copy of the original
>> +   stack pointer at exit from this function.
> 
> "Return a copy of the original stack pointer in COPY_REG if that is
> non-null"?  It wasn't clear to me that it is this function that should
> set it :-)
It's not clear to me either.  I'm just trying to preserve behavior of
the existing code in its handling of COPY_REG and COPY_OFF.



> 
>> +static rtx_insn *
>> +rs6000_emit_probe_stack_range_stack_clash (HOST_WIDE_INT orig_size,
>> +					   rtx copy_reg)
>> +{
>> +  rtx orig_sp = copy_reg;
>> +
>> +  HOST_WIDE_INT probe_interval
>> +    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
> 
> HOST_WIDE_INT_1U << ...
It won't matter in practice because of the limits we've put on those
PARAMs, but sure, easy to do except for the long lines, even in a helper
function :(




> 
>> +  /* If explicitly requested,
>> +       or the rounded size is not the same as the original size
>> +       or the the rounded size is greater than a page,
>> +     then we will need a copy of the original stack pointer.  */
>> +  if (rounded_size != orig_size
>> +      || rounded_size > probe_interval
>> +      || copy_reg)
>> +    {
>> +      /* If the caller requested a copy of the incoming stack pointer,
>> +	 then ORIG_SP == COPY_REG and will not be NULL.
>> +
>> +	 If no copy was requested, then we use r0 to hold the copy.  */
>> +      if (orig_sp == NULL_RTX)
>> +	orig_sp = gen_rtx_REG (Pmode, 0);
>> +      emit_move_insn (orig_sp, stack_pointer_rtx);
> 
> Maybe just write the "if" as "if (!copy_reg)"?  You can lose the first
> half of the comment then (since it is obvious then).
Agreed.

> 
>> +      for (int i = 0; i < rounded_size; i += probe_interval)
>> +	{
>> +	  rtx_insn *insn
>> +	    = rs6000_emit_allocate_stack_1 (probe_interval,
>> +					    probe_int, orig_sp);
>> +	  if (!retval)
>> +	    retval = insn;
>> +	}
> 
> Maybe "if (i == 0)" is clearer?
No strong opinions here.  I added a comment and changed it to i == 0.
> 
>> @@ -25509,6 +25703,23 @@ rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg, int copy_off)
>>  	warning (0, "stack limit expression is not supported");
>>      }
>>  
>> +  if (flag_stack_clash_protection)
>> +    {
>> +      if (size < (1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE)))
> 
> HOST_WIDE_INT_1U again.
Fixed.

> 
>> +/* Probe a range of stack addresses from REG1 to REG3 inclusive.  These are
>> +   absolute addresses.  REG2 contains the backchain that must be stored into
>> +   *sp at each allocation.
> 
> I would just remove "These are absolute addresses.", or write something
> like "These are addresses, not offsets", but that is kind of obvious
> isn't it ;-)
:-)  It was copied from the analogous -fstack-check routine.   Note
there's a similar routine for -fstack-check which uses offsets, so I
think being very explicit makes sense.  Perhaps just "These are
addresses, not offsets" is better than the current comment?  I'll go
with whatever you prefer here.




> 
>> +static const char *
>> +output_probe_stack_range_stack_clash (rtx reg1, rtx reg2, rtx reg3)
>> +{
>> +  static int labelno = 0;
>> +  char loop_lab[32];
>> +  rtx xops[3];
>> +
>> +  HOST_WIDE_INT probe_interval
>> +    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
> 
> Once more :-)  Maybe a helper function is in order?  Would avoid the
> huge length names at least.
Fixed.  Long term I hope we find that changing these things isn't useful
and we just drop them.


> 
> 
> Bootstrap+testsuite finished on BE, but I forgot to enable stack-clash
> protection by default, whoops.  Will have results later today (also LE).
FWIW, I did do BE tests of earlier versions of this code as well as
ppc32-be with and without stack clash protection enabled by default.
But most of the time I'm testing ppc64le.

jeff
Jeff Law Sept. 25, 2017, 4:15 p.m. UTC | #6
On 09/25/2017 08:14 AM, Segher Boessenkool wrote:
> On Mon, Sep 25, 2017 at 07:41:18AM -0500, Segher Boessenkool wrote:
>> On Mon, Sep 25, 2017 at 05:52:27AM -0500, Segher Boessenkool wrote:
>>> Bootstrap+testsuite finished on BE, but I forgot to enable stack-clash
>>> protection by default, whoops.  Will have results later today (also LE).
>>
>> Some new failures show up:
>>
>> +FAIL: c-c++-common/ubsan/vla-1.c   -O0  execution test
>>
>> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:18:7: runtime error: variable length array bound evaluates to non-positive value -1
>> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:24:7: runtime error: variable length array bound evaluates to non-positive value -1
>> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:24:7: runtime error: variable length array bound evaluates to non-positive value -1
>> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
>> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
>> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:30:7: runtime error: variable length array bound evaluates to non-positive value -1
>> /home/segher/src/gcc/gcc/testsuite/c-c++-common/ubsan/vla-1.c:36:7: runtime error: variable length array bound evaluates to non-positive value -5
>>
>> (both gcc and g++, both -m32 and -m64).
>>
>> This is BE; LE is still running.
> 
> LE show the same, but also
> 
>                 === acats tests ===
> +FAIL:  c52103x
> +FAIL:  c52104x
> +FAIL:  c52104y
> +FAIL:  cb1010a
> 
> These are
> 
> ---- C52103X CHECK THAT IN ARRAY ASSIGNMENTS AND IN SLICE ASSIGNMENTS,
>                 THE LENGTHS MUST MATCH; ALSO CHECK WHETHER
>                 CONSTRAINT_ERROR OR STORAGE_ERROR ARE RAISED FOR LARGE
>                 ARRAYS.
>    - C52103X NO CONSTRAINT_ERROR FOR TYPE WITH 'LENGTH = INTEGER'LAST + 
>                 3.
> 
> raised STORAGE_ERROR : stack overflow or erroneous memory access
> FAIL:	c52103x
This is expected.  Stack clash protection does not guarantee we can run
the signal handler upon stack overflow -- thus we cannot guarantee we
get the constraint error.


> 
> (twice)
> 
> ---- C52104Y CHECK THAT IN ARRAY ASSIGNMENTS AND IN SLICE ASSIGNMENTS,
>                 THE LENGTHS MUST MATCH.
>    - C52104Y NO CONSTRAINT_ERROR FOR NON-NULL ARRAY SUBTYPE WHEN ONE
>                 DIMENSION HAS INTEGER'LAST + 3 COMPONENTS.
> 
> raised STORAGE_ERROR : stack overflow or erroneous memory access
> FAIL:   c52104y
Likewise.


> 
> and,
> 
> ---- CB1010A CHECK THAT STORAGE_ERROR IS RAISED WHEN STORAGE ALLOCATED
>                 TO A TASK IS EXCEEDED.
>    - CB1010A CHECK TASKS THAT DO NOT HANDLE STORAGE_ERROR PRIOR TO
>                 RENDEZVOUS.
> FAIL:   cb1010a
Likewise.  We can't guarantee we can run the signal handler and thus we
can't inform the Ada runtime about the overflow.

Jeff
Segher Boessenkool Sept. 25, 2017, 5:25 p.m. UTC | #7
On Mon, Sep 25, 2017 at 10:00:55AM -0600, Jeff Law wrote:
> On 09/25/2017 04:52 AM, Segher Boessenkool wrote:
> > Did you also test Ada?  It needs testing.  I wanted to try it myself,
> > but your patch doesn't apply (you included the changelog bits in the
> > patch), and I cannot easily manually apply it either because you sent
> > it as base64 instead of as text.
> I didn't test Ada with -fstack-clash-protection on by default.  I did
> test it as part of the normal bootstrap & regression test cycle with no
> changes in the Ada testsuites.
> 
> Testing it with stack clash protection on by default is easy to do :-)
> I wouldn't be surprised to see failures since some tests are known to
> test/use -fstack-check= which explicitly conflicts with stack clash
> protection.

Right, and that did happen (see other mails).  But that's okay, it won't
be enabled by default (yet) (right?), and when it does just the testcases
need fixing?

> >> +   SIZE_INT is used to create the CFI note for the allocation.
> >> +
> >> +   SIZE_RTX is an rtx containing the size of the adjustment.  Note that
> >> +   since stacks grow to lower addresses its runtime value is -SIZE_INT.
> > 
> > The size_rtx doesn't always correspond to size_int so this is a bit
> > misleading.
> The value at runtime of SIZE_RTX is always -SIZE_INT.   It might be held
> in a register, but that register's value is always -SIZE_INT.

Ah, I was looking at the last call in rs6000_emit_allocate_stack.  It's
a bit hard to follow, but I see you are right.  Unfortunate that you
won't get good generated code if you just drop that size_rtx parameter
and generate it from size_int here (or do you?)

(You could still do
  if (!size_rtx)
    size_rtx = GEN_INT (-size_int);
to simplify the callers a bit).

> > (These comments were in the original code already, oh well).
> Happy to adjust if you've got a suggestion on how to make it clearer.

I'm drawing a blank right now :-/

> >> +rs6000_emit_probe_stack_range_stack_clash (HOST_WIDE_INT orig_size,
> >> +					   rtx copy_reg)
> >> +{
> >> +  rtx orig_sp = copy_reg;
> >> +
> >> +  HOST_WIDE_INT probe_interval
> >> +    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
> > 
> > HOST_WIDE_INT_1U << ...
> It won't matter in practice because of the limits we've put on those
> PARAMs, but sure, easy to do except for the long lines, even in a helper
> function :(

Yeah.  But whenever I see "1 <<" I think "will it fit" -- no need for
that with HOST_WIDE_INT_1 (or unsigned, if you can) :-)

> >> +/* Probe a range of stack addresses from REG1 to REG3 inclusive.  These are
> >> +   absolute addresses.  REG2 contains the backchain that must be stored into
> >> +   *sp at each allocation.
> > 
> > I would just remove "These are absolute addresses.", or write something
> > like "These are addresses, not offsets", but that is kind of obvious
> > isn't it ;-)
> :-)  It was copied from the analogous -fstack-check routine.   Note
> there's a similar routine for -fstack-check which uses offsets, so I
> think being very explicit makes sense.  Perhaps just "These are
> addresses, not offsets" is better than the current comment?  I'll go
> with whatever you prefer here.

Yeah that is fine, thanks.

> >> +static const char *
> >> +output_probe_stack_range_stack_clash (rtx reg1, rtx reg2, rtx reg3)
> >> +{
> >> +  static int labelno = 0;
> >> +  char loop_lab[32];
> >> +  rtx xops[3];
> >> +
> >> +  HOST_WIDE_INT probe_interval
> >> +    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
> > 
> > Once more :-)  Maybe a helper function is in order?  Would avoid the
> > huge length names at least.
> Fixed.  Long term I hope we find that changing these things isn't useful
> and we just drop them.

I hope we can change to 64kB for 64-bit Power (instead of 4kB) -- if we
do, we can probably turn on this protection by default, since the runtime
cost will be close to zero (almost all functions will need *no* extra
code compared to no clash protection).

But we'll probably need to support 4kB as well, even for 64-bit.

> > Bootstrap+testsuite finished on BE, but I forgot to enable stack-clash
> > protection by default, whoops.  Will have results later today (also LE).
> FWIW, I did do BE tests of earlier versions of this code as well as
> ppc32-be with and without stack clash protection enabled by default.
> But most of the time I'm testing ppc64le.

So all testing was fine (except the things with stack clash protection
on by default, which you are not proposing to commit).

I'm quite happy with the patch now; it's okay for trunk.

Thank you for all the work!


Segher
Jeff Law Sept. 25, 2017, 6:18 p.m. UTC | #8
On 09/25/2017 11:25 AM, Segher Boessenkool wrote:
> On Mon, Sep 25, 2017 at 10:00:55AM -0600, Jeff Law wrote:
>> On 09/25/2017 04:52 AM, Segher Boessenkool wrote:
>>> Did you also test Ada?  It needs testing.  I wanted to try it myself,
>>> but your patch doesn't apply (you included the changelog bits in the
>>> patch), and I cannot easily manually apply it either because you sent
>>> it as base64 instead of as text.
>> I didn't test Ada with -fstack-clash-protection on by default.  I did
>> test it as part of the normal bootstrap & regression test cycle with no
>> changes in the Ada testsuites.
>>
>> Testing it with stack clash protection on by default is easy to do :-)
>> I wouldn't be surprised to see failures since some tests are known to
>> test/use -fstack-check= which explicitly conflicts with stack clash
>> protection.
> 
> Right, and that did happen (see other mails).  But that's okay, it won't
> be enabled by default (yet) (right?), and when it does just the testcases
> need fixing?
Correct, it's not enabled by default at this time.    To propose that
we'd need to do some testsuite work.  We'd also need to make some
decisions on how to handle the partially protected targets.

So for the immediate future, stack clash protection has to be explicitly
requested.

We're still working out the long term plan for RHEL, but defaulting it
on for all the RHEL architectures is definitely part of that plan.

> 
>>>> +   SIZE_INT is used to create the CFI note for the allocation.
>>>> +
>>>> +   SIZE_RTX is an rtx containing the size of the adjustment.  Note that
>>>> +   since stacks grow to lower addresses its runtime value is -SIZE_INT.
>>>
>>> The size_rtx doesn't always correspond to size_int so this is a bit
>>> misleading.
>> The value at runtime of SIZE_RTX is always -SIZE_INT.   It might be held
>> in a register, but that register's value is always -SIZE_INT.
> 
> Ah, I was looking at the last call in rs6000_emit_allocate_stack.  It's
> a bit hard to follow, but I see you are right.  Unfortunate that you
> won't get good generated code if you just drop that size_rtx parameter
> and generate it from size_int here (or do you?)
I don't think it would make any difference from a code generation
standpoint.  One could argue that passing in SIZE_RTX is just silly as
the caller's don't really need that information and it just hides the
invariant that SIZE_RTX is just -SIZE at runtime.  Let me give that a
whirl in the tester.


> 
> Yeah.  But whenever I see "1 <<" I think "will it fit" -- no need for
> that with HOST_WIDE_INT_1 (or unsigned, if you can) :-)
And that's why I just went ahead and wrote a helper.  It ensures we're
good to go even if we expand the limits significantly.

>> Fixed.  Long term I hope we find that changing these things isn't useful
>> and we just drop them.
> 
> I hope we can change to 64kB for 64-bit Power (instead of 4kB) -- if we
> do, we can probably turn on this protection by default, since the runtime
> cost will be close to zero (almost all functions will need *no* extra
> code compared to no clash protection).
Right.   But I would claim that even today with 4k guard and 4k probe
interval that the overhead is likely not measurable in practice on a
target like ppc.


> 
> But we'll probably need to support 4kB as well, even for 64-bit.
Note that we can adjust the size of the guard and probe interval
independently.  So if a system is providing a multi-page guard we can
take advantage of that by raising the former, but not the latter.

> 
>>> Bootstrap+testsuite finished on BE, but I forgot to enable stack-clash
>>> protection by default, whoops.  Will have results later today (also LE).
>> FWIW, I did do BE tests of earlier versions of this code as well as
>> ppc32-be with and without stack clash protection enabled by default.
>> But most of the time I'm testing ppc64le.
> 
> So all testing was fine (except the things with stack clash protection
> on by default, which you are not proposing to commit).
> 
> I'm quite happy with the patch now; it's okay for trunk.
> 
> Thank you for all the work!
Thanks.  Your feedback on both the PPC and generic bits did
significantly improve the implementation.

Jeff

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fdb7221848c..2bdb8c07372 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,22 @@ 
+2017-09-22  Jeff Law  <law@redhat.com>
+
+	* config/rs6000/rs6000-protos.h (output_probe_stack_range): Update
+	prototype for new argument.
+	* config/rs6000/rs6000.c (rs6000_emit_allocate_stack_1): New function,
+	mostly extracted from rs6000_emit_allocate_stack.
+	(rs6000_emit_probe_stack_range_stack_clash): New function.
+	(rs6000_emit_allocate_stack): Call
+	rs6000_emit_probe_stack_range_stack_clash as needed.
+	(rs6000_emit_probe_stack_range): Add additional argument
+	to call to gen_probe_stack_range{si,di}.
+	(output_probe_stack_range): New.
+	(output_probe_stack_range_1): Renamed from output_probe_stack_range.
+	(output_probe_stack_range_stack_clash): New.
+	(rs6000_emit_prologue): Emit notes into dump file as requested.
+	* rs6000.md (allocate_stack): Handle -fstack-clash-protection.
+	(probe_stack_range<P:mode>): Operand 0 is now early-clobbered.
+	Add additional operand and pass it to output_probe_stack_range.
+
 2017-09-22  Richard Sandiford  <richard.sandiford@linaro.org>
 	    Alan Hayward  <alan.hayward@arm.com>
 	    David Sherwood  <david.sherwood@arm.com>
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3f86aba947e..781349b850e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -128,7 +128,7 @@  extern void rs6000_emit_sISEL (machine_mode, rtx[]);
 extern void rs6000_emit_sCOND (machine_mode, rtx[]);
 extern void rs6000_emit_cbranch (machine_mode, rtx[]);
 extern char * output_cbranch (rtx, const char *, int, rtx_insn *);
-extern const char * output_probe_stack_range (rtx, rtx);
+extern const char * output_probe_stack_range (rtx, rtx, rtx);
 extern void rs6000_emit_dot_insn (rtx dst, rtx src, int dot, rtx ccreg);
 extern bool rs6000_emit_set_const (rtx, rtx);
 extern int rs6000_emit_cmove (rtx, rtx, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e5ef63889b7..9e486f592b9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25458,6 +25458,201 @@  rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
   emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
 }
 
+/* Allocate SIZE_INT bytes on the stack using a store with update style insn
+   and set the appropriate attributes for the generated insn.  Return the
+   generated insn.
+
+   SIZE_INT is used to create the CFI note for the allocation.
+
+   SIZE_RTX is an rtx containing the size of the adjustment.  Note that
+   since stacks grow to lower addresses, its runtime value is -SIZE_INT.
+
+   ORIG_SP contains the backchain value that must be stored at *sp.  */
+
+static rtx_insn *
+rs6000_emit_allocate_stack_1 (HOST_WIDE_INT size_int, rtx size_rtx, rtx orig_sp)
+{
+  rtx_insn *insn;
+
+  if (Pmode == SImode)
+    insn = emit_insn (gen_movsi_update_stack (stack_pointer_rtx,
+					      stack_pointer_rtx,
+					      size_rtx,
+					      orig_sp));
+  else
+    insn = emit_insn (gen_movdi_di_update_stack (stack_pointer_rtx,
+						 stack_pointer_rtx,
+						 size_rtx,
+						 orig_sp));
+  rtx par = PATTERN (insn);
+  gcc_assert (GET_CODE (par) == PARALLEL);
+  rtx set = XVECEXP (par, 0, 0);
+  gcc_assert (GET_CODE (set) == SET);
+  rtx mem = SET_DEST (set);
+  gcc_assert (MEM_P (mem));
+  MEM_NOTRAP_P (mem) = 1;
+  set_mem_alias_set (mem, get_frame_alias_set ());
+
+  RTX_FRAME_RELATED_P (insn) = 1;
+  add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+		gen_rtx_SET (stack_pointer_rtx,
+			     gen_rtx_PLUS (Pmode,
+					   stack_pointer_rtx,
+					   GEN_INT (-size_int))));
+
+  /* Emit a blockage to ensure the allocation/probing insns are
+     not optimized, combined, removed, etc.  Add REG_STACK_CHECK
+     note for similar reasons.  */
+  if (flag_stack_clash_protection)
+    {
+      add_reg_note (insn, REG_STACK_CHECK, const0_rtx);
+      emit_insn (gen_blockage ());
+    }
+
+  return insn;
+}
+
+/* Allocate ORIG_SIZE bytes on the stack and probe the newly
+   allocated space every STACK_CLASH_PROTECTION_PROBE_INTERVAL bytes.
+
+   COPY_REG, if non-null, should contain a copy of the original
+   stack pointer at exit from this function.
+
+   This is subtly different than the Ada probing in that it tries hard to
+   prevent attacks that jump the stack guard.  Thus it is never allowed to
+   allocate more than STACK_CLASH_PROTECTION_PROBE_INTERVAL bytes of stack
+   space without a suitable probe.  */
+static rtx_insn *
+rs6000_emit_probe_stack_range_stack_clash (HOST_WIDE_INT orig_size,
+					   rtx copy_reg)
+{
+  rtx orig_sp = copy_reg;
+
+  HOST_WIDE_INT probe_interval
+    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
+
+  /* Round the size down to a multiple of PROBE_INTERVAL.  */
+  HOST_WIDE_INT rounded_size = ROUND_DOWN (orig_size, probe_interval);
+
+  /* If explicitly requested,
+       or the rounded size is not the same as the original size
+       or the rounded size is greater than one probe interval,
+     then we will need a copy of the original stack pointer.  */
+  if (rounded_size != orig_size
+      || rounded_size > probe_interval
+      || copy_reg)
+    {
+      /* If the caller requested a copy of the incoming stack pointer,
+	 then ORIG_SP == COPY_REG and will not be NULL.
+
+	 If no copy was requested, then we use r0 to hold the copy.  */
+      if (orig_sp == NULL_RTX)
+	orig_sp = gen_rtx_REG (Pmode, 0);
+      emit_move_insn (orig_sp, stack_pointer_rtx);
+    }
+
+  /* There are three cases here.
+
+     One is a single probe which is the most common and most efficiently
+     implemented as it does not have to have a copy of the original
+     stack pointer if there are no residuals.
+
+     Second is unrolled allocation/probes which we use if there's just
+     a few of them.  It needs to save the original stack pointer into a
+     temporary for use as a source register in the allocation/probe.
+
+     Last is a loop.  This is the most uncommon case and least efficient.  */
+  rtx_insn *retval = NULL;
+  if (rounded_size == probe_interval)
+    {
+      retval = rs6000_emit_allocate_stack_1 (probe_interval,
+					     GEN_INT (-probe_interval),
+					     stack_pointer_rtx);
+
+      dump_stack_clash_frame_info (PROBE_INLINE, rounded_size != orig_size);
+    }
+  else if (rounded_size <= 8 * probe_interval)
+    {
+      /* The ABI requires using the store with update insns to allocate
+	 space and store the backchain into the stack.
+
+	 So we save the current stack pointer into a temporary, then
+	 emit the store-with-update insns to store the saved stack pointer
+	 into the right location in each new page.  */
+      rtx probe_int = GEN_INT (-probe_interval);
+      for (int i = 0; i < rounded_size; i += probe_interval)
+	{
+	  rtx_insn *insn
+	    = rs6000_emit_allocate_stack_1 (probe_interval,
+					    probe_int, orig_sp);
+	  if (!retval)
+	    retval = insn;
+	}
+
+      dump_stack_clash_frame_info (PROBE_INLINE, rounded_size != orig_size);
+    }
+  else
+    {
+      /* Compute the ending address.  */
+      rtx end_addr
+	= copy_reg ? gen_rtx_REG (Pmode, 0) : gen_rtx_REG (Pmode, 12);
+      rtx rs = GEN_INT (-rounded_size);
+      rtx_insn *insn;
+      if (add_operand (rs, Pmode))
+	insn = emit_insn (gen_add3_insn (end_addr, stack_pointer_rtx, rs));
+      else
+	{
+	  emit_move_insn (end_addr, GEN_INT (-rounded_size));
+	  insn = emit_insn (gen_add3_insn (end_addr, end_addr,
+					   stack_pointer_rtx));
+	  /* Describe the effect of INSN to the CFI engine.  */
+	  add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+			gen_rtx_SET (end_addr,
+				     gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+						   rs)));
+	}
+      RTX_FRAME_RELATED_P (insn) = 1;
+
+      /* Emit the loop.  */
+      if (TARGET_64BIT)
+	retval = emit_insn (gen_probe_stack_rangedi (stack_pointer_rtx,
+						     stack_pointer_rtx, orig_sp,
+						     end_addr));
+      else
+	retval = emit_insn (gen_probe_stack_rangesi (stack_pointer_rtx,
+						     stack_pointer_rtx, orig_sp,
+						     end_addr));
+      RTX_FRAME_RELATED_P (retval) = 1;
+      /* Describe the effect of INSN to the CFI engine.  */
+      add_reg_note (retval, REG_FRAME_RELATED_EXPR,
+		    gen_rtx_SET (stack_pointer_rtx, end_addr));
+
+      /* Emit a blockage to ensure the allocation/probing insns are
+	 not optimized, combined, removed, etc.  Other cases handle this
+	 within their call to rs6000_emit_allocate_stack_1.  */
+      emit_insn (gen_blockage ());
+
+      dump_stack_clash_frame_info (PROBE_LOOP, rounded_size != orig_size);
+    }
+
+  if (orig_size != rounded_size)
+    {
+      /* Allocate (and implicitly probe) any residual space.   */
+      HOST_WIDE_INT residual = orig_size - rounded_size;
+
+      rtx_insn *insn = rs6000_emit_allocate_stack_1 (residual,
+						     GEN_INT (-residual),
+						     orig_sp);
+
+      /* If the residual was the only allocation, then we can return the
+	 allocating insn.  */
+      if (!retval)
+	retval = insn;
+    }
+
+  return retval;
+}
+
 /* Emit the correct code for allocating stack space, as insns.
    If COPY_REG, make sure a copy of the old frame is left there.
    The generated code may use hard register 0 as a temporary.  */
@@ -25469,7 +25664,6 @@  rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg, int copy_off)
   rtx stack_reg = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
   rtx tmp_reg = gen_rtx_REG (Pmode, 0);
   rtx todec = gen_int_mode (-size, Pmode);
-  rtx par, set, mem;
 
   if (INTVAL (todec) != -size)
     {
@@ -25509,6 +25703,23 @@  rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg, int copy_off)
 	warning (0, "stack limit expression is not supported");
     }
 
+  if (flag_stack_clash_protection)
+    {
+      if (size < (1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE)))
+	dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
+      else
+	{
+	  rtx_insn *insn = rs6000_emit_probe_stack_range_stack_clash (size,
+								      copy_reg);
+
+	  /* If we asked for a copy with an offset, then we still need to add in
+	     the offset.  */
+	  if (copy_reg && copy_off)
+	    emit_insn (gen_add3_insn (copy_reg, copy_reg, GEN_INT (copy_off)));
+	  return insn;
+	}
+    }
+
   if (copy_reg)
     {
       if (copy_off != 0)
@@ -25527,28 +25738,11 @@  rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg, int copy_off)
       todec = tmp_reg;
     }
   
-  insn = emit_insn (TARGET_32BIT
-		    ? gen_movsi_update_stack (stack_reg, stack_reg,
-					      todec, stack_reg)
-		    : gen_movdi_di_update_stack (stack_reg, stack_reg,
-						 todec, stack_reg));
   /* Since we didn't use gen_frame_mem to generate the MEM, grab
      it now and set the alias set/attributes. The above gen_*_update
      calls will generate a PARALLEL with the MEM set being the first
      operation. */
-  par = PATTERN (insn);
-  gcc_assert (GET_CODE (par) == PARALLEL);
-  set = XVECEXP (par, 0, 0);
-  gcc_assert (GET_CODE (set) == SET);
-  mem = SET_DEST (set);
-  gcc_assert (MEM_P (mem));
-  MEM_NOTRAP_P (mem) = 1;
-  set_mem_alias_set (mem, get_frame_alias_set ());
-
-  RTX_FRAME_RELATED_P (insn) = 1;
-  add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-		gen_rtx_SET (stack_reg, gen_rtx_PLUS (Pmode, stack_reg,
-						      GEN_INT (-size))));
+  insn = rs6000_emit_allocate_stack_1 (size, todec, stack_reg);
   return insn;
 }
 
@@ -25630,9 +25824,9 @@  rs6000_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
 	 until it is equal to ROUNDED_SIZE.  */
 
       if (TARGET_64BIT)
-	emit_insn (gen_probe_stack_rangedi (r12, r12, r0));
+	emit_insn (gen_probe_stack_rangedi (r12, r12, stack_pointer_rtx, r0));
       else
-	emit_insn (gen_probe_stack_rangesi (r12, r12, r0));
+	emit_insn (gen_probe_stack_rangesi (r12, r12, stack_pointer_rtx, r0));
 
 
       /* Step 4: probe at FIRST + SIZE if we cannot assert at compile-time
@@ -25646,8 +25840,8 @@  rs6000_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
 /* Probe a range of stack addresses from REG1 to REG2 inclusive.  These are
    absolute addresses.  */
 
-const char *
-output_probe_stack_range (rtx reg1, rtx reg2)
+static const char *
+output_probe_stack_range_1 (rtx reg1, rtx reg2)
 {
   static int labelno = 0;
   char loop_lab[32];
@@ -25714,6 +25908,63 @@  interesting_frame_related_regno (unsigned int regno)
   return save_reg_p (regno);
 }
 
+/* Probe a range of stack addresses from REG1 to REG3 inclusive.  These are
+   absolute addresses.  REG2 contains the backchain that must be stored into
+   *sp at each allocation.
+
+   This is subtly different than the Ada probing above in that it tries hard
+   to prevent attacks that jump the stack guard.  Thus, it is never allowed
+   to allocate more than PROBE_INTERVAL bytes of stack space without a
+   suitable probe.  */
+
+static const char *
+output_probe_stack_range_stack_clash (rtx reg1, rtx reg2, rtx reg3)
+{
+  static int labelno = 0;
+  char loop_lab[32];
+  rtx xops[3];
+
+  HOST_WIDE_INT probe_interval
+    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
+
+  ASM_GENERATE_INTERNAL_LABEL (loop_lab, "LPSRL", labelno++);
+
+  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_lab);
+
+  /* This allocates and probes.  */
+  xops[0] = reg1;
+  xops[1] = reg2;
+  xops[2] = GEN_INT (-probe_interval);
+  if (TARGET_64BIT)
+    output_asm_insn ("stdu %1,%2(%0)", xops);
+  else
+    output_asm_insn ("stwu %1,%2(%0)", xops);
+
+  /* Jump to LOOP_LAB if TEST_ADDR != LAST_ADDR.  */
+  xops[0] = reg1;
+  xops[1] = reg3;
+  if (TARGET_64BIT)
+    output_asm_insn ("cmpd 0,%0,%1", xops);
+  else
+    output_asm_insn ("cmpw 0,%0,%1", xops);
+
+  fputs ("\tbne 0,", asm_out_file);
+  assemble_name_raw (asm_out_file, loop_lab);
+  fputc ('\n', asm_out_file);
+
+  return "";
+}
+
+/* Wrapper around the output_probe_stack_range routines.  */
+const char *
+output_probe_stack_range (rtx reg1, rtx reg2, rtx reg3)
+{
+  if (flag_stack_clash_protection)
+    return output_probe_stack_range_stack_clash (reg1, reg2, reg3);
+  else
+    return output_probe_stack_range_1 (reg1, reg3);
+}
+
 /* Add to 'insn' a note which is PATTERN (INSN) but with REG replaced
    with (plus:P (reg 1) VAL), and with REG2 replaced with REPL2 if REG2
    is not NULL.  It would be nice if dwarf2out_frame_debug_expr could
@@ -27327,6 +27578,13 @@  rs6000_emit_prologue (void)
 	  }
     }
 
+  /* If we are emitting stack probes, but allocate no stack, then
+     just note that in the dump file.  */
+  if (flag_stack_clash_protection
+      && dump_file
+      && !info->push_p)
+    dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
+
   /* Update stack and set back pointer unless this is V.4,
      for which it was done previously.  */
   if (!WORLD_SAVE_P (info) && info->push_p
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 13ba7438124..2dc2ec82e16 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10250,10 +10250,20 @@ 
 ;;
 ;; First, an insn to allocate new stack space for dynamic use (e.g., alloca).
 ;; We move the back-chain and decrement the stack pointer.
-
+;;
+;; Operand1 is more naturally reg_or_short_operand.  However, for a large
+;; constant alloca, using that predicate will force the generic code to put
+;; the constant size into a register before calling the expander.
+;;
+;; As a result the expander would not have the constant size information
+;; in those cases and would have to generate less efficient code.
+;;
+;; Thus we allow reg_or_cint_operand instead so that the expander can see
+;; the constant size.  The value is forced into a register if necessary.
+;;
 (define_expand "allocate_stack"
   [(set (match_operand 0 "gpc_reg_operand" "")
-	(minus (reg 1) (match_operand 1 "reg_or_short_operand" "")))
+	(minus (reg 1) (match_operand 1 "reg_or_cint_operand" "")))
    (set (reg 1)
 	(minus (reg 1) (match_dup 1)))]
   ""
@@ -10263,6 +10273,15 @@ 
   rtx neg_op0;
   rtx insn, par, set, mem;
 
+  /* By allowing reg_or_cint_operand as the predicate we can get
+     better code for stack-clash-protection because we do not lose
+     size information.  But the rest of the code expects the operand
+     to be reg_or_short_operand.  If it isn't, then force it into
+     a register.  */
+  rtx orig_op1 = operands[1];
+  if (!reg_or_short_operand (operands[1], Pmode))
+    operands[1] = force_reg (Pmode, operands[1]);
+
   emit_move_insn (chain, stack_bot);
 
   /* Check stack bounds if necessary.  */
@@ -10275,6 +10294,51 @@ 
       emit_insn (gen_cond_trap (LTU, available, operands[1], const0_rtx));
     }
 
+  /* Allocate and probe if requested.
+     This may look similar to the loop we use for prologue allocations,
+     but it is critically different.  For the former we know the loop
+     will iterate, but do not know that generally here.  The former
+     uses that knowledge to rotate the loop.  Combining them would be
+     possible with some performance cost.  */
+  if (flag_stack_clash_protection)
+    {
+      rtx rounded_size, last_addr, residual;
+      HOST_WIDE_INT probe_interval;
+      compute_stack_clash_protection_loop_data (&rounded_size, &last_addr,
+						&residual, &probe_interval,
+						orig_op1);
+
+      /* We do occasionally get in here with constant sizes, we might
+	 as well do a reasonable job when we obviously can.  */
+      if (rounded_size != const0_rtx)
+	{
+	  rtx loop_lab, end_loop;
+	  bool rotated = CONST_INT_P (rounded_size);
+
+	  emit_stack_clash_protection_probe_loop_start (&loop_lab, &end_loop,
+							last_addr, rotated);
+
+	  if (Pmode == SImode)
+	    emit_insn (gen_movsi_update_stack (stack_pointer_rtx,
+					       stack_pointer_rtx,
+					       GEN_INT (-probe_interval),
+					       chain));
+	  else
+	    emit_insn (gen_movdi_di_update_stack (stack_pointer_rtx,
+					          stack_pointer_rtx,
+					          GEN_INT (-probe_interval),
+					          chain));
+	  emit_stack_clash_protection_probe_loop_end (loop_lab, end_loop,
+						      last_addr, rotated);
+	}
+
+      /* Now handle residuals.  We just have to set operands[1] correctly
+	 and let the rest of the expander run.  */
+      operands[1] = residual;
+      if (!CONST_INT_P (residual))
+	operands[1] = force_reg (Pmode, operands[1]);
+    }
+
   if (GET_CODE (operands[1]) != CONST_INT
       || INTVAL (operands[1]) < -32767
       || INTVAL (operands[1]) > 32768)
@@ -11413,12 +11477,13 @@ 
    (set_attr "length" "4")])
 
 (define_insn "probe_stack_range<P:mode>"
-  [(set (match_operand:P 0 "register_operand" "=r")
+  [(set (match_operand:P 0 "register_operand" "=&r")
 	(unspec_volatile:P [(match_operand:P 1 "register_operand" "0")
-			    (match_operand:P 2 "register_operand" "r")]
+			    (match_operand:P 2 "register_operand" "r")
+			    (match_operand:P 3 "register_operand" "r")]
 			   UNSPECV_PROBE_STACK_RANGE))]
   ""
-  "* return output_probe_stack_range (operands[0], operands[2]);"
+  "* return output_probe_stack_range (operands[0], operands[2], operands[3]);"
   [(set_attr "type" "three")])
 
 ;; Compare insns are next.  Note that the RS/6000 has two types of compares,
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 39838f595fb..501f9f01582 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@ 
+2017-09-22  Jeff Law  <law@redhat.com>
+
+	* lib/target-supports.exp
+	(check_effective_target_supports_stack_clash_protection): Enable for
+	rs6000 and powerpc targets.
+
 2017-09-22  Richard Sandiford  <richard.sandiford@linaro.org>
 	    Alan Hayward  <alan.hayward@arm.com>
 	    David Sherwood  <david.sherwood@arm.com>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 887a801bbd8..3ded1e32f49 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8639,12 +8639,12 @@  proc check_effective_target_autoincdec { } {
 proc check_effective_target_supports_stack_clash_protection { } {
 
    # Temporary until the target bits are fully ACK'd.
-#  if { [istarget aarch*-*-*]
-#       || [istarget powerpc*-*-*] || [istarget rs6000*-*-*] } {
+#  if { [istarget aarch*-*-*] } {
 #	return 1
 #  }
 
     if { [istarget x86_64-*-*] || [istarget i?86-*-*] 
+	  || [istarget powerpc*-*-*] || [istarget rs6000*-*-*]
 	  || [istarget s390*-*-*] } {
 	return 1
     }