Message ID | 93ce7fc1-e41e-282f-a574-234a83d57d7d@redhat.com |
---|---|
State | New |
Headers | show |
Hi! On Tue, Jul 11, 2017 at 03:19:57PM -0600, Jeff Law wrote: > It introduces a new style of stack probing -fstack-check=clash, > including documentation of the new option, how it differs from > -fstack-check=specific, etc. > > FWIW -fstack-check=specific is dreadfully named. I haven't tried to > address that. -fstack-check=clash is itself not such a super name either. It's not checking stack, and it isn't clashing: it just does a store to every page the stack will touch (minus the first and last or something). How does this work for targets that want to enable this by default? How does that interact with checking for stack size overflow? > However, a sufficiently smart compiler could realize that a call to a > noreturn function could be converted into a jump, even if the caller has > a frame because that frame need not be torn down. Currently GCC will never make a tail call to a noreturn function (see calls.c:can_implement_as_sibling_call_p). It would be nice (for code size, if nothing else) if that was changed. But you know all this :-) > @@ -11333,7 +11334,8 @@ target support in the compiler but comes with the following drawbacks: > @enumerate > @item > Modified allocation strategy for large objects: they are always > -allocated dynamically if their size exceeds a fixed threshold. > +allocated dynamically if their size exceeds a fixed threshold. Note this > +may change the semantics of some code. How so? Semantics of valid (portable) code, as well? It would be nice to have some detail here, not just a threat :-) > +@samp{clash} is designed to prevent jumping the stack guard page as seen in > +stack clash style attacks. However, it does not guarantee sufficient stack > +space to handle a signal in the event that the program hits the stack guard > +and is thus incompatible with Ada's needs. Is there some way we could have both? In principle stack-clash protection is completely orthogonal to -fstack-check. > /* Check the stack and entirely rely on the target configuration > - files, i.e. do not use the generic mechanism at all. */ > + files, i.e. do not use the generic mechanism at all. This > + does not prevent stack guard jumping and stack clash style > + attacks. */ > FULL_BUILTIN_STACK_CHECK > }; > + else if (!strcmp (arg, "clash")) > + { > + /* This is the stack checking method, designed to prevent > + stack-clash attacks. */ > + if (!STACK_GROWS_DOWNWARD) > + sorry ("-fstack-check=clash not implemented on this target"); > + else > + opts->x_flag_stack_check = (STACK_CHECK_BUILTIN > + ? FULL_BUILTIN_STACK_CHECK > + : (STACK_CHECK_STATIC_BUILTIN > + ? STACK_CLASH_BUILTIN_STACK_CHECK > + : GENERIC_STACK_CHECK)); > + } So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get stack clash protection if you asked for it specifically, without warning, if I read that correctly? > +# Return 1 if the target has support for stack probing designed > +# to avoid stack-clash style attacks > +# > +# This is used to restrict the stack-clash mitigation tests to > +# just those targets that have been explicitly supported Missing full stop, twice. More later in the file. > +proc check_effective_target_stack_clash_protected { } { The name is maybe not so great: nothing is protected until you actually use the option. "supported", maybe? > +# Return 1 if the target's calling sequence or its ABI > +# create implicit stack probes at *sp at function > +# entry. > +proc check_effective_target_caller_implicit_probes { } { "At function entry" isn't really true for Power ("when setting up a stack frame", instead -- and you are required to set one up before calling anything). > +# The stack realignment often causes residual stack allocations and > +# can also make it difficult to elide loops or residual allocations > +# for dynamic allocations > +proc check_effective_target_callee_realigns_stack { } { Does this return 1 if the callee always realigns the stack? Or maybe realigns the stack? Segher
On 07/12/2017 06:31 PM, Segher Boessenkool wrote: > Hi! > > On Tue, Jul 11, 2017 at 03:19:57PM -0600, Jeff Law wrote: >> It introduces a new style of stack probing -fstack-check=clash, >> including documentation of the new option, how it differs from >> -fstack-check=specific, etc. >> >> FWIW -fstack-check=specific is dreadfully named. I haven't tried to >> address that. > > -fstack-check=clash is itself not such a super name either. It's not > checking stack, and it isn't clashing: it just does a store to every > page the stack will touch (minus the first and last or something). Yea. I don't particularly like it either. Suggestions? I considered "probe" as well, but "specific" also does probing. In the end I used the part of the marketing name of the exploits. I also considered trying to separate the fact that we are doing stack probing from the exact method of probing. So something like: -fstack-check -fstack-check-probe=frob And the current default would map to -fstack-check -fstack-check-probe=ahead <probing ahead of need> I didn't particularly like that either and using something like -fstack-check -fstack-check-probe=frob is a bit cumbersome. It's also the case that we're changing two key aspects relative to -fstack-check=specific. 1. We never probe ahead of need. ie, if a function requests N bytes of stack space, then we'll never probe beyond N bytes. In contrast -fstack-check=specific will tend to probe 2-3 pages beyond the N byte request. 2. We probe as each page is allocated. in contrast most -fstack-check=specific implementations allocate all the space, then probe into the space. THe concept I keep coming back to is that we're changing probing policy. as-needed vs ahead-of-need for how much to probe. And "as-allocated" rather than "after-allocation" for when we emit the probe. Certainly open to ideas on the interface aspects. > > How does this work for targets that want to enable this by default? How > does that interact with checking for stack size overflow? I don't currently have a way to enable it by default -- for my tests I just slam the value I want into the initializer in common.opt :-) It's independent of stack size overflow checking. They could (in theory) even co-exist on ports that support stack size overflow checking, but I didn't test that. > >> However, a sufficiently smart compiler could realize that a call to a >> noreturn function could be converted into a jump, even if the caller has >> a frame because that frame need not be torn down. > > Currently GCC will never make a tail call to a noreturn function (see > calls.c:can_implement_as_sibling_call_p). It would be nice (for code > size, if nothing else) if that was changed. But you know all this :-) :-) Initially when I started looking at this issue I fully expected to need to explicitly reject tail calls to noreturn functions to make things safe. Imagine my surprise when I found out that we look at the preds of the exit block to find opportunities. Which obviously doesn't work with noreturn calls. On the theory that someone might want to change that one day, there is a test in the suite that will complain if we have a tail optimized call to a noreturn function. At the least it will spark a discussion about the validity of such an optimization in a world where jumping the guard is a concern. > >> @@ -11333,7 +11334,8 @@ target support in the compiler but comes with the following drawbacks: >> @enumerate >> @item >> Modified allocation strategy for large objects: they are always >> -allocated dynamically if their size exceeds a fixed threshold. >> +allocated dynamically if their size exceeds a fixed threshold. Note this >> +may change the semantics of some code. > > How so? Semantics of valid (portable) code, as well? It would be nice > to have some detail here, not just a threat :-) I'd have to go back and run the testsuite with generic checking enabled by default. I didn't dig deeply (because generic testing just isn't interesting for so many reasons). But what happened was we had a large auto object, which gets turned into an alloca'd object with generic checking. We created that alloca'd object at the wrong lexical scope which mucked up its expected persistence. I'm sure I'd spot it trivially once I set up the test again. > >> +@samp{clash} is designed to prevent jumping the stack guard page as seen in >> +stack clash style attacks. However, it does not guarantee sufficient stack >> +space to handle a signal in the event that the program hits the stack guard >> +and is thus incompatible with Ada's needs. > > Is there some way we could have both? With kernel support, yes. The kernel would set up a reserved stack area contiguous with the guard and the two areas would move in unison. Once they're unable to move, the kernel could map in the reserved page(s) and trigger the signal handler. > > In principle stack-clash protection is completely orthogonal to > -fstack-check. To some degree, yes. But on other levels they are tightly related. In fact, the indirection with get_stack_check_protect allows us to use the existing -fstack-check=specific prologue code in many targets to give a fairly high amount of stack-clash protection. The biggest problem is that -fstack-check=specific code allocates all the stack it needs. Then after allocation is complete it goes back and probes the pages it allocated. That leaves the target vulnerable to an async signal arriving after allocation, but before probing and the signal handler running on a clashed stack. > >> /* Check the stack and entirely rely on the target configuration >> - files, i.e. do not use the generic mechanism at all. */ >> + files, i.e. do not use the generic mechanism at all. This >> + does not prevent stack guard jumping and stack clash style >> + attacks. */ >> FULL_BUILTIN_STACK_CHECK >> }; > >> + else if (!strcmp (arg, "clash")) >> + { >> + /* This is the stack checking method, designed to prevent >> + stack-clash attacks. */ >> + if (!STACK_GROWS_DOWNWARD) >> + sorry ("-fstack-check=clash not implemented on this target"); >> + else >> + opts->x_flag_stack_check = (STACK_CHECK_BUILTIN >> + ? FULL_BUILTIN_STACK_CHECK >> + : (STACK_CHECK_STATIC_BUILTIN >> + ? STACK_CLASH_BUILTIN_STACK_CHECK >> + : GENERIC_STACK_CHECK)); >> + } > > So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get > stack clash protection if you asked for it specifically, without warning, > if I read that correctly? That's an unknown. I'd have to dig into the guts of the alpha and spu to understand precisely how their STACK_CHECK_BUILTIN works. > >> +# Return 1 if the target has support for stack probing designed >> +# to avoid stack-clash style attacks >> +# >> +# This is used to restrict the stack-clash mitigation tests to >> +# just those targets that have been explicitly supported > > Missing full stop, twice. More later in the file. Will fix. > >> +proc check_effective_target_stack_clash_protected { } { > > The name is maybe not so great: nothing is protected until you actually > use the option. "supported", maybe? I hate all the names... "supports_stack_clash_protection" perhaps? > >> +# Return 1 if the target's calling sequence or its ABI >> +# create implicit stack probes at *sp at function >> +# entry. >> +proc check_effective_target_caller_implicit_probes { } { > > "At function entry" isn't really true for Power ("when setting up a > stack frame", instead -- and you are required to set one up before > calling anything). I think it's close enough -- I'll ponder a better name. s390x doesn't really have caller implicit probes either, but stack saves in the callee act like them (because the caller allocates the space for the callee's save area). > >> +# The stack realignment often causes residual stack allocations and >> +# can also make it difficult to elide loops or residual allocations >> +# for dynamic allocations >> +proc check_effective_target_callee_realigns_stack { } { > > Does this return 1 if the callee always realigns the stack? Or maybe > realigns the stack? Yea, I can clarify that it's the "callee may realign the stack". Thanks for the comments questions. I've got more to respond to, but probably won't get to it tonight. jeff
Hi again, On Wed, Jul 12, 2017 at 09:56:09PM -0600, Jeff Law wrote: > >> FWIW -fstack-check=specific is dreadfully named. I haven't tried to > >> address that. > > > > -fstack-check=clash is itself not such a super name either. It's not > > checking stack, and it isn't clashing: it just does a store to every > > page the stack will touch (minus the first and last or something). > Yea. I don't particularly like it either. Suggestions? I considered > "probe" as well, but "specific" also does probing. In the end I used > the part of the marketing name of the exploits. I don't think it should be inside -fstack-check at all. Sure, the mechanisms implementing it overlap a bit (more on some targets, less on others), but how will a user ask for clash protection _and_ for stack checking? So something like -fstack-clash-protection, -fstack-touch-all-pages, together with whatever -fstack-check option is wanted (and yes the existing ones have very non-specific, non-descriptive, non-obvious names). > 1. We never probe ahead of need. ie, if a function requests N bytes of > stack space, then we'll never probe beyond N bytes. In contrast > -fstack-check=specific will tend to probe 2-3 pages beyond the N byte > request. s/need/immediate need/. Nod. > 2. We probe as each page is allocated. in contrast most > -fstack-check=specific implementations allocate all the space, then > probe into the space. Right, and that leaves some dangerous openings for exploitation. > Certainly open to ideas on the interface aspects. The interface is much harder to change than any aspect of the GCC implementation. Have to get it right at once, almost. > > How does this work for targets that want to enable this by default? How > > does that interact with checking for stack size overflow? > I don't currently have a way to enable it by default -- for my tests I > just slam the value I want into the initializer in common.opt :-) > > It's independent of stack size overflow checking. They could (in > theory) even co-exist on ports that support stack size overflow > checking, but I didn't test that. Okay, that was my impression as well. But that interface won't allow it (unless every -fstack-check=X gets an -fstack-check=X+clash twin). > We created that alloca'd object at the wrong lexical scope which mucked > up its expected persistence. I'm sure I'd spot it trivially once I set > up the test again. That sounds like it might be a bug even. > >> /* Check the stack and entirely rely on the target configuration > >> - files, i.e. do not use the generic mechanism at all. */ > >> + files, i.e. do not use the generic mechanism at all. This > >> + does not prevent stack guard jumping and stack clash style > >> + attacks. */ > >> FULL_BUILTIN_STACK_CHECK > >> }; > > > >> + else if (!strcmp (arg, "clash")) > >> + { > >> + /* This is the stack checking method, designed to prevent > >> + stack-clash attacks. */ > >> + if (!STACK_GROWS_DOWNWARD) > >> + sorry ("-fstack-check=clash not implemented on this target"); > >> + else > >> + opts->x_flag_stack_check = (STACK_CHECK_BUILTIN > >> + ? FULL_BUILTIN_STACK_CHECK > >> + : (STACK_CHECK_STATIC_BUILTIN > >> + ? STACK_CLASH_BUILTIN_STACK_CHECK > >> + : GENERIC_STACK_CHECK)); > >> + } > > > > So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get > > stack clash protection if you asked for it specifically, without warning, > > if I read that correctly? > That's an unknown. I'd have to dig into the guts of the alpha and spu > to understand precisely how their STACK_CHECK_BUILTIN works. Both just define it to 1. So this code then specialises to else if (!strcmp (arg, "clash")) { opts->x_flag_stack_check = FULL_BUILTIN_STACK_CHECK; } which it says above does *not* protect against stack clash attacks. Which seems backward. > >> +proc check_effective_target_stack_clash_protected { } { > > > > The name is maybe not so great: nothing is protected until you actually > > use the option. "supported", maybe? > I hate all the names... "supports_stack_clash_protection" perhaps? Works for me! > >> +# Return 1 if the target's calling sequence or its ABI > >> +# create implicit stack probes at *sp at function > >> +# entry. > >> +proc check_effective_target_caller_implicit_probes { } { > > > > "At function entry" isn't really true for Power ("when setting up a > > stack frame", instead -- and you are required to set one up before > > calling anything). > I think it's close enough -- I'll ponder a better name. s390x doesn't > really have caller implicit probes either, but stack saves in the callee > act like them (because the caller allocates the space for the callee's > save area). Oh the name is fine, but maybe expand the comment a bit. Or even just s/at function entry/for every function/ ? Segher
On 07/13/2017 03:32 PM, Segher Boessenkool wrote: > Hi again, > > On Wed, Jul 12, 2017 at 09:56:09PM -0600, Jeff Law wrote: >>>> FWIW -fstack-check=specific is dreadfully named. I haven't tried to >>>> address that. >>> >>> -fstack-check=clash is itself not such a super name either. It's not >>> checking stack, and it isn't clashing: it just does a store to every >>> page the stack will touch (minus the first and last or something). >> Yea. I don't particularly like it either. Suggestions? I considered >> "probe" as well, but "specific" also does probing. In the end I used >> the part of the marketing name of the exploits. > > I don't think it should be inside -fstack-check at all. Sure, the > mechanisms implementing it overlap a bit (more on some targets, less > on others), but how will a user ask for clash protection _and_ for > stack checking? The biggest problem with separating them is we would end up with a fair amount code that ultimately looks like if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK || flag_whatever_the_new_thing_is) { probe the stack } We can certainly do that. It'll touch a few more places (particularly in backends that have checks for Ada stack overflow, but for which I haven't written stack clash protection), but the changes would be simple and repetitive. > So something like -fstack-clash-protection, -fstack-touch-all-pages, > together with whatever -fstack-check option is wanted (and yes the > existing ones have very non-specific, non-descriptive, non-obvious names). Given the heavy use of "stack clash" in the press I'd lean towards -fstack-clash-protection. THat makes it easier for folks to find the right option to protect themselves. >> Certainly open to ideas on the interface aspects. > > The interface is much harder to change than any aspect of the GCC > implementation. Have to get it right at once, almost. Yup. It gets backed in even faster than normal in this case I suspect. Red Hat will almost certainly pick up the bits and start backporting them to RHEL 7, RHEL 6 and RHEL 5. So the flag will be out in the wild before gcc-8 hits the streets. If you go back to my original message from several weeks ago, getting the UI right and the basic concepts were my primary goals. Details like what probe instruction to use are, IMHO, much less important. > >>> How does this work for targets that want to enable this by default? How >>> does that interact with checking for stack size overflow? >> I don't currently have a way to enable it by default -- for my tests I >> just slam the value I want into the initializer in common.opt :-) >> >> It's independent of stack size overflow checking. They could (in >> theory) even co-exist on ports that support stack size overflow >> checking, but I didn't test that. > > Okay, that was my impression as well. But that interface won't allow > it (unless every -fstack-check=X gets an -fstack-check=X+clash twin). THere may have been some mis-communication here, but it's a good place to spend some time thinking about interplay between options. First, there's -fstack-limit-*. Essentially there's an RTX which defines the limit for the stack pointer. If you cross that limit a trap gets executed. PPC, s390 and likely other ports have support for this. It should interoperate with -fstack-clash-protection and/or -fstack-check. Some ports have their own probing option. x86 comes to mind. The x86 uses that for stack probing on windows where extending the stack has to happen explicitly (which is why they're not subject to stack-clash style attacks). The options probably shouldn't be mixed. There's -fstack-check and -fstack-clash-protection. I think with the direction we're going they are fundamentally incompatible because neither the compiler nor kernel do anything to guarantee enough stack is available after hitting the guard for -fstack-clash-protection. And there's shrink wrapping. This was originally raised by the aarch64 guys and deserves some thought. I'm particularly worried about aarch64 if it was to shrink-wrap some particular register saves that we wanted to use as an implicit probe. There may be others that I'm missing. > >> We created that alloca'd object at the wrong lexical scope which mucked >> up its expected persistence. I'm sure I'd spot it trivially once I set >> up the test again. > > That sounds like it might be a bug even. That was my thought as well. It's c-torture/execute/20071029-1.c. > >>>> /* Check the stack and entirely rely on the target configuration >>>> - files, i.e. do not use the generic mechanism at all. */ >>>> + files, i.e. do not use the generic mechanism at all. This >>>> + does not prevent stack guard jumping and stack clash style >>>> + attacks. */ >>>> FULL_BUILTIN_STACK_CHECK >>>> }; >>> >>>> + else if (!strcmp (arg, "clash")) >>>> + { >>>> + /* This is the stack checking method, designed to prevent >>>> + stack-clash attacks. */ >>>> + if (!STACK_GROWS_DOWNWARD) >>>> + sorry ("-fstack-check=clash not implemented on this target"); >>>> + else >>>> + opts->x_flag_stack_check = (STACK_CHECK_BUILTIN >>>> + ? FULL_BUILTIN_STACK_CHECK >>>> + : (STACK_CHECK_STATIC_BUILTIN >>>> + ? STACK_CLASH_BUILTIN_STACK_CHECK >>>> + : GENERIC_STACK_CHECK)); >>>> + } >>> >>> So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get >>> stack clash protection if you asked for it specifically, without warning, >>> if I read that correctly? >> That's an unknown. I'd have to dig into the guts of the alpha and spu >> to understand precisely how their STACK_CHECK_BUILTIN works. > > Both just define it to 1. So this code then specialises to > > else if (!strcmp (arg, "clash")) > { > opts->x_flag_stack_check = FULL_BUILTIN_STACK_CHECK; > } > > which it says above does *not* protect against stack clash attacks. > Which seems backward. Right. That was one of the other reasons why my initial patches left those enums alone and introduced a new flag -- probing policy. Those enum constants, depending on how you read them almost imply levels of backend support for Ada style stack overflow checking. That was one of the reasons why early (not posted) versions of my patch left that stuff alone -- instead they introduced the separate concept of probing policy. Jakub argued internally to put things under -fstack-check. > >>>> +proc check_effective_target_stack_clash_protected { } { >>> >>> The name is maybe not so great: nothing is protected until you actually >>> use the option. "supported", maybe? >> I hate all the names... "supports_stack_clash_protection" perhaps? > > Works for me! > >>>> +# Return 1 if the target's calling sequence or its ABI >>>> +# create implicit stack probes at *sp at function >>>> +# entry. >>>> +proc check_effective_target_caller_implicit_probes { } { >>> >>> "At function entry" isn't really true for Power ("when setting up a >>> stack frame", instead -- and you are required to set one up before >>> calling anything). >> I think it's close enough -- I'll ponder a better name. s390x doesn't >> really have caller implicit probes either, but stack saves in the callee >> act like them (because the caller allocates the space for the callee's >> save area). > > Oh the name is fine, but maybe expand the comment a bit. Or even just > s/at function entry/for every function/ ? I'll use "at function entry" if I can't come up with something better and expand the comments. None of the options are strictly correct for s390, but "at function entry" is less incorrect than "for every function". jeff
On Thu, Jul 13, 2017 at 04:38:00PM -0600, Jeff Law wrote: > On 07/13/2017 03:32 PM, Segher Boessenkool wrote: > >>> -fstack-check=clash is itself not such a super name either. It's not > >>> checking stack, and it isn't clashing: it just does a store to every > >>> page the stack will touch (minus the first and last or something). > >> Yea. I don't particularly like it either. Suggestions? I considered > >> "probe" as well, but "specific" also does probing. In the end I used > >> the part of the marketing name of the exploits. > > > > I don't think it should be inside -fstack-check at all. Sure, the > > mechanisms implementing it overlap a bit (more on some targets, less > > on others), but how will a user ask for clash protection _and_ for > > stack checking? > The biggest problem with separating them is we would end up with a fair > amount code that ultimately looks like > > if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK > || flag_whatever_the_new_thing_is) > { > probe the stack > } But only in backends, right? And some backends will be actually simpler, afaics. > We can certainly do that. It'll touch a few more places (particularly > in backends that have checks for Ada stack overflow, but for which I > haven't written stack clash protection), but the changes would be simple > and repetitive. Sounds good :-) > >> Certainly open to ideas on the interface aspects. > > > > The interface is much harder to change than any aspect of the GCC > > implementation. Have to get it right at once, almost. > Yup. It gets backed in even faster than normal in this case I suspect. > Red Hat will almost certainly pick up the bits and start backporting > them to RHEL 7, RHEL 6 and RHEL 5. So the flag will be out in the wild > before gcc-8 hits the streets. Are you planning to backport it for mainline GCC as well? > >> It's independent of stack size overflow checking. They could (in > >> theory) even co-exist on ports that support stack size overflow > >> checking, but I didn't test that. > > > > Okay, that was my impression as well. But that interface won't allow > > it (unless every -fstack-check=X gets an -fstack-check=X+clash twin). > THere may have been some mis-communication here, but it's a good place > to spend some time thinking about interplay between options. > > First, there's -fstack-limit-*. Essentially there's an RTX which > defines the limit for the stack pointer. If you cross that limit a trap > gets executed. PPC, s390 and likely other ports have support for this. > It should interoperate with -fstack-clash-protection and/or -fstack-check. Yup. > There's -fstack-check and -fstack-clash-protection. I think with the > direction we're going they are fundamentally incompatible because > neither the compiler nor kernel do anything to guarantee enough stack is > available after hitting the guard for -fstack-clash-protection. Hrm, I have to think about that. > And there's shrink wrapping. This was originally raised by the aarch64 > guys and deserves some thought. I'm particularly worried about aarch64 > if it was to shrink-wrap some particular register saves that we wanted > to use as an implicit probe. For normal shrink-wrapping, the volatile reg stores will always happen in the same prologue that sets up the stack frame as well, so there won't be any problem (except the usual stack ties you might need in the prologue, etc.) (*) For separate shrink-wrapping, yeah you'll need any bb that could potentially store to the stack require the component for the register you use as implicit probe. This is pretty nasty. > >> We created that alloca'd object at the wrong lexical scope which mucked > >> up its expected persistence. I'm sure I'd spot it trivially once I set > >> up the test again. > > > > That sounds like it might be a bug even. > That was my thought as well. It's c-torture/execute/20071029-1.c. Ah cool, thanks for digging it up. Segher (*) Related, I know you don't want to scan generated code to see if all probes are in place, but it seems you pretty much have to if you want to make sure no later pass deletes the probes. Well, something for targets to worry about :-)
On Tue, 2017-07-11 at 15:19 -0600, Jeff Law wrote: [...] > diff --git a/gcc/opts.c b/gcc/opts.c > index 7460c2b..61f5bb0 100644 > --- a/gcc/opts.c > +++ b/gcc/opts.c > @@ -2243,6 +2243,19 @@ common_handle_option (struct gcc_options > *opts, > opts->x_flag_stack_check = STACK_CHECK_BUILTIN > ? FULL_BUILTIN_STACK_CHECK > : GENERIC_STACK_CHECK; > + else if (!strcmp (arg, "clash")) > + { > + /* This is the stack checking method, designed to prevent > + stack-clash attacks. */ > + if (!STACK_GROWS_DOWNWARD) > + sorry ("-fstack-check=clash not implemented on this > target"); Minor nitpick: shouldn't options be quoted in diagnostics? So this should be: sorry ("%<-fstack-check=clash%> not implemented on this target"); (or whatever it ends up being called) [...]
On 07/13/2017 05:37 PM, Segher Boessenkool wrote: > On Thu, Jul 13, 2017 at 04:38:00PM -0600, Jeff Law wrote: >> On 07/13/2017 03:32 PM, Segher Boessenkool wrote: >>>>> -fstack-check=clash is itself not such a super name either. It's not >>>>> checking stack, and it isn't clashing: it just does a store to every >>>>> page the stack will touch (minus the first and last or something). >>>> Yea. I don't particularly like it either. Suggestions? I considered >>>> "probe" as well, but "specific" also does probing. In the end I used >>>> the part of the marketing name of the exploits. >>> >>> I don't think it should be inside -fstack-check at all. Sure, the >>> mechanisms implementing it overlap a bit (more on some targets, less >>> on others), but how will a user ask for clash protection _and_ for >>> stack checking? >> The biggest problem with separating them is we would end up with a fair >> amount code that ultimately looks like >> >> if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK >> || flag_whatever_the_new_thing_is) >> { >> probe the stack >> } > > But only in backends, right? And some backends will be actually > simpler, afaics. I suspect we might end up with one in explow.c as well. But yes, it'd primarily be buried in the targets. >> Yup. It gets baked in even faster than normal in this case I suspect. >> Red Hat will almost certainly pick up the bits and start backporting >> them to RHEL 7, RHEL 6 and RHEL 5. So the flag will be out in the wild >> before gcc-8 hits the streets. > > Are you planning to backport it for mainline GCC as well? The hope is we reach a consensus for mainline GCC, then we backport whatever goes into mainline to the relevant RHEL releases. THe backports into RHEL would likely happen before gcc-8 hits the streets. I've got a little time before I have to invert that process and go into RHEL first. I *really* want to avoid that for a multitude of reasons. > >> There's -fstack-check and -fstack-clash-protection. I think with the >> direction we're going they are fundamentally incompatible because >> neither the compiler nor kernel do anything to guarantee enough stack is >> available after hitting the guard for -fstack-clash-protection. > > Hrm, I have to think about that. Even if the kernel implements the reserved page stuff mentioned earlier in the thread, I'm not sure when we'd be able to reliably depend on that capability (or even check for it). ISTM that -fstack-check continues to be Ada centric and we have a new option to deal with stack-clash protection and the two simply just aren't allowed to be enabled together. > >> And there's shrink wrapping. This was originally raised by the aarch64 >> guys and deserves some thought. I'm particularly worried about aarch64 >> if it was to shrink-wrap some particular register saves that we wanted >> to use as an implicit probe. > > For normal shrink-wrapping, the volatile reg stores will always happen > in the same prologue that sets up the stack frame as well, so there > won't be any problem (except the usual stack ties you might need in the > prologue, etc.) (*) For separate shrink-wrapping, yeah you'll need any > bb that could potentially store to the stack require the component for > the register you use as implicit probe. This is pretty nasty. I'm going to try and look at this tomorrow to get a feel for the aarch64 implementation separate shrink wrapping. For ppc I'm not too worried -- the implicit backchain probes are so effective at eliminating explicit probes that I didn't bother writing any code to track the register saves as implicit probes. I'm pretty sure it just works on ppc with separate shrink wrapping. > >>>> We created that alloca'd object at the wrong lexical scope which mucked >>>> up its expected persistence. I'm sure I'd spot it trivially once I set >>>> up the test again. >>> >>> That sounds like it might be a bug even. >> That was my thought as well. It's c-torture/execute/20071029-1.c. > > Ah cool, thanks for digging it up. NP. As expected, it was trivial to slam in the initializer and rerun the testsuite. As soon as I saw the failure and looked at the source I knew I'd found it again. The other thing to remember about -fstack-check=generic is that once your frame gets big, it just gives up. That's why it takes large automatic objects and turns them into alloca'd objects -- to reduce the size of the prologue allocated frame. I looked at -fstack-check=generic just long enough to realize it wasn't really an option to cover s390. Then I promptly moved onto more useful pursuits. > (*) Related, I know you don't want to scan generated code to see if > all probes are in place, but it seems you pretty much have to if you > want to make sure no later pass deletes the probes. Well, something > for targets to worry about :-) Right. The dumping that's currently done just gives us a view into what the prologue code wanted to emit. Passes after prologue generation can get in the way -- combine-stack-adjustments comes immediately to mind. end-to-end testing absolutely requires scanning the final rtl dump or the assembly code or DSOs/executables. My plan was to fault that in as-needed. For c-s-a, we've introduced a new note that gets attached to the allocation step when protecting against stack-clash. If c-s-a sees that note it will refuse to combine the stack adjustment with any others. I wrote a quick and dirty x86 specific test for that. You'll find those bits in patch #5. FOr some architectures you can get a good sense of problematical sequences with objdump and some greps. It's far from automated and nowhere near something I'd submit into GCC :-) Jeff
On 07/13/2017 06:37 PM, David Malcolm wrote: > On Tue, 2017-07-11 at 15:19 -0600, Jeff Law wrote: > > [...] > >> diff --git a/gcc/opts.c b/gcc/opts.c >> index 7460c2b..61f5bb0 100644 >> --- a/gcc/opts.c >> +++ b/gcc/opts.c >> @@ -2243,6 +2243,19 @@ common_handle_option (struct gcc_options >> *opts, >> opts->x_flag_stack_check = STACK_CHECK_BUILTIN >> ? FULL_BUILTIN_STACK_CHECK >> : GENERIC_STACK_CHECK; >> + else if (!strcmp (arg, "clash")) >> + { >> + /* This is the stack checking method, designed to prevent >> + stack-clash attacks. */ >> + if (!STACK_GROWS_DOWNWARD) >> + sorry ("-fstack-check=clash not implemented on this >> target"); > > Minor nitpick: shouldn't options be quoted in diagnostics? > > So this should be: > > sorry ("%<-fstack-check=clash%> not implemented on this target"); > > (or whatever it ends up being called) Thanks. Fixed for the next version. jeff
On Thu, Jul 13, 2017 at 04:32:02PM -0500, Segher Boessenkool wrote: > I don't think it should be inside -fstack-check at all. Sure, the > mechanisms implementing it overlap a bit (more on some targets, less > on others), but how will a user ask for clash protection _and_ for > stack checking? Are we willing to implement that? What would we do in that case? Probe in each function we emit both the current page that -fstack-check=clash would probe and a page or how many below that how -fstack-check=specific or generic would probe? Even if yes, we can still use -fstack-check=clash,specific for that (similarly to how we support -fsanitize=shift,unreachable). But it is still a stack check kind. Jakub
On 07/14/2017 01:40 AM, Jakub Jelinek wrote: > On Thu, Jul 13, 2017 at 04:32:02PM -0500, Segher Boessenkool wrote: >> I don't think it should be inside -fstack-check at all. Sure, the >> mechanisms implementing it overlap a bit (more on some targets, less >> on others), but how will a user ask for clash protection _and_ for >> stack checking? > > Are we willing to implement that? What would we do in that case? I'd change every existing target that has a backend stack probing implementatoin to use a moving-sp style. ie, allocate page, probe page, allocate page, probe page. We'd then want to use the -fstack-check routines rather than the new routines. But in the end I don't think using the options together makes all that much sense. We really should look at -fstack-check as Ada specific, even though its implementation is in the target files and middle end. Jeff
On Thu, Jul 13, 2017 at 10:35:55PM -0600, Jeff Law wrote: > >> There's -fstack-check and -fstack-clash-protection. I think with the > >> direction we're going they are fundamentally incompatible because > >> neither the compiler nor kernel do anything to guarantee enough stack is > >> available after hitting the guard for -fstack-clash-protection. > > > > Hrm, I have to think about that. > Even if the kernel implements the reserved page stuff mentioned earlier > in the thread, I'm not sure when we'd be able to reliably depend on that > capability (or even check for it). > > ISTM that -fstack-check continues to be Ada centric and we have a new > option to deal with stack-clash protection and the two simply just > aren't allowed to be enabled together. So this essentially means Ada programs cannot get stack clash protection at all? And if -fstack-check is really only useful for Ada, we shouldn't mix up that option with the stack clash thing. > >> And there's shrink wrapping. This was originally raised by the aarch64 > >> guys and deserves some thought. I'm particularly worried about aarch64 > >> if it was to shrink-wrap some particular register saves that we wanted > >> to use as an implicit probe. > > > > For normal shrink-wrapping, the volatile reg stores will always happen > > in the same prologue that sets up the stack frame as well, so there > > won't be any problem (except the usual stack ties you might need in the > > prologue, etc.) (*) For separate shrink-wrapping, yeah you'll need any > > bb that could potentially store to the stack require the component for > > the register you use as implicit probe. This is pretty nasty. > I'm going to try and look at this tomorrow to get a feel for the aarch64 > implementation separate shrink wrapping. > > For ppc I'm not too worried -- the implicit backchain probes are so > effective at eliminating explicit probes that I didn't bother writing > any code to track the register saves as implicit probes. I'm pretty > sure it just works on ppc with separate shrink wrapping. With what is in trunk so far, yeah ;-) In principle the "set up a stack frame" part of the prologue can be made a separate shrink-wrapping component and then we'd have to worry. But it would be complicated to do that and there is no real benefit I see so far, so it won't happen. Nothing for you to have to worry about in any case :-) Segher
On 07/14/2017 10:34 AM, Segher Boessenkool wrote: > On Thu, Jul 13, 2017 at 10:35:55PM -0600, Jeff Law wrote: >>>> There's -fstack-check and -fstack-clash-protection. I think with the >>>> direction we're going they are fundamentally incompatible because >>>> neither the compiler nor kernel do anything to guarantee enough stack is >>>> available after hitting the guard for -fstack-clash-protection. >>> >>> Hrm, I have to think about that. >> Even if the kernel implements the reserved page stuff mentioned earlier >> in the thread, I'm not sure when we'd be able to reliably depend on that >> capability (or even check for it). >> >> ISTM that -fstack-check continues to be Ada centric and we have a new >> option to deal with stack-clash protection and the two simply just >> aren't allowed to be enabled together. > > So this essentially means Ada programs cannot get stack clash protection > at all? And if -fstack-check is really only useful for Ada, we shouldn't > mix up that option with the stack clash thing. It's not a simple yes/no. For Ada the model is that the entire application will be compiled with -fstack-check and that each function that allocates space probes 2-3 pages beyond their immediate need to ensure space to run the signal handler (damn the large register files :-) Those are absolutely critical to keep in mind. In combination they allow prologues to skip the first 2-3 page probes (they were done earlier in the call chain). So from a standpoint of probing each and every allocated page, Ada with -fstack-check will do that if and only if every function is compiled with -fstack-check. But the actual allocation/probing tends to look like this in the prologues allocate all the stack space probe third page probe fourth page probe fifth page etc So there's a race condition between the allocation point and when the probes execute which means an async signal handler could be running on a clashed stack/heap. So an Ada program fully compiled with -fstack-check will mostly be protected from a stack-clash style attack. They could achieve better protection by fixing the -fstack-check=specific prologue sequences so that they allocate and probe a page at a time. That (mostly) closes the issue with signal handlers running on a clashed stack/heap. They could achieve a level of protection in mixed code environments by not skipping page probes. But that would result in significantly worse code given the current implementation of -fstack-check. > > With what is in trunk so far, yeah ;-) In principle the "set up a stack > frame" part of the prologue can be made a separate shrink-wrapping > component and then we'd have to worry. But it would be complicated to > do that and there is no real benefit I see so far, so it won't happen. > Nothing for you to have to worry about in any case :-) That's consistent with my thoughts as well. You could separately wrap the allocation, but it's not likely to be that beneficial and it would seem to add a fair amount of complexity. jeff
diff --git a/gcc/common.opt b/gcc/common.opt index e81165c..8eec29f 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2300,7 +2300,7 @@ Apply variable expansion when loops are unrolled. fstack-check= Common Report RejectNegative Joined --fstack-check=[no|generic|specific] Insert stack checking code into the program. +-fstack-check=[no|generic|clash|specific] Insert stack checking code into the program. fstack-check Common Alias(fstack-check=, specific, no) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 3e5cee8..e72f3e9 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -11324,8 +11324,9 @@ generation of code to ensure that they see the stack being extended. You can additionally specify a string parameter: @samp{no} means no checking, @samp{generic} means force the use of old-style checking, -@samp{specific} means use the best checking method and is equivalent -to bare @option{-fstack-check}. +@samp{clash} means use a checking method designed to prevent stack clash +style attacks, @samp{specific} means use target specific +checking methods and is equivalent to bare @option{-fstack-check}. Old-style checking is a generic mechanism that requires no specific target support in the compiler but comes with the following drawbacks: @@ -11333,7 +11334,8 @@ target support in the compiler but comes with the following drawbacks: @enumerate @item Modified allocation strategy for large objects: they are always -allocated dynamically if their size exceeds a fixed threshold. +allocated dynamically if their size exceeds a fixed threshold. Note this +may change the semantics of some code. @item Fixed limit on the size of the static frame of functions: when it is @@ -11345,8 +11347,25 @@ Inefficiency: because of both the modified allocation strategy and the generic implementation, code performance is hampered. @end enumerate -Note that old-style stack checking is also the fallback method for -@samp{specific} if no target support has been added in the compiler. +Note that old-style stack checking is the fallback method for @samp{clash} +and @samp{specific} if no target support for either of those has been added +in the compiler. + +Also note that @samp{clash} requires target dependent prologue sequences that +are different than @samp{specific}. However, some targets may use +@samp{specific} style prologues if they have not defined @samp{clash} style +prologues at the loss of some functionality (such as @command{valgrind}) and +protection in certain difficult to trigger corner cases. + +@samp{specific} style checking is designed for Ada's needs to detect +infinite recursion and stack overflows. @samp{specific} is an excellent +choice when compiling Ada code. It is not generally sufficient to protect +against attacks that jump the stack guard page. + +@samp{clash} is designed to prevent jumping the stack guard page as seen in +stack clash style attacks. However, it does not guarantee sufficient stack +space to handle a signal in the event that the program hits the stack guard +and is thus incompatible with Ada's needs. @item -fstack-limit-register=@var{reg} @itemx -fstack-limit-symbol=@var{sym} diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 5faade5..9fad822 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -178,11 +178,27 @@ enum stack_check_type /* Check the stack and rely on the target configuration files to check the static frame of functions, i.e. use the generic - mechanism only for dynamic stack allocations. */ + mechanism only for dynamic stack allocations. + + This approach attempts to guarantee enough stack space is always + available to handle a signal and assumes the entire program is + compiled with stack checking. */ STATIC_BUILTIN_STACK_CHECK, + /* Check the stack and rely on the target configuration files to + check the static frame of functions, i.e. use the generic + mechanism only for dynamic stack allocations. + + This approach attempts to make code immune to attacks which jump + the stack guard page and stack clash style attacks. It works + is mixed code, but does not guarantee the ability to handle a + signal. */ + STACK_CLASH_BUILTIN_STACK_CHECK, + /* Check the stack and entirely rely on the target configuration - files, i.e. do not use the generic mechanism at all. */ + files, i.e. do not use the generic mechanism at all. This + does not prevent stack guard jumping and stack clash style + attacks. */ FULL_BUILTIN_STACK_CHECK }; diff --git a/gcc/opts.c b/gcc/opts.c index 7460c2b..61f5bb0 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -2243,6 +2243,19 @@ common_handle_option (struct gcc_options *opts, opts->x_flag_stack_check = STACK_CHECK_BUILTIN ? FULL_BUILTIN_STACK_CHECK : GENERIC_STACK_CHECK; + else if (!strcmp (arg, "clash")) + { + /* This is the stack checking method, designed to prevent + stack-clash attacks. */ + if (!STACK_GROWS_DOWNWARD) + sorry ("-fstack-check=clash not implemented on this target"); + else + opts->x_flag_stack_check = (STACK_CHECK_BUILTIN + ? FULL_BUILTIN_STACK_CHECK + : (STACK_CHECK_STATIC_BUILTIN + ? STACK_CLASH_BUILTIN_STACK_CHECK + : GENERIC_STACK_CHECK)); + } else if (!strcmp (arg, "specific")) /* This is the new stack checking method. */ opts->x_flag_stack_check = STACK_CHECK_BUILTIN diff --git a/gcc/testsuite/gcc.dg/stack-check-2.c b/gcc/testsuite/gcc.dg/stack-check-2.c new file mode 100644 index 0000000..8be7dee --- /dev/null +++ b/gcc/testsuite/gcc.dg/stack-check-2.c @@ -0,0 +1,66 @@ +/* The goal here is to ensure that we never consider a call to a noreturn + function as a potential tail call. + + Right now GCC discovers potential tail calls by looking at the + predecessors of the exit block. A call to a non-return function + has no successors and thus can never match that first filter. + + But that could change one day and we want to catch it. The problem + is the compiler could potentially optimize a tail call to a nonreturn + function, even if the caller has a frame. That breaks the assumption + that calls probe *sp when saving the return address that some targets + depend on to elide stack probes. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fstack-check=clash -fdump-tree-tailc -fdump-tree-optimized" } */ +/* { dg-require-effective-target stack_clash_protected } */ + +extern void foo (void) __attribute__ ((__noreturn__)); + + +void +test_direct_1 (void) +{ + foo (); +} + +void +test_direct_2 (void) +{ + return foo (); +} + +void (*indirect)(void)__attribute__ ((noreturn)); + + +void +test_indirect_1 () +{ + (*indirect)(); +} + +void +test_indirect_2 (void) +{ + return (*indirect)();; +} + + +typedef void (*pvfn)() __attribute__ ((noreturn)); + +void (*indirect_casted)(void); + +void +test_indirect_casted_1 () +{ + (*(pvfn)indirect_casted)(); +} + +void +test_indirect_casted_2 (void) +{ + return (*(pvfn)indirect_casted)(); +} +/* { dg-final { scan-tree-dump-not "tail call" "tailc" } } */ +/* { dg-final { scan-tree-dump-not "tail call" "optimized" } } */ + diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index fe5e777..61ffb68 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -8468,3 +8468,61 @@ proc check_effective_target_arm_coproc4_ok { } { return [check_cached_effective_target arm_coproc4_ok \ check_effective_target_arm_coproc4_ok_nocache] } + +# Return 1 if the target has support for stack probing designed +# to avoid stack-clash style attacks +# +# This is used to restrict the stack-clash mitigation tests to +# just those targets that have been explicitly supported +# +# In addition to the prologue work on those targets, each target's +# properties should be described in the functions below so that +# tests do not become a mess of unreadable target conditions. +# +proc check_effective_target_stack_clash_protected { } { + if { [istarget aarch*-*-*] || [istarget x86_64-*-*] + || [istarget i?86-*-*] || [istarget s390*-*-*] + || [istarget powerpc*-*-*] || [istarget rs6000*-*-*] } { + return 1 + } + return 0 +} + +# Return 1 if the target creates a frame pointer for non-leaf functions +# Note we ignore cases where we apply tail call optimization here. +proc check_effective_target_frame_pointer_for_non_leaf { } { + if { [istarget aarch*-*-*] } { + return 1 + } + return 0 +} + +# Return 1 if the target's calling sequence or its ABI +# create implicit stack probes at *sp at function +# entry. +proc check_effective_target_caller_implicit_probes { } { + if { [istarget x86_64-*-*] || [istarget i?86-*-*] + || [istarget powerpc*-*-*] || [istarget rs6000*-*-*] } { + return 1 + } + + # s390's ABI has a register save area allocated by the + # caller for use by the callee. The mere existence does + # not constitute a probe by the caller, but when the slots + # used by the callee those stores are implicit probes + if { [istarget s390*-*-*] } { + return 1 + } + + return 0 +} + +# The stack realignment often causes residual stack allocations and +# can also make it difficult to elide loops or residual allocations +# for dynamic allocations +proc check_effective_target_callee_realigns_stack { } { + if { [istarget x86_64-*-*] || [istarget i?86-*-*] } { + return 1 + } + return 0 +}