From patchwork Wed Jul 19 05:17:55 2017
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 790791
To: gcc-patches
From: Jeff Law
Subject: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08 V2
Message-ID: <1c8a2b56-1ad1-153f-6081-d7d0120375aa@redhat.com>
Date: Tue, 18 Jul 2017 23:17:55 -0600

So this patch has changed considerably since V1 as well.

First, we no longer track the bulk of the register stores in the
prologue.  Those may be separately shrink-wrapped and thus not executed
on all paths, so they are not candidates for implicit probes.

Second, per the discussions we've had on-list, we're less aggressive
about probing.  We assume the caller has not pushed us more than 1kbyte
into the stack guard, so frames of less than 3kbytes in the callee need
no probes.

Third, the implicit probe tracking is simplified.  I'm exceedingly happy
to find out that we can never have a nonzero initial_adjust and
callee_adjust at the same time.  That's a significant help.  We still
use the save of lr/fp as an implicit probe.

This ought to be much more efficient than the prior version.  Hopefully
this is closer to something the aarch64 maintainers are comfortable
with.

---
	* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Handle
	-fstack-clash-protection probing too.
	(aarch64_allocate_and_probe_stack_space): New function.
	(aarch64_expand_prologue): Assert we never have both an initial
	adjustment and a callee save adjustment.  Track distance between SP
	and the most recent probe.  Use aarch64_allocate_and_probe_stack_space
	when -fstack-clash-protection is enabled rather than just adjusting sp.
	Dump actions via dump_stack_clash_frame_info.

 static void
 aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,

@@ -3605,6 +3617,68 @@ aarch64_set_handled_components (sbitmap components)
       cfun->machine->reg_is_wrapped_separately[regno] = true;
 }
 
+/* Allocate SIZE bytes of stack space using SCRATCH_REG as a scratch
+   register.
+
+   LAST_PROBE_OFFSET contains the offset between the stack pointer and
+   the last known probe.  As LAST_PROBE_OFFSET crosses PROBE_INTERVAL
+   emit a probe and adjust LAST_PROBE_OFFSET.  */
+static void
+aarch64_allocate_and_probe_stack_space (int scratchreg, HOST_WIDE_INT size,
+					HOST_WIDE_INT *last_probe_offset)
+{
+  rtx temp = gen_rtx_REG (word_mode, scratchreg);
+
+  HOST_WIDE_INT rounded_size = size & -PROBE_INTERVAL;
+  HOST_WIDE_INT residual = size - rounded_size;
+
+  /* We can handle a small number of allocations/probes inline.  Otherwise
+     punt to a loop.  */
+  if (rounded_size && rounded_size <= 4 * PROBE_INTERVAL)
+    {
+      for (HOST_WIDE_INT i = 0; i < rounded_size; i += PROBE_INTERVAL)
+	{
+	  /* We should never need a scratch register for this adjustment.  */
+	  aarch64_sub_sp (-1, PROBE_INTERVAL, true);
+
+	  /* We just allocated PROBE_INTERVAL bytes.  Thus, a probe is
+	     mandatory.  Note that LAST_PROBE_OFFSET does not change here.  */
+	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+					   (PROBE_INTERVAL
+					    - GET_MODE_SIZE (word_mode))));
+	}
+      dump_stack_clash_frame_info (PROBE_INLINE, size != rounded_size);
+    }
+  else if (rounded_size)
+    {
+      /* Compute the ending address.  */
+      emit_move_insn (temp, GEN_INT (-rounded_size));
+      emit_insn (gen_add3_insn (temp, stack_pointer_rtx, temp));
+
+      /* This allocates and probes the stack.  Like the inline version above
+	 it does not need to change LAST_PROBE_OFFSET.
+
+	 It almost certainly does not update CFIs correctly.  */
+      emit_insn (gen_probe_stack_range (temp, temp, temp));
+      dump_stack_clash_frame_info (PROBE_LOOP, size != rounded_size);
+    }
+
+  /* Handle any residuals.  */
+  if (residual)
+    {
+      aarch64_sub_sp (-1, residual, true);
+      *last_probe_offset += residual;
+      if (*last_probe_offset >= PROBE_INTERVAL)
+	{
+	  *last_probe_offset -= PROBE_INTERVAL;
+	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+					   (residual
+					    - GET_MODE_SIZE (word_mode))));
+	}
+    }
+  return;
+}
+
 /* AArch64 stack frames generated by this compiler look like:
 
	+-------------------------------+
@@ -3686,10 +3760,59 @@ aarch64_expand_prologue (void)
       aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size);
     }
 
-  aarch64_sub_sp (IP0_REGNUM, initial_adjust, true);
+  /* We do not fully protect aarch64 against stack clash style attacks
+     as doing so would be prohibitively expensive.
+
+     We assume that a caller can not push the stack pointer more than 1k
+     into the guard, which allows the current function to allocate up to
+     3k of total space without any probing.
+
+     In the relatively rare case where we are going to emit probes to
+     protect against stack-clash, we start the function with a probe
+     and probe every PROBE_INTERVAL bytes after that.
+
+     We have to track how much space has been allocated, but we do not
+     track stores into the stack as implicit probes.  */
+  if (flag_stack_clash_protection)
+    {
+      if (frame_size == 0)
+	dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
+      else if (frame_size < 3 * 1024)
+	dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
+      else
+	{
+	  /* This probes into the red zone, which is sub-optimal, but we
+	     allow it to avoid the async signal race.  */
+	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+					   - GET_MODE_SIZE (word_mode)));
+	}
+    }
+
+  /* In theory we should never have both an initial adjustment
+     and a callee save adjustment.  Verify that is the case since the
+     code below does not handle it for -fstack-clash-protection.  */
+  gcc_assert (initial_adjust == 0 || callee_adjust == 0);
+
+  /* We have to track the offset to the last probe in the stack so that
+     we know when to emit probes for stack clash protection.
+
+     If this function needs probes (most do not), then the code above
+     already emitted one.  Thus we can consider the last probe into the
+     stack was at offset zero.  */
+  HOST_WIDE_INT last_probe_offset = 0;
+  if (flag_stack_clash_protection && initial_adjust != 0)
+    aarch64_allocate_and_probe_stack_space (IP0_REGNUM, initial_adjust,
+					    &last_probe_offset);
+  else
+    aarch64_sub_sp (IP0_REGNUM, initial_adjust, true);
 
   if (callee_adjust != 0)
-    aarch64_push_regs (reg1, reg2, callee_adjust);
+    {
+      aarch64_push_regs (reg1, reg2, callee_adjust);
+
+      /* We just wrote *sp, so we can trivially adjust LAST_PROBE_OFFSET.  */
+      last_probe_offset = 0;
+    }
 
   if (frame_pointer_needed)
     {
@@ -3707,7 +3830,12 @@ aarch64_expand_prologue (void)
 			     callee_adjust != 0 || frame_pointer_needed);
   aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
 			     callee_adjust != 0 || frame_pointer_needed);
-  aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
+
+  if (flag_stack_clash_protection && final_adjust != 0)
+    aarch64_allocate_and_probe_stack_space (IP1_REGNUM, final_adjust,
+					    &last_probe_offset);
+  else
+    aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
 }
 
 /* Return TRUE if we can use a simple_return insn.
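To make the sizing policy in the prologue changes above concrete, here is a
small standalone C sketch of the decisions the new code makes for a given
frame size.  It is illustrative only, not GCC code: PROBE_INTERVAL is assumed
to be 4096, the probe store is assumed to be 8 bytes wide, and the helper
name is made up for the example.

/* Sketch of the stack-clash probing policy: the caller may have pushed SP
   at most 1k into the guard, so frames under 3k need no probes.  Larger
   frames get a probe up front and then one probe per PROBE_INTERVAL bytes
   allocated, with any residual allocated last.  */
#include <stdio.h>

#define PROBE_INTERVAL 4096	/* assumed probing interval */
#define PROBE_WORD 8		/* assumed size of the probe store */

static void
sketch_prologue (long frame_size)
{
  if (frame_size == 0)
    {
      printf ("%ld bytes: no frame, no probes\n", frame_size);
      return;
    }
  if (frame_size < 3 * 1024)
    {
      printf ("%ld bytes: small frame, no probes\n", frame_size);
      return;
    }

  /* Large frame: probe at the current SP first, then allocate in
     PROBE_INTERVAL chunks, probing after each chunk.  */
  printf ("%ld bytes: initial probe at [sp, #-%d]\n", frame_size, PROBE_WORD);

  long rounded = frame_size & -PROBE_INTERVAL;
  long residual = frame_size - rounded;
  for (long done = 0; done < rounded; done += PROBE_INTERVAL)
    printf ("  allocate %d, probe at [sp, #%d]\n",
	    PROBE_INTERVAL, PROBE_INTERVAL - PROBE_WORD);
  if (residual)
    printf ("  allocate residual %ld; probe only if the running offset\n"
	    "  from the last probe reaches PROBE_INTERVAL\n", residual);
}

int
main (void)
{
  sketch_prologue (0);
  sketch_prologue (2048);
  sketch_prologue (16500);
  return 0;
}

For a 16500-byte frame, for example, this prints one up-front probe, four
PROBE_INTERVAL allocations each followed by a probe, and a 116-byte residual
adjustment that needs no extra probe.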
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0a8b40a..8764d62 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2830,6 +2830,9 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   char loop_lab[32];
   rtx xops[2];
 
+  if (flag_stack_clash_protection)
+    reg1 = stack_pointer_rtx;
+
   ASM_GENERATE_INTERNAL_LABEL (loop_lab, "LPSRL", labelno++);
 
   /* Loop.  */
@@ -2841,7 +2844,14 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   output_asm_insn ("sub\t%0, %0, %1", xops);
 
   /* Probe at TEST_ADDR.  */
-  output_asm_insn ("str\txzr, [%0]", xops);
+  if (flag_stack_clash_protection)
+    {
+      gcc_assert (xops[0] == stack_pointer_rtx);
+      xops[1] = GEN_INT (PROBE_INTERVAL - 8);
+      output_asm_insn ("str\txzr, [%0, %1]", xops);
+    }
+  else
+    output_asm_insn ("str\txzr, [%0]", xops);
 
   /* Test if TEST_ADDR == LAST_ADDR.  */
   xops[1] = reg2;
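As a throwaway cross-check on the loop variant above (illustrative only,
again assuming PROBE_INTERVAL is 4096 and an 8-byte probe store), the
following C program simulates the sub/str sequence emitted under
-fstack-clash-protection and asserts the two properties the mitigation
relies on: every probe lands inside memory the loop has already allocated,
and consecutive probes are never more than PROBE_INTERVAL apart.

/* Simulate the probe loop: each iteration moves SP down by PROBE_INTERVAL
   and then stores to [sp, #PROBE_INTERVAL - 8], i.e. just below where SP
   pointed before the adjustment.  Offsets stand in for addresses.  */
#include <assert.h>
#include <stdio.h>

#define PROBE_INTERVAL 4096	/* assumed probing interval */
#define PROBE_WORD 8		/* assumed size of the probe store */

int
main (void)
{
  long sp = 0;				/* SP at loop entry */
  long last_probe = -PROBE_WORD;	/* the prologue already probed [sp, #-8] */
  long rounded_size = 8 * PROBE_INTERVAL;

  for (long done = 0; done < rounded_size; done += PROBE_INTERVAL)
    {
      sp -= PROBE_INTERVAL;				/* sub sp, sp, PROBE_INTERVAL */
      long probe = sp + PROBE_INTERVAL - PROBE_WORD;	/* str xzr, [sp, #PROBE_INTERVAL - 8] */

      assert (probe >= sp);				/* probe is inside the new allocation */
      assert (last_probe - probe <= PROBE_INTERVAL);	/* gap never exceeds PROBE_INTERVAL */
      printf ("sp = %ld, probe = %ld, gap = %ld\n",
	      sp, probe, last_probe - probe);
      last_probe = probe;
    }
  return 0;
}

The steady-state gap of exactly PROBE_INTERVAL is what guarantees that a
guard region of at least PROBE_INTERVAL bytes cannot be jumped over by the
allocations this loop performs.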