From patchwork Wed Jan 31 05:35:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Law X-Patchwork-Id: 867786 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-472340-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="ektsKQfr"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zWX600lFhz9ryQ for ; Wed, 31 Jan 2018 16:35:51 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=QFNz1jJXEeRGHq4Jtkd+Lj5SjYnF+o0Aq0X9VNJdlwmPg5ahKR TUJekuldo0o/k8G19rTX2+WcQwgyaykaQTEtbcMXAdwFmDvEyFSy1yVfI3hC2bzt uclWm1Q0eK2/+PIhhK8+NwWXiY8ySGaA5/LFG3S/bXyiHQgwTNw5S2biU= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:message-id:date:mime-version:content-type; s= default; bh=0Hs9MJq3woWl1tRPHkVzQmEQ13s=; b=ektsKQfrT+XaVvE7KvTi h89RC8CLSGZuGRqWdXcNq+omECPpRlqjsdu0V3xuxqCFfFJwq5BJgmWBEC+zT6wI 3FS56TJ/PrM5jkcPAXHBAKYai3ewPzsnUjK0DlShgRwwttnm/LZz/yFs63kHtfK5 y1EdZCACsn2Tsy6QXGm5nbA= Received: (qmail 103395 invoked by alias); 31 Jan 2018 05:35:43 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 103385 invoked by uid 89); 31 Jan 2018 05:35:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_HELO_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=Thankfully X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 31 Jan 2018 05:35:41 +0000 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C1F9B36809; Wed, 31 Jan 2018 05:35:39 +0000 (UTC) Received: from localhost.localdomain (ovpn-112-50.rdu2.redhat.com [10.10.112.50]) by smtp.corp.redhat.com (Postfix) with ESMTP id D8D5E5C543; Wed, 31 Jan 2018 05:35:38 +0000 (UTC) From: Jeff Law To: Uros Bizjak Cc: gcc-patches Subject: [PATCH][RFA][PR target/84128] Fix restore of scratch register used in probing loops Message-ID: <328b4297-f032-6bf0-9252-1d7ee7b50133@redhat.com> Date: Tue, 30 Jan 2018 22:35:37 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 X-IsSubscribed: yes Whee, fun, this appears to have been broken for a very long time, possibly since the introduction of -fstack-check in 2010. Thankfully it only affects 32 bit and only in relatively constrained circumstances. -fstack-check and -fstack-clash both potentially create a loop for stack probing. In that case they both need a scratch register to hold the loop upper bound. The code to allocate a scratch register first starts with the caller-saved registers as they're zero cost. Then it'll use any callee saved register that is actually saved. If neither is available (say because all the caller-saved regs are used for parameter passing and there are no callee saved register used), then the allocation routine will push %eax on the stack and the deallocation routine will pop it off to restore its value. Of course there is a *stack allocation* between those two points. So the pop ends up restoring garbage into the register. Obviously the restore routine needs to use reg+d addressing to get to the stack slot and the allocated space needs to be deallocated by the epilogue. But sadly there's enough assertions sprinkled around that prevent that from working as-is. So what this patch does is continue to use the push to allocate the register. And it uses reg+d to restore the register after the probing loops. The "trick" is that the space allocated by the push becomes part of the normal stack frame after we restore the scratch register's value. Ie, if we push a 4 byte register, then we reduce the size of the main allocation request by 4 bytes. And everything just works. Clearly this code had not be exercised. So I hacked up things so that we always generated a probing loop and always assumed that we needed to save/restore a scratch register and enabled stack clash protections by default. I bootstrapped that and compared testsuite runs against a run which just had stack clash protection on by default. That did turn up an issue, but it was with my testing hacks, not with this patch :-) I (of course) also did the usual bootstrap and regression tests, using x86_64 and i686. Hopefully this is the last iteration on the x86/x86_64 stack clash bits :-) The one concern I have is do we need to tell the CFI machinery that %eax's value was restored to its entry value? OK for the trunk? Or do we need some magic CFI bit to describe the restore of %eax? Thanks, Jeff PR target/84128 * config/i386/i386.c (release_scratch_register_on_entry): Add new OFFSET argument. Use it to restore the scratch register rathern than a pop insn. (ix86_adjust_stack_and_probe_stack_clash): Un-constify SIZE. If we have to save a temporary register, decrement SIZE appropriately. Pass SIZE as the offset to find the saved register into release_scratch_register_on_entry. (ix86_adjust_stack_and_probe): Likewise. (ix86_emit_probe_stack_range): Likewise. PR target/84128 * gcc.target/i386/pr84128.c: New test. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index fef34a1..93ce79c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12567,22 +12567,25 @@ get_scratch_register_on_entry (struct scratch_reg *sr) } } -/* Release a scratch register obtained from the preceding function. */ +/* Release a scratch register obtained from the preceding function. + + Note there will always be some kind of stack adjustment between + allocation and releasing the scratch register. So we can't just + pop the scratch register off the stack if we were forced to save it + (the stack pointer itself has a different value). + + Instead we're passed the offset into the stack where the value will + be found and the space becomes part of the local frame that is + deallocated by the epilogue. */ static void -release_scratch_register_on_entry (struct scratch_reg *sr) +release_scratch_register_on_entry (struct scratch_reg *sr, HOST_WIDE_INT offset) { if (sr->saved) { - struct machine_function *m = cfun->machine; - rtx x, insn = emit_insn (gen_pop (sr->reg)); - - /* The RTX_FRAME_RELATED_P mechanism doesn't know about pop. */ - RTX_FRAME_RELATED_P (insn) = 1; - x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (UNITS_PER_WORD)); - x = gen_rtx_SET (stack_pointer_rtx, x); - add_reg_note (insn, REG_FRAME_RELATED_EXPR, x); - m->fs.sp_offset -= UNITS_PER_WORD; + rtx x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (offset)); + x = gen_rtx_SET (sr->reg, gen_rtx_MEM (word_mode, x)); + emit_insn (x); } } @@ -12597,7 +12600,7 @@ release_scratch_register_on_entry (struct scratch_reg *sr) pushed on the stack. */ static void -ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size, +ix86_adjust_stack_and_probe_stack_clash (HOST_WIDE_INT size, const bool int_registers_saved) { struct machine_function *m = cfun->machine; @@ -12713,6 +12716,12 @@ ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size, struct scratch_reg sr; get_scratch_register_on_entry (&sr); + /* If we needed to save a register, then account for any space + that was pushed (we are not going to pop the register when + we do the restore). */ + if (sr.saved) + size -= UNITS_PER_WORD; + /* Step 1: round SIZE down to a multiple of the interval. */ HOST_WIDE_INT rounded_size = size & -probe_interval; @@ -12761,7 +12770,9 @@ ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size, m->fs.cfa_reg == stack_pointer_rtx); dump_stack_clash_frame_info (PROBE_LOOP, size != rounded_size); - release_scratch_register_on_entry (&sr); + /* This does not deallocate the space reserved for the scratch + register. That will be deallocated in the epilogue. */ + release_scratch_register_on_entry (&sr, size); } /* Make sure nothing is scheduled before we are done. */ @@ -12774,7 +12785,7 @@ ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size, pushed on the stack. */ static void -ix86_adjust_stack_and_probe (const HOST_WIDE_INT size, +ix86_adjust_stack_and_probe (HOST_WIDE_INT size, const bool int_registers_saved) { /* We skip the probe for the first interval + a small dope of 4 words and @@ -12847,6 +12858,11 @@ ix86_adjust_stack_and_probe (const HOST_WIDE_INT size, get_scratch_register_on_entry (&sr); + /* If we needed to save a register, then account for any space + that was pushed (we are not going to pop the register when + we do the restore). */ + if (sr.saved) + size -= UNITS_PER_WORD; /* Step 1: round SIZE to the previous multiple of the interval. */ @@ -12906,7 +12922,9 @@ ix86_adjust_stack_and_probe (const HOST_WIDE_INT size, (get_probe_interval () + dope)))); - release_scratch_register_on_entry (&sr); + /* This does not deallocate the space reserved for the scratch + register. That will be deallocated in the epilogue. */ + release_scratch_register_on_entry (&sr, size); } /* Even if the stack pointer isn't the CFA register, we need to correctly @@ -13015,6 +13033,11 @@ ix86_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size, get_scratch_register_on_entry (&sr); + /* If we needed to save a register, then account for any space + that was pushed (we are not going to pop the register when + we do the restore). */ + if (sr.saved) + size -= UNITS_PER_WORD; /* Step 1: round SIZE to the previous multiple of the interval. */ @@ -13055,7 +13078,9 @@ ix86_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size, sr.reg), rounded_size - size)); - release_scratch_register_on_entry (&sr); + /* This does not deallocate the space reserved for the scratch + register. That will be deallocated in the epilogue. */ + release_scratch_register_on_entry (&sr, size); } /* Make sure nothing is scheduled before we are done. */ diff --git a/gcc/testsuite/gcc.target/i386/pr84128.c b/gcc/testsuite/gcc.target/i386/pr84128.c new file mode 100644 index 0000000..a8323fd6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr84128.c @@ -0,0 +1,30 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -march=i686 -mtune=generic -fstack-clash-protection" } */ +/* { dg-require-effective-target ia32 } */ + +__attribute__ ((noinline, noclone, weak, regparm (3))) +int +f1 (long arg0, int (*pf) (long, void *)) +{ + unsigned char buf[32768]; + return pf (arg0, buf); +} + +__attribute__ ((noinline, noclone, weak)) +int +f2 (long arg0, void *ignored) +{ + if (arg0 != 17) + __builtin_abort (); + return 19; +} + +int +main (void) +{ + if (f1 (17, f2) != 19) + __builtin_abort (); + return 0; +} + +