From patchwork Wed May 25 13:30:54 2016
X-Patchwork-Submitter: Dominik Vogt
X-Patchwork-Id: 626187
Date: Wed, 25 May 2016 14:30:54 +0100
From: Dominik Vogt
To: Jeff Law, gcc-patches@gcc.gnu.org, Andreas Krebbel
Subject: Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.
Message-ID: <20160525133054.GA6938@linux.vnet.ibm.com>
Reply-To: vogt@linux.vnet.ibm.com
References: <20160429221242.GA2205@linux.vnet.ibm.com> <20160503141753.GA17351@linux.vnet.ibm.com>
In-Reply-To: <20160503141753.GA17351@linux.vnet.ibm.com>

On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
> Version two of the patch including a test case.
>
> On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> > > The attached patch removes excess stack space allocation with
> > > alloca in some situations.  Please check the commit message in the
> > > patch for details.
> >
> > However, I would strongly recommend some tests, even if they are
> > target specific.  You can always copy pr36728-1 into the s390x
> > directory and look at the size of the generated stack.  Similarly for
> > pr50938 for x86.
>
> However, x86 uses the "else" branch in round_push, i.e. it uses
> "virtual_preferred_stack_boundary_rtx" to calculate the number of
> bytes to add for stack alignment.  That value is unknown at the
> time round_push is called, so the test case fails on such targets,
> and I've no idea how to fix this properly.

Here is the third version of the patch, with the suggested cleanup in
the first patch and the functional change in the second one.  The
first patch is based on Jeff's draft, with the change suggested by
Eric and some further cleanup added by me.  Tested and bootstrapped on
s390x biarch (but I did not look for performance regressions, as the
change should be a no-op).

Ciao

Dominik ^_^  ^_^
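Roughly, the kind of target-specific test discussed above could look like
the sketch below.  The function names and the idea of scanning the prologue
for the stack adjustment are illustrative assumptions only; this is not the
test case that ships with the patch series.

    /* Hypothetical gcc.target/s390 test sketch; directives and names are
       assumptions for illustration, not the actual test from this series.  */
    /* { dg-do compile } */
    /* { dg-options "-O2" } */

    void bar (void *);

    void
    foo (unsigned long n)
    {
      /* A run-time-sized alloca; the interesting question is how much the
         generated code over-allocates for alignment.  */
      void *p = __builtin_alloca (n);
      bar (p);
    }

    /* A real test would then scan the generated assembly for the constant
       subtracted from the stack pointer, with a dg-final directive tailored
       to the target's alloca sequence.  */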
From 01264dac2f62c16a2c7e684a674aa9e9bc27b434 Mon Sep 17 00:00:00 2001
From: Dominik Vogt
Date: Wed, 25 May 2016 10:23:57 +0100
Subject: [PATCH 1/2] Minor cleanup to allocate_dynamic_stack_space.

---
 gcc/explow.c | 96 +++++++++++++++++++-----------------------------------------
 1 file changed, 30 insertions(+), 66 deletions(-)

diff --git a/gcc/explow.c b/gcc/explow.c
index e0ce201..09a0330 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1174,8 +1174,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   HOST_WIDE_INT stack_usage_size = -1;
   rtx_code_label *final_label;
   rtx final_target, target;
-  unsigned extra_align = 0;
-  bool must_align;
+  unsigned extra;

   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
@@ -1246,48 +1245,21 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;

   /* We will need to ensure that the address we return is aligned to
-     REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
-     always know its final value at this point in the compilation (it
-     might depend on the size of the outgoing parameter lists, for
-     example), so we must align the value to be returned in that case.
-     (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
-     STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
-     We must also do an alignment operation on the returned value if
-     the stack pointer alignment is less strict than REQUIRED_ALIGN.
-
-     If we have to align, we must leave space in SIZE for the hole
-     that might result from the alignment operation.  */
-
-  must_align = (crtl->preferred_stack_boundary < required_align);
-  if (must_align)
-    {
-      if (required_align > PREFERRED_STACK_BOUNDARY)
-        extra_align = PREFERRED_STACK_BOUNDARY;
-      else if (required_align > STACK_BOUNDARY)
-        extra_align = STACK_BOUNDARY;
-      else
-        extra_align = BITS_PER_UNIT;
-    }
+     REQUIRED_ALIGN.  At this point in the compilation, we don't always
+     know the final value of the STACK_DYNAMIC_OFFSET used in function.c
+     (it might depend on the size of the outgoing parameter lists, for
+     example), so we must preventively align the value.  We leave space
+     in SIZE for the hole that might result from the alignment operation.  */

-  /* ??? STACK_POINTER_OFFSET is always defined now.  */
-#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
-  must_align = true;
-  extra_align = BITS_PER_UNIT;
-#endif
-
-  if (must_align)
-    {
-      unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
+  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
+  size = plus_constant (Pmode, size, extra);
+  size = force_operand (size, NULL_RTX);

-      size = plus_constant (Pmode, size, extra);
-      size = force_operand (size, NULL_RTX);
-
-      if (flag_stack_usage_info)
-        stack_usage_size += extra;
+  if (flag_stack_usage_info)
+    stack_usage_size += extra;

-      if (extra && size_align > extra_align)
-        size_align = extra_align;
-    }
+  if (extra && size_align > BITS_PER_UNIT)
+    size_align = BITS_PER_UNIT;

   /* Round the size to a multiple of the required stack alignment.
      Since the stack if presumed to be rounded before this allocation,
@@ -1361,13 +1333,10 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       if (MALLOC_ABI_ALIGNMENT >= required_align)
         ask = size;
       else
-        {
-          ask = expand_binop (Pmode, add_optab, size,
-                              gen_int_mode (required_align / BITS_PER_UNIT - 1,
-                                            Pmode),
-                              NULL_RTX, 1, OPTAB_LIB_WIDEN);
-          must_align = true;
-        }
+        ask = expand_binop (Pmode, add_optab, size,
+                            gen_int_mode (required_align / BITS_PER_UNIT - 1,
+                                          Pmode),
+                            NULL_RTX, 1, OPTAB_LIB_WIDEN);

       func = init_one_libfunc ("__morestack_allocate_stack_space");

@@ -1478,24 +1447,19 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }

-  if (must_align)
-    {
-      /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-         but we know it can't.  So add ourselves and then do
-         TRUNC_DIV_EXPR.  */
-      target = expand_binop (Pmode, add_optab, target,
-                             gen_int_mode (required_align / BITS_PER_UNIT - 1,
-                                           Pmode),
-                             NULL_RTX, 1, OPTAB_LIB_WIDEN);
-      target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-                              gen_int_mode (required_align / BITS_PER_UNIT,
-                                            Pmode),
-                              NULL_RTX, 1);
-      target = expand_mult (Pmode, target,
-                            gen_int_mode (required_align / BITS_PER_UNIT,
-                                          Pmode),
-                            NULL_RTX, 1);
-    }
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+                         gen_int_mode (required_align / BITS_PER_UNIT - 1,
+                                       Pmode),
+                         NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+                          gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
+                          NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+                        gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
+                        NULL_RTX, 1);

   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
-- 
2.3.0
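For readers following the arithmetic in the patch, here is a small
stand-alone sketch of the size/address adjustment that
allocate_dynamic_stack_space now performs unconditionally.  It is plain C
with illustrative numbers and mirrors the plus_constant/add/div/mul
sequence; it does not use the RTL expanders themselves.

    #include <stdio.h>

    /* Bytes per addressable unit, as in GCC's BITS_PER_UNIT.  */
    #define BITS_PER_UNIT 8

    /* Over-allocate by (required_align - BITS_PER_UNIT) / BITS_PER_UNIT
       bytes, leaving room for the hole created by aligning the address.  */
    static unsigned long
    adjust_size (unsigned long size, unsigned required_align)
    {
      unsigned extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
      return size + extra;          /* corresponds to plus_constant */
    }

    /* Round the returned address up to the required alignment using the
       same add, truncating divide and multiply steps as the patch.  */
    static unsigned long
    align_address (unsigned long target, unsigned required_align)
    {
      unsigned long units = required_align / BITS_PER_UNIT;
      target = target + (units - 1);  /* add_optab */
      target = target / units;        /* TRUNC_DIV_EXPR */
      target = target * units;        /* multiply back */
      return target;
    }

    int
    main (void)
    {
      /* Example: a 10-byte request with a required alignment of 64 bits
         grows to 17 bytes, and an address of 0x1003 rounds up to 0x1008.  */
      printf ("size: %lu\n", adjust_size (10, 64));
      printf ("addr: 0x%lx\n", align_address (0x1003, 64));
      return 0;
    }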