From patchwork Mon Jun 7 19:00:21 2021
From: Jeff Law
To: GCC Patches
Subject: Aligning stack offsets for spills
Message-ID: <98179c8e-bcec-83ed-5b99-6f54791bd7cd@tachyum.com>
Date: Mon, 7 Jun 2021 13:00:21 -0600

So, as many of you know, I left Red Hat a while ago and joined Tachyum.
We're building a new processor, and we've come across an issue that I
think needs upstream discussion.

I can't divulge many of the details right now, but one of the quirks of
our architecture is that the reg+d addressing modes for our vector
loads/stores require the displacement to be aligned.  This is an
artifact of how these instructions are encoded.
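For concreteness, here is a minimal sketch of the kind of displacement
check that restriction implies.  The 64-byte figure and the function
name are illustrative assumptions; the real encoding details aren't
public:

#include <stdbool.h>

/* Sketch only: 64 bytes is an assumed stand-in for the real
   (undisclosed) encoding constraint.  */
static bool
vector_mem_displacement_ok (long disp)
{
  /* Only displacements that are a multiple of the vector access size
     fit in the reg+d encoding.  */
  return disp % 64 == 0;
}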
Obviously we can emit a load of the address into a register when the
displacement isn't aligned.  From a correctness standpoint that works
perfectly.  Unfortunately, it's a significant performance hit on some
standard benchmarks (SPEC), where the hot parts of the code contain a
great number of spills of vector objects onto the stack at unaligned
offsets.

We've considered three possible approaches to solve this problem.

1. When the displacement isn't properly aligned, allocate more space in
assign_stack_local so that we can make the offset aligned.  The downside
is that this potentially burns a lot of stack space, but in practice the
cost was minimal (16 bytes in a 9k frame).  From a performance
standpoint this works perfectly.

2. Abuse the register elimination code to create a second pointer into
the stack.  Spills would start as an eliminable pointer + offset, then
either get eliminated to sp+offset' when the offset is aligned, or to
gpr+offset'' when the offset isn't properly aligned.  We started a bit
down this path, but with #1 working so well, we never got this approach
to proof-of-concept.

3. Hack up the post-reload optimizers to fix things up as best we can.
This may still be advantageous, but again, with #1 working so well, we
didn't explore it in any significant way.  We may still look at this at
some point in other contexts.

Here's what we're playing with.  Obviously we'd need a target hook to
drive this behavior.  I was thinking that we'd pass any slot offset
alignment requirement (from the target hook) into assign_stack_local,
and that would bubble down to this point in try_fit_stack_local:

diff --git a/gcc/function.c b/gcc/function.c
index d616f5f64f4..7f441b87a63 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -307,6 +307,14 @@ try_fit_stack_local (poly_int64 start, poly_int64 length,
   frame_off = targetm.starting_frame_offset () % frame_alignment;
   frame_phase = frame_off ? frame_alignment - frame_off : 0;
 
+  if (known_eq (size, 64) && alignment < 64)
+    alignment = 64;
+
   /* Round the frame offset to the specified alignment.  */
 
   if (FRAME_GROWS_DOWNWARD)

Thoughts?

Jeff
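P.S.  For concreteness, here is a rough sketch of the hook-driven
version we have in mind.  The hook name (spill_slot_offset_alignment)
and its signature are hypothetical -- nothing below is an existing GCC
interface:

/* Hypothetical target hook: given a stack slot's size, return the
   alignment, in bytes, that the slot's frame offset must have.
   Returning 1 means no extra constraint.  */
static unsigned int
example_spill_slot_offset_alignment (poly_int64 size)
{
  /* Assumed requirement: 64-byte vector spills must land at
     64-byte-aligned frame offsets.  */
  if (known_eq (size, 64))
    return 64;
  return 1;
}

try_fit_stack_local would then take its extra alignment from the hook
instead of hardcoding 64:

  unsigned int off_align = targetm.spill_slot_offset_alignment (size);
  if (alignment < off_align)
    alignment = off_align;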