From patchwork Wed Sep 11 19:05:20 2019
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Subject: [04/32] [x86] Robustify vzeroupper handling across calls
Date: Wed, 11 Sep 2019 20:05:20 +0100

One of the effects of the function_abi series is to make -fipa-ra work
for partially call-clobbered registers.  E.g. if a call preserves only
the low 32 bits of a register R, we handled the partial clobber
separately from -fipa-ra, and so treated the upper bits of R as
clobbered even if we knew that the target function doesn't touch R.
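
To make the failure mode concrete, here is an illustrative sketch (an
example for exposition, not a testcase from the series).  Compiled with
something like -O2 -mavx2 -fipa-ra, the RA can see that callee()
clobbers no SSE registers and may keep "x" in a ymm register across the
call; inserting a vzeroupper before that call would then destroy the
upper 128 bits of "x":

    typedef double v4df __attribute__ ((vector_size (32)));

    static __attribute__ ((noinline)) void
    callee (void)
    {
      /* Touches no SSE registers, which -fipa-ra can discover.  */
      __asm__ volatile ("" ::: "memory");
    }

    v4df
    caller (v4df x)
    {
      callee ();
      /* The upper 128 bits of "x" must survive the call.  */
      return x + x;
    }
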
"Fixing" this caused problems for the vzeroupper handling on x86.
The pass that inserts the vzerouppers assumes that no 256-bit or
512-bit values are live across a call unless the call takes a 256-bit
or 512-bit argument:

      /* Needed mode is set to AVX_U128_CLEAN if there are no 256bit
	 or 512bit modes used in function arguments. */

This implicitly relies on:

    /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The only ABI that
       saves SSE registers across calls is Win64 (thus no need to check the
       current ABI here), and with AVX enabled Win64 only guarantees that
       the low 16 bytes are saved.  */

    static bool
    ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
					 unsigned int regno,
					 machine_mode mode)
    {
      return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
    }

The comment suggests that this code is only needed for Win64 and that
not testing for Win64 is just a simplification.  But in practice it was
needed for correctness on GNU/Linux and other targets too, since
without it the RA would be able to keep 256-bit and 512-bit values in
SSE registers across calls that are known not to clobber them.

This patch conservatively treats calls as AVX_U128_ANY if the RA can
see that some SSE registers are not touched by a call.  There are then
no regressions if the ix86_hard_regno_call_part_clobbered check is
disabled for GNU/Linux (not something we should do; it was just for
testing).

If in fact we want -fipa-ra to pretend that all functions clobber SSE
registers above 128 bits, it'd certainly be possible to arrange that.
But IMO that would be an optimisation decision, whereas what the patch
is fixing is a correctness decision.  So I think we should have this
check even so.


2019-09-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/i386/i386.c: Include function-abi.h.
	(ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
	if they preserve some 256-bit or 512-bit SSE registers.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2019-09-10 19:56:55.601105594 +0100
+++ gcc/config/i386/i386.c	2019-09-11 19:47:28.506233865 +0100
@@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
 #include "i386-builtins.h"
 #include "i386-expand.h"
 #include "i386-features.h"
+#include "function-abi.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
 	}
     }
 
+  /* If the function is known to preserve some SSE registers,
+     RA and previous passes can legitimately rely on that for
+     modes wider than 256 bits.  It's only safe to issue a
+     vzeroupper if all SSE registers are clobbered.  */
+  const function_abi &abi = call_insn_abi (insn);
+  if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
+			      abi.mode_clobbers (V4DImode)))
+    return AVX_U128_ANY;
+
   return AVX_U128_CLEAN;
 }
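
For reference, the logic of the new check boils down to a subset test.
A standalone toy model of that test (plain C, with uint64_t bitmasks
standing in for GCC's HARD_REG_SET -- illustration only, not GCC code):

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-in for a hard register set: one bit per SSE register.  */
    typedef uint64_t reg_set;

    /* Models hard_reg_set_subset_p: true iff every register in A
       is also in B.  */
    static int
    subset_p (reg_set a, reg_set b)
    {
      return (a & ~b) == 0;
    }

    int
    main (void)
    {
      reg_set all_sse = 0xffff;		 /* say, xmm0..xmm15 */
      reg_set clobbers = all_sse & ~0x4; /* xmm2 preserved in V4DImode */

      /* If some SSE register escapes the clobber set, a 256-bit value
	 can legitimately live in it across the call, so the pass must
	 assume AVX_U128_ANY rather than AVX_U128_CLEAN.  */
      puts (subset_p (all_sse, clobbers)
	    ? "AVX_U128_CLEAN is safe" : "must use AVX_U128_ANY");
      return 0;
    }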