From patchwork Mon Jan 7 17:40:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 1021467 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-493558-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="Q4uhjNka"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43YN2d08qbz9sCs for ; Tue, 8 Jan 2019 04:40:24 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; q=dns; s=default; b=iK/Tr93MxsDUbX+rM6lEnUhCoqcmu MuShoyO4tp8pKxC1fV2Ly+YT9YMFVonO2LOetFPtnf8PTFH0JHA+urwNV6pPU34V qa9T55FsFNeuBAJWFan132QXkhFImfIsij31zxlOuiVWqqcOvk0S+Q09C5KcpufV U1zkKxr0FtlS9Q= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; s=default; bh=SrJogv4HMm1boIsv0uBKTjUbL8g=; b=Q4u hjNka+cVn674JxYI2WpVVtUZeMD1b4zViykJM5331VEyAsqeQpG5zsu5cpeM+yNG uojQ8D7ktuIyDP1amtrcVskr+VvneYuUrvCcpTZI4qA2JigqRxaVLjsQPE7gmwrz g5UGfQYe/iZ5o1teve7Tdh+3sytHNwsL1GUMJXgY= Received: (qmail 58650 invoked by alias); 7 Jan 2019 17:40:17 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 58635 invoked by uid 89); 7 Jan 2019 17:40:17 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, NO_DNS_FOR_FROM, SPF_HELO_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mga12.intel.com Received: from mga12.intel.com (HELO mga12.intel.com) (192.55.52.136) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 07 Jan 2019 17:40:15 +0000 Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 07 Jan 2019 09:40:14 -0800 Received: from gnu-cfl-1.sc.intel.com ([172.25.70.237]) by fmsmga004.fm.intel.com with ESMTP; 07 Jan 2019 09:40:14 -0800 Received: by gnu-cfl-1.sc.intel.com (Postfix, from userid 1000) id 2DB701800B2; Mon, 7 Jan 2019 09:40:14 -0800 (PST) Date: Mon, 7 Jan 2019 09:40:14 -0800 From: "H.J. Lu" To: gcc-patches@gcc.gnu.org Cc: Uros Bizjak Subject: [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY Message-ID: <20190107174014.GA17007@intel.com> Reply-To: "H.J. Lu" MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) There is no need to generate vzeroupper if caller uses upper bits of AVX/AVX512 registers, We track caller's avx_u128_state and avoid vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY. Tested on i686 and x86-64 with and without --with-arch=native. OK for trunk? Thanks. H.J. --- gcc/ PR target/88717 * config/i386/i386.c (ix86_avx_u128_mode_entry): Set caller_avx_u128_dirty to true when caller is AVX_U128_DIRTY. (ix86_avx_u128_mode_exit): Set exit mode to AVX_U128_DIRTY if caller is AVX_U128_DIRTY. * config/i386/i386.h (machine_function): Add caller_avx_u128_dirty. gcc/testsuite/ PR target/88717 * gcc.target/i386/pr88717.c: New test. --- gcc/config/i386/i386.c | 10 +++++++++- gcc/config/i386/i386.h | 3 +++ gcc/testsuite/gcc.target/i386/pr88717.c | 24 ++++++++++++++++++++++++ 3 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr88717.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d01278d866f..9b49a2c1d9c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19100,7 +19100,11 @@ ix86_avx_u128_mode_entry (void) rtx incoming = DECL_INCOMING_RTL (arg); if (incoming && ix86_check_avx_upper_register (incoming)) - return AVX_U128_DIRTY; + { + /* Caller is AVX_U128_DIRTY. */ + cfun->machine->caller_avx_u128_dirty = true; + return AVX_U128_DIRTY; + } } return AVX_U128_CLEAN; @@ -19130,6 +19134,10 @@ ix86_mode_entry (int entity) static int ix86_avx_u128_mode_exit (void) { + /* Exit mode is set to AVX_U128_DIRTY if caller is AVX_U128_DIRTY. */ + if (cfun->machine->caller_avx_u128_dirty) + return AVX_U128_DIRTY; + rtx reg = crtl->return_rtx; /* Exit mode is set to AVX_U128_DIRTY if there are 256bit diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 83b025e0cf5..c053b657a55 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2747,6 +2747,9 @@ struct GTY(()) machine_function { /* If true, ENDBR is queued at function entrance. */ BOOL_BITFIELD endbr_queued_at_entrance : 1; + /* If true, caller is AVX_U128_DIRTY. */ + BOOL_BITFIELD caller_avx_u128_dirty : 1; + /* The largest alignment, in bytes, of stack slot actually used. */ unsigned int max_used_stack_alignment; diff --git a/gcc/testsuite/gcc.target/i386/pr88717.c b/gcc/testsuite/gcc.target/i386/pr88717.c new file mode 100644 index 00000000000..01680998f1b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr88717.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512f -mvzeroupper" } */ + +#include + +__m128 +foo1 (__m256 x) +{ + return _mm256_castps256_ps128 (x); +} + +void +foo2 (float *p, __m256 x) +{ + *p = ((__v8sf)x)[0]; +} + +void +foo3 (float *p, __m512 x) +{ + *p = ((__v16sf)x)[0]; +} + +/* { dg-final { scan-assembler-not "vzeroupper" } } */