From patchwork Fri Nov 16 07:50:13 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 199506
References: <20121109123617.GA1886@tucnak.redhat.com>
Date: Fri, 16 Nov 2012 08:50:13 +0100
Subject: Re: [PATCH] Vzeroupper placement/47440
From: Uros Bizjak
To: gcc-patches@gcc.gnu.org
Cc: Vladimir Yakovlev, "H.J. Lu", Igor Zamyatin, Jakub Jelinek
List-Id: gcc-patches@gcc.gnu.org

On Fri, Nov 9, 2012 at 2:28 PM, Uros Bizjak wrote:

> Finally, having a post-reload mode-switching pass, we can double-check
> that there are no live SSE registers at vzeroupper insertion point. As
> vzeroupper is only an optimization, we want to play safe and cancel
> vzeroupper insertion in this case
>
> There is no degradation for x86_64 gABI targets, since all SSE
> registers are call-clobbered. Vzeroupper is conditionally inserted
> just before call insn, where all registers are saved to stack and
> already dead. The vzeroupper at function exit is not problematic.

Patch was committed to mainline SVN with the following ChangeLog:

2012-11-16  Uros Bizjak

        * config/i386/i386-protos.h (ix86_emit_mode_set): Add third argument.
        * config/i386/i386.h (EMIT_MODE_SET): Update.
        * config/i386/i386.c (ix86_avx_emit_vzeroupper): New function.
        (ix86_emit_mode_set) : Call ix86_avx_emit_vzeroupper.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
configured with --with-arch=corei7-avx --with-tune=corei7-avx.

Uros.

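
For readers without a GCC tree at hand, the liveness check described above
can be modelled outside the compiler. This is only a rough sketch, not the
committed code: the type sse_reg_set, the constant NUM_SSE_REGS and the
function may_emit_vzeroupper are hypothetical stand-ins for HARD_REG_SET,
the SSE register ranges and call_used_regs[], and a plain 16-bit mask is
assumed (bit i stands for xmm<i>).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for HARD_REG_SET: bit i represents xmm<i>.  */
typedef uint16_t sse_reg_set;

/* Assumed register count for this model (xmm0 .. xmm15).  */
#define NUM_SSE_REGS 16

/* Return true if vzeroupper may be emitted at the insertion point,
   i.e. if no register is both live (bit set in REGS_LIVE) and
   call-saved (bit clear in CALL_USED).  */
bool
may_emit_vzeroupper (sse_reg_set regs_live, sse_reg_set call_used)
{
  for (int i = 0; i < NUM_SSE_REGS; i++)
    {
      bool live = (regs_live >> i) & 1;
      bool used = (call_used >> i) & 1;

      /* A live call-saved register would lose its upper half,
         so the insertion is cancelled.  */
      if (live && !used)
        return false;
    }

  return true;
}
```

With every register call-clobbered (call_used == 0xFFFF), as stated above
for x86_64 targets, the check cannot fail, which is why no degradation is
expected there. With an illustrative mask such as 0x003F (xmm6 to xmm15
treated as call-saved), a live xmm6 cancels the insertion.
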
Index: i386-protos.h
===================================================================
--- i386-protos.h (revision 193549)
+++ i386-protos.h (working copy)
@@ -172,8 +172,11 @@
 extern int ix86_mode_after (int, int, rtx);
 extern int ix86_mode_entry (int);
 extern int ix86_mode_exit (int);
-extern void ix86_emit_mode_set (int, int);
 
+#ifdef HARD_CONST
+extern void ix86_emit_mode_set (int, int, HARD_REG_SET);
+#endif
+
 extern void x86_order_regs_for_local_alloc (void);
 extern void x86_function_profiler (FILE *, int);
 extern void x86_emit_floatuns (rtx [2]);
Index: i386.c
===================================================================
--- i386.c (revision 193549)
+++ i386.c (working copy)
@@ -15477,16 +15477,38 @@
   emit_move_insn (new_mode, reg);
 }
 
+/* Emit vzeroupper.  */
+
+void
+ix86_avx_emit_vzeroupper (HARD_REG_SET regs_live)
+{
+  int i;
+
+  /* Cancel automatic vzeroupper insertion if there are
+     live call-saved SSE registers at the insertion point.  */
+
+  for (i = FIRST_SSE_REG; i <= LAST_SSE_REG; i++)
+    if (TEST_HARD_REG_BIT (regs_live, i) && !call_used_regs[i])
+      return;
+
+  if (TARGET_64BIT)
+    for (i = FIRST_REX_SSE_REG; i <= LAST_REX_SSE_REG; i++)
+      if (TEST_HARD_REG_BIT (regs_live, i) && !call_used_regs[i])
+        return;
+
+  emit_insn (gen_avx_vzeroupper ());
+}
+
 /* Generate one or more insns to set ENTITY to MODE.  */
 
 void
-ix86_emit_mode_set (int entity, int mode)
+ix86_emit_mode_set (int entity, int mode, HARD_REG_SET regs_live)
 {
   switch (entity)
     {
     case AVX_U128:
       if (mode == AVX_U128_CLEAN)
-        emit_insn (gen_avx_vzeroupper ());
+        ix86_avx_emit_vzeroupper (regs_live);
       break;
     case I387_TRUNC:
     case I387_FLOOR:
Index: i386.h
===================================================================
--- i386.h (revision 193549)
+++ i386.h (working copy)
@@ -2226,7 +2226,7 @@
    are to be inserted.  */
 
 #define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
-  ix86_emit_mode_set ((ENTITY), (MODE))
+  ix86_emit_mode_set ((ENTITY), (MODE), (HARD_REGS_LIVE))
 
 /* Avoid renaming of stack registers, as doing so in combination with
    scheduling just increases amount of live registers at time and in