From patchwork Mon Nov 5 18:34:07 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 197267
In-Reply-To: <1411479.F0z64l7D3N@polaris>
References: <1411479.F0z64l7D3N@polaris>
Date: Mon, 5 Nov 2012 19:34:07 +0100
Subject: Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
From: Uros Bizjak
To: Eric Botcazou
Cc: gcc-patches@gcc.gnu.org, vbyakovl23@gmail.com, Kaz Kojima
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list
gcc-patches@gcc.gnu.org

On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou wrote:

>> This sequence breaks assumption in mode-switching.c, that final
>> function return register load operates in MODE_EXIT mode and triggers
>> the following code:
>>
>>   for (j = n_entities - 1; j >= 0; j--)
>>     {
>>       int e = entity_map[j];
>>       int mode = MODE_NEEDED (e, return_copy);
>>
>>       if (mode != num_modes[e] && mode != MODE_EXIT (e))
>>         break;
>>     }
>>
>> As discussed above, modes of loads, generated from __builtin_apply
>> have no connection to function return mode. mode-switching.c does
>> detect __builtin_apply situation and raises maybe_builtin_apply flag,
>> but doesn't use it to short-circuit wrong check. In proposed patch, we
>> detect this situation and raise forced_late_switch in the same way, as
>> SH4 does for its "late" fpscr emission.
>
> If I understand correctly, we need to insert the vzeroupper because the
> function returns double in SSE registers but we generate an OImode load
> instead of a DFmode load because of the __builtin_return.  So we're in the
> forced_late_switch case but we fail to recognize the tweaked return value load
> since the number of registers doesn't match.
>
> If so, I'd rather add another special case, like for the SH4, instead of a
> generic bypass for maybe_builtin_apply, something along the lines of:
>
>   /* For the x86 with AVX, we might be using a larger load for a value
>      returned in SSE registers and we want to put the final mode switch
>      after this return value copy.  */
>   if (copy_start == ret_start
>       && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
>       && copy_num >= nregs
>       && OBJECT_P (SET_SRC (return_copy_pat)))
>     forced_late_switch = 1;

Yes, this approach also works. I assume it is OK to commit the attached patch?

2012-11-05  Eric Botcazou
            Uros Bizjak

        * mode-switching.c (create_pre_exit): Force late switch for
        __builtin_return case, when value, returned in SSE register,
        was loaded using OImode load.

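
For readers who want to poke at the condition outside of GCC, here is a minimal standalone sketch of the new check. It is not code from mode-switching.c: the struct return_copy_info and the function needs_forced_late_switch are hypothetical names made up for illustration, ret_nregs stands in for hard_regno_nregs[ret_start][GET_MODE (ret_reg)], and src_is_object stands in for OBJECT_P (SET_SRC (return_copy_pat)). Only the four-part condition itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the values create_pre_exit has at hand.
   Field names mirror the patch; the struct itself is not GCC code.  */
struct return_copy_info
{
  int copy_start;      /* first hard register written by the copy */
  int copy_num;        /* number of hard registers the copy writes */
  int ret_start;       /* first hard register of the return value */
  int ret_nregs;       /* assumed: hard_regno_nregs[ret_start][mode] */
  int nregs;           /* return registers not yet matched to a copy */
  bool src_is_object;  /* assumed: OBJECT_P (SET_SRC (pattern)) */
};

/* Return true when the copy would be treated as the (possibly wider)
   return value load, so the final mode switch goes after it.  */
static bool
needs_forced_late_switch (const struct return_copy_info *c)
{
  assert (c != NULL);
  return c->copy_start == c->ret_start   /* same first register */
         && c->nregs == c->ret_nregs     /* nothing matched so far */
         && c->copy_num >= c->nregs      /* copy covers the whole value */
         && c->src_is_object;            /* plain object as the source */
}
```

With illustrative values (the register number 21 is arbitrary here), a copy with copy_start == ret_start == 21 and copy_num == nregs == ret_nregs == 1 from an object source satisfies the predicate, while the same copy starting at register 22 does not.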
Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c

Thanks,
Uros.

Index: mode-switching.c
===================================================================
--- mode-switching.c	(revision 193174)
+++ mode-switching.c	(working copy)
@@ -1,6 +1,6 @@
 /* CPU mode switching
    Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008,
-   2009, 2010 Free Software Foundation, Inc.
+   2009, 2010, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -342,6 +342,17 @@ create_pre_exit (int n_entities, int *entity_map,
                 }
               if (j >= 0)
                 {
+                  /* For the x86 with AVX, we might be using a larger
+                     load for a value returned in SSE registers and we
+                     want to put the final mode switch after this
+                     return value copy.  */
+                  if (copy_start == ret_start
+                      && nregs
+                         == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
+                      && copy_num >= nregs
+                      && OBJECT_P (SET_SRC (return_copy_pat)))
+                    forced_late_switch = 1;
+
                   /* For the SH4, floating point loads depend on fpscr,
                      thus we might need to put the final mode switch
                      after the return value copy.  That is still OK,