From patchwork Tue Jan 19 15:13:05 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Bruel X-Patchwork-Id: 570008 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 83C47140BAE for ; Wed, 20 Jan 2016 02:13:27 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=DtYNltqn; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:cc:from:message-id:date:mime-version :in-reply-to:content-type:content-transfer-encoding; q=dns; s= default; b=RmUVEj+pFeVOfKpsw0ibFaVAD/AzN9fQ3vCeyjABW4Rv5FSZ7+cUj 4f417hziJJ+YxoeTtsNhmVBVJNwY4obTc1RILKyUW7Q4mh87eZsasTLYE9w8L1VM Z3PUz4gHH7KTaFWmjW5lVyicWATjH/pKACE1PJ76FyC2Q+g7Wlhccg= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:cc:from:message-id:date:mime-version :in-reply-to:content-type:content-transfer-encoding; s=default; bh=PM8TOhjcxw+Zyniwom/mEd1ntNw=; b=DtYNltqnS8A9tV2J0zEpjK8WQOTp crsJlWAe/n5hL9CTRWibQIz98MLeJVn0AmxYipjusInaI+DFYbtNEVhG5H1hXlRt j6jjf2E8hW2LFc/Gq3m1AAYzMPrfXXQidbpom1cmf2BWTQhA7+idnpm23NGyVqGs J6g29d/7KnCI5uw= Received: (qmail 121414 invoked by alias); 19 Jan 2016 15:13:19 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 121399 invoked by uid 89); 19 Jan 2016 15:13:18 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.9 required=5.0 tests=BAYES_50, KAM_ASCII_DIVIDERS, KAM_LAZY_DOMAIN_SECURITY, KHOP_DYNAMIC, RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=lightly, H*RU:pps.filterd, Hx-spam-relays-external:pps.filterd, unsignedp X-HELO: mx07-00178001.pphosted.com Received: from mx07-00178001.pphosted.com (HELO mx07-00178001.pphosted.com) (62.209.51.94) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Tue, 19 Jan 2016 15:13:17 +0000 Received: from pps.filterd (m0046037.ppops.net [127.0.0.1]) by m0046037.ppops.net (8.15.0.59/8.15.0.59) with SMTP id u0JF7PET015068; Tue, 19 Jan 2016 16:13:10 +0100 Received: from beta.dmz-eu.st.com (beta.dmz-eu.st.com [164.129.1.35]) by m0046037.ppops.net with ESMTP id 20fcgujuyr-1 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT); Tue, 19 Jan 2016 16:13:10 +0100 Received: from zeta.dmz-eu.st.com (zeta.dmz-eu.st.com [164.129.230.9]) by beta.dmz-eu.st.com (STMicroelectronics) with ESMTP id 4FEB634; Tue, 19 Jan 2016 15:12:21 +0000 (GMT) Received: from Webmail-eu.st.com (safex1hubcas5.st.com [10.75.90.71]) by zeta.dmz-eu.st.com (STMicroelectronics) with ESMTP id 4DC4624CC; Tue, 19 Jan 2016 15:13:07 +0000 (GMT) Received: from [164.129.122.197] (164.129.122.197) by webmail-eu.st.com (10.75.90.13) with Microsoft SMTP Server (TLS) id 8.3.406.0; Tue, 19 Jan 2016 16:13:06 +0100 Subject: Re: [PATCH][ARM, AARCH64] target/PR68674: relayout vector_types in expand_expr To: Richard Biener References: <568FB9AA.5020706@st.com> <569E4FCA.3070704@st.com> CC: "kyrylo.tkachov@foss.arm.com" , Richard Earnshaw , "ramana.radhakrishnan@foss.arm.com" , "bschmidt@redhat.com" , GCC Patches From: Christian Bruel X-No-Archive: yes Message-ID: <569E5281.2050009@st.com> Date: Tue, 19 Jan 2016 16:13:05 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <569E4FCA.3070704@st.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-01-19_07:, , signatures=0 X-IsSubscribed: yes On 01/19/2016 04:01 PM, Christian Bruel wrote: > Hi Richard, > > thanks for your input, > > On 01/18/2016 12:36 PM, Richard Biener wrote: >> On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel wrote: >>> When compiling code with attribute targets on arm or aarch64, >>> vector_type_mode returns different results (eg Vmode or BLKmode) depending >>> on the current simd flags that are not set between functions. >>> >>> for example the following code: >>> >>> #include >>> >>> extern int8x8_t a; >>> extern int8x8_t b; >>> >>> int16x8_t >>> __attribute__ ((target("fpu=neon"))) >>> foo(void) >>> { >>> return vaddl_s8 (a, b); >>> } >>> >>> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins , >>> because the mismatch and DECL_MODE current's TYPE_MODE used in >>> expand_builtin for global variables. >>> >>> but the best explanation is in the vector_type_mode: >>> /* Vector types need to re-check the target flags each time we report >>> the machine mode. We need to do this because attribute target can >>> change the result of vector_mode_supported_p and have_regs_of_mode >>> on a per-function basis. Thus the TYPE_MODE of a VECTOR_TYPE can >>> change on a per-function basis. */ >>> >>> I first tried to hack the 2 machine descriptions to insert convert_to_mode >>> or relayout_decls here and there, but I found this very fragile. Instead a >>> more central relayout the of type while expanding gave good results, as >>> proposed here. >>> >>> bootstraped and tested with no regression for arm, aarch64 and i586. >>> >>> Does this look to be the right approach ? >>> >>> nb: for testing this patch is complementary with >>> >>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html >>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html >>> >>> thanks for your comments. >> A x86 specific testcase that ICEs as well: >> >> typedef int v8si __attribute__((vector_size(32))); >> v8si a; >> v8si __attribute__((target("avx"))) foo() >> { >> return a; >> } >> >> in your patch not using the shared DECL_RTL of the global var >> "fixes" this so I think a conceptually better fix would be to >> "adjust" DECL_RTL from globals via a adjust_address (or so). >> >> Also given that we do >> >> /* ... fall through ... */ >> >> case FUNCTION_DECL: >> case RESULT_DECL: >> decl_rtl = DECL_RTL (exp); >> expand_decl_rtl: >> gcc_assert (decl_rtl); >> decl_rtl = copy_rtx (decl_rtl); >> >> thus always "unshare" DECL_RTL anyway it might be not so >> bad to simply do >> >> decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0); >> >> instead of that to avoid one copy. >> >> Index: expr.c >> =================================================================== >> --- expr.c (revision 232496) >> +++ expr.c (working copy) >> @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target >> decl_rtl = DECL_RTL (exp); >> expand_decl_rtl: >> gcc_assert (decl_rtl); >> - decl_rtl = copy_rtx (decl_rtl); >> + if (MEM_P (decl_rtl)) >> + decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0); >> + else >> + decl_rtl = copy_rtx (decl_rtl); >> /* Record writes to register variables. */ >> if (modifier == EXPAND_WRITE >> && REG_P (decl_rtl) >> >> untested apart from on the x86_64 testcase (which it fixes). One could guard >> this further to only apply on vector typed decls with mismatched mode of course. >> >> I think that re-layouting globals is not very good design. >> >> Richard. > A few other ICEs with this implementation, for instance if the context > is not in a function, such as > > typedef __simd64_int8_t int8x8_t; > > extern int8x8_t b; > int8x8_t *a = &b; > > So, to avoid a var re-layout and a copy_rtx (implied by adjust_address > btw). What about just calling 'change_address' ? like: (very lightly tested) > > Index: expr.c > =================================================================== > --- expr.c (revision 232564) > +++ expr.c (working copy) > @@ -9392,7 +9392,8 @@ > enum expand_modifier modifier, rtx *alt_rtl, > bool inner_reference_p) > { > - rtx op0, op1, temp, decl_rtl; > + rtx op0, op1, temp; > + rtx decl_rtl = NULL_RTX; > tree type; > int unsignedp; > machine_mode mode, dmode; > @@ -9590,11 +9591,22 @@ > && (TREE_STATIC (exp) || DECL_EXTERNAL (exp))) > layout_decl (exp, 0); > > + decl_rtl = DECL_RTL (exp); > + > + if (MEM_P (decl_rtl) > + && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode)) > + { > + if (current_function_decl > + && (! reload_completed && !reload_in_progress)) > + decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0); > + } > + > /* ... fall through ... */ > > case FUNCTION_DECL: > case RESULT_DECL: > - decl_rtl = DECL_RTL (exp); > + if (! decl_rtl) > + decl_rtl = DECL_RTL (exp); > expand_decl_rtl: > gcc_assert (decl_rtl); > decl_rtl = copy_rtx (decl_rtl); > > I'm not sure that moving the code in the 'expand_decl_rtl' label is > best, as we'd need to test for exp and the case should only happen for > global vars (not functions or results) Here is the alternative implementation, shorter after all. testing in progress. > > thanks, > Index: expr.c =================================================================== --- expr.c (revision 232570) +++ expr.c (working copy) @@ -9597,6 +9597,15 @@ decl_rtl = DECL_RTL (exp); expand_decl_rtl: gcc_assert (decl_rtl); + + if (exp && code == VAR_DECL && MEM_P (decl_rtl) + && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode)) + { + if (current_function_decl + && (! reload_completed && !reload_in_progress)) + decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0); + } + decl_rtl = copy_rtx (decl_rtl); /* Record writes to register variables. */ if (modifier == EXPAND_WRITE