From patchwork Thu Nov 29 17:16:20 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christophe Lyon X-Patchwork-Id: 202792 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 9F34D2C0085 for ; Fri, 30 Nov 2012 04:16:45 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1354814206; h=Comment: DomainKey-Signature:Received:Received:Received:Received: MIME-Version:Received:Received:Date:Message-ID:Subject:From:To: Content-Type:Mailing-List:Precedence:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:Sender:Delivered-To; bh=3lB8Eew eZOSQW2KMvacZmQ+9wA0=; b=F/MvaA9CMZODPL6lOtnwN81fIupWEAlzFybcR0r 6bd0dFkJxiinJUtSrWec7AwD/SCBC+H8J8r5HXKh9+pny2pfbC5IA2MLihZqYUBe 0Mx3C7KlvusO1VJKzUhfQreslRKX71kl84KOgzrn+tkVgPUWcNqeAomSuPLMNUnK 7fJk= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:X-Google-DKIM-Signature:MIME-Version:Received:Received:Date:Message-ID:Subject:From:To:Content-Type:X-Gm-Message-State:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=Y32NNb2kdA5M7lqEFTkET6+QhWqTAPpRphCzr8PuN5f/o0MRJMhrlV4XZ+MF7+ tvnW9qK6Ltp615h2Zkd8ZnR+bNjv4y58FmRFhX2MLOShjnIKxB5whTU1dkgwqij3 9hwhyCuTkMQXQyOxPyTnpVGjd4pfMLDDr7vTGs5YyY17o=; Received: (qmail 9853 invoked by alias); 29 Nov 2012 17:16:30 -0000 Received: (qmail 9820 invoked by uid 22791); 29 Nov 2012 17:16:26 -0000 X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, RCVD_IN_DNSWL_LOW, RCVD_IN_HOSTKARMA_YE, TW_VB X-Spam-Check-By: sourceware.org Received: from mail-qa0-f54.google.com (HELO mail-qa0-f54.google.com) (209.85.216.54) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 29 Nov 2012 17:16:21 +0000 Received: by mail-qa0-f54.google.com with SMTP id j15so518887qaq.20 for ; Thu, 29 Nov 2012 09:16:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type :x-gm-message-state; bh=XkGsfWCIJUZTqeBeEGo8drcx7xv7g3QbXj2xOX/CfRI=; b=ZgSe7sQdGKAaWG/ArdXB8d2ednjxR4xn99RLrWNcL4PdKdcCSIDyaiyRjFLi+gNGpX mjHkTeY5UGSayZ5LpKQII49crGHACdfJxJLlUi/ov+DWT1UPxiG+qmLoQb+YcJLvppOz fUl54hbod8+Uch0+8iSAWaNQ1NMoq6qc0kgbsfv/0CCAzR/qlmUzwmtdHhBBtguUCn4x 6d3WiT/LQ7yJMrr+CVTK54pE6QvPMv9E1DfLIpxPiSEiyke11P3nQRbC/yq1R+Ux4Sm0 WlDo4Eg9sWXgeq4k55WGDP5riYr0c0yIxswEqgMkgQTJbScKIBkiYYmIPKCnCdkKg1Ra 3BOA== MIME-Version: 1.0 Received: by 10.49.4.193 with SMTP id m1mr29843530qem.38.1354209380423; Thu, 29 Nov 2012 09:16:20 -0800 (PST) Received: by 10.49.84.106 with HTTP; Thu, 29 Nov 2012 09:16:20 -0800 (PST) Date: Thu, 29 Nov 2012 18:16:20 +0100 Message-ID: Subject: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem From: Christophe Lyon To: "gcc-patches@gcc.gnu.org" X-Gm-Message-State: ALoCoQk4sa8FjOD2IX4ahVI2YuWelXw46xmbSdoY2eXPh2W1Hy4gP8Ep/WTd+Cu2IiR9gpB/UNpQ X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, I have been working on a patch to avoid using Neon for 64 bits bitops when it's too expensive to move data between core and Neon registers. Benchmarking and validation look OK on the 4.7 branch (compiler configured for thumb and hard FP) - validation on cortex-a9 board OK - bencharking shows 10.5% improvement on spec2k's crafty bench. On other benches we are between -0.5% and +0.5%. On trunk I have noticed a regression in gfortran when using modulo scheduling: sms-1.f90 now fails, but I suspect it's not because of this patch since forcing compilation for armv5t makes the same test fail with and without my patch. Specifically, I have observed that the loop: 862e: 3b01 subs r3, #1 8630: ef70 08a1 vadd.i64 d16, d16, d17 8634: ec51 0b30 vmov r0, r1, d16 8638: e9e2 0102 strd r0, r1, [r2, #8]! 863c: d1f7 bne.n 862e in transformed into: 862e: 3901 subs r1, #1 8630: 1912 adds r2, r2, r4 8632: eb43 0305 adc.w r3, r3, r5 8636: e9e0 2302 strd r2, r3, [r0, #8]! 863a: d1f8 bne.n 862e with my patch. This is wrong because adds/adc clobber the flags used to control the loop. The patch is: 2012-11-28 Christophe Lyon gcc/ * config/arm/arm-protos.h (tune_params): Add prefer_neon_for_64bits field. * config/arm/arm.c (prefer_neon_for_64bits): New variable. (arm_slowmul_tune): Default prefer_neon_for_64bits to false. (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto. (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto. (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto. (arm_cortex_a9_tune, arm_fa726te_tune): Ditto. (arm_option_override): Handle -mneon-for-64bits new option. * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro. (prefer_neon_for_64bits): Declare new variable. * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to avoid_neon_for_64bits and neon_for_64bits. (arch_enabled): Handle new arch types. (one_cmpldi2): Use new arch names. * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon) (anddi3_neon, xordi3_neon, ashldi3_neon, di3_neon): Use neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead of onlya8. Is it OK for trunk? diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index d942c5b..c92f055 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -247,6 +247,8 @@ struct tune_params performance. The first element covers Thumb state and the second one is for ARM state. */ bool logical_op_non_short_circuit[2]; + /* Prefer Neon for 64-bit bitops. */ + bool prefer_neon_for_64bits; }; extern const struct tune_params *current_tune; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 286a6c5..9efd215 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -816,6 +816,10 @@ int arm_arch_thumb2; int arm_arch_arm_hwdiv; int arm_arch_thumb_hwdiv; +/* Nonzero if we should use Neon to handle 64-bits operations rather + than core registers. */ +int prefer_neon_for_64bits = 0; + /* In case of a PRE_INC, POST_INC, PRE_DEC, POST_DEC memory reference, we must report the mode of the memory reference from TARGET_PRINT_OPERAND to TARGET_PRINT_OPERAND_ADDRESS. */ @@ -895,6 +899,7 @@ const struct tune_params arm_slowmul_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; const struct tune_params arm_fastmul_tune = @@ -908,6 +913,7 @@ const struct tune_params arm_fastmul_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; /* StrongARM has early execution of branches, so a sequence that is worth @@ -924,6 +930,7 @@ const struct tune_params arm_strongarm_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; const struct tune_params arm_xscale_tune = @@ -937,6 +944,7 @@ const struct tune_params arm_xscale_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; const struct tune_params arm_9e_tune = @@ -950,6 +958,7 @@ const struct tune_params arm_9e_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; const struct tune_params arm_v6t2_tune = @@ -963,6 +972,7 @@ const struct tune_params arm_v6t2_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; /* Generic Cortex tuning. Use more specific tunings if appropriate. */ @@ -977,6 +987,7 @@ const struct tune_params arm_cortex_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; const struct tune_params arm_cortex_a15_tune = @@ -990,6 +1001,7 @@ const struct tune_params arm_cortex_a15_tune = arm_default_branch_cost, true, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; /* Branches can be dual-issued on Cortex-A5, so conditional execution is @@ -1006,6 +1018,7 @@ const struct tune_params arm_cortex_a5_tune = arm_cortex_a5_branch_cost, false, /* Prefer LDRD/STRD. */ {false, false}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; const struct tune_params arm_cortex_a9_tune = @@ -1019,6 +1032,7 @@ const struct tune_params arm_cortex_a9_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than @@ -1034,6 +1048,7 @@ const struct tune_params arm_v6m_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {false, false}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; const struct tune_params arm_fa726te_tune = @@ -1047,6 +1062,7 @@ const struct tune_params arm_fa726te_tune = arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ {true, true}, /* Prefer non short circuit. */ + false /* Prefer Neon for 64-bits bitops. */ }; @@ -2077,6 +2093,12 @@ arm_option_override (void) global_options.x_param_values, global_options_set.x_param_values); + /* Use Neon to perform 64-bits operations rather than core + registers. */ + prefer_neon_for_64bits = current_tune->prefer_neon_for_64bits; + if (use_neon_for_64bits == 1) + prefer_neon_for_64bits = true; + /* Use the alternative scheduling-pressure algorithm by default. */ maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2, global_options.x_param_values, diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index f520cc7..c71d85f 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -356,6 +356,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void); #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \ || (TARGET_THUMB2 && arm_arch_thumb_hwdiv)) +/* Should NEON be used for 64-bits bitops. */ +#define TARGET_PREFER_NEON_64BITS (prefer_neon_for_64bits) + /* True iff the full BPABI is being used. If TARGET_BPABI is true, then TARGET_AAPCS_BASED must be true -- but the converse does not hold. TARGET_BPABI implies the use of the BPABI runtime library, @@ -541,6 +544,10 @@ extern int arm_arch_arm_hwdiv; /* Nonzero if chip supports integer division instruction in Thumb mode. */ extern int arm_arch_thumb_hwdiv; +/* Nonzero if we should use Neon to handle 64-bits operations rather + than core registers. */ +extern int prefer_neon_for_64bits; + #ifndef TARGET_DEFAULT #define TARGET_DEFAULT (MASK_APCS_FRAME) #endif diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index ac507ef..afde613 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -202,7 +202,7 @@ ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without ; arm_arch6. This attribute is used to compute attribute "enabled", ; use type "any" to enable an alternative in all cases. -(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8,iwmmxt,iwmmxt2" +(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,nota8,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2" (const_string "any")) (define_attr "arch_enabled" "no,yes" @@ -241,18 +241,18 @@ (eq_attr "tune" "cortexa8")) (const_string "yes") - (and (eq_attr "arch" "neon_onlya8") - (eq_attr "tune" "cortexa8") - (match_test "TARGET_NEON")) + (and (eq_attr "arch" "avoid_neon_for_64bits") + (match_test "TARGET_NEON") + (not (match_test "TARGET_PREFER_NEON_64BITS"))) (const_string "yes") (and (eq_attr "arch" "nota8") (not (eq_attr "tune" "cortexa8"))) (const_string "yes") - (and (eq_attr "arch" "neon_nota8") - (not (eq_attr "tune" "cortexa8")) - (match_test "TARGET_NEON")) + (and (eq_attr "arch" "neon_for_64bits") + (match_test "TARGET_NEON") + (match_test "TARGET_PREFER_NEON_64BITS")) (const_string "yes") (and (eq_attr "arch" "iwmmxt2") @@ -4370,7 +4370,7 @@ [(set_attr "length" "*,8,8,*") (set_attr "predicable" "no,yes,yes,no") (set_attr "neon_type" "neon_int_1,*,*,neon_int_1") - (set_attr "arch" "neon_nota8,*,*,neon_onlya8")] + (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")] ) (define_expand "one_cmplsi2" diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt index fb12c55..83b6002 100644 --- a/gcc/config/arm/arm.opt +++ b/gcc/config/arm/arm.opt @@ -251,3 +251,7 @@ that may trigger Cortex-M3 errata. munaligned-access Target Report Var(unaligned_access) Init(2) Enable unaligned word and halfword accesses to packed data. + +mneon-for-64bits +Target Report RejectNegative Var(use_neon_for_64bits) Init(0) +Use Neon to perform 64-bits operations rather than core registers. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 2103580..8b0e877 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -617,7 +617,7 @@ [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*") (set_attr "conds" "*,clob,clob,*,clob,clob,clob") (set_attr "length" "*,8,8,*,8,8,8") - (set_attr "arch" "nota8,*,*,onlya8,*,*,*")] + (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits,*,*,*")] ) (define_insn "*sub3_neon" @@ -654,7 +654,7 @@ [(set_attr "neon_type" "neon_int_2,*,*,*,neon_int_2") (set_attr "conds" "*,clob,clob,clob,*") (set_attr "length" "*,8,8,8,*") - (set_attr "arch" "nota8,*,*,*,onlya8")] + (set_attr "arch" "neon_for_64bits,*,*,*,avoid_neon_for_64bits")] ) (define_insn "*mul3_neon" @@ -816,7 +816,7 @@ } [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1") (set_attr "length" "*,*,8,8,*,*") - (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")] + (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")] ) ;; The concrete forms of the Neon immediate-logic instructions are vbic and @@ -861,7 +861,7 @@ } [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1") (set_attr "length" "*,*,8,8,*,*") - (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")] + (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")] ) (define_insn "orn3_neon" @@ -957,7 +957,7 @@ veor\t%P0, %P1, %P2" [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1") (set_attr "length" "*,8,8,*") - (set_attr "arch" "nota8,*,*,onlya8")] + (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")] ) (define_insn "one_cmpl2" @@ -1279,7 +1279,7 @@ } DONE; }" - [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8") + [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits") (set_attr "opt" "*,*,speed,speed,*,*")] ) @@ -1380,7 +1380,7 @@ DONE; }" - [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8") + [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits") (set_attr "opt" "*,*,speed,speed,*,*")] )