From patchwork Fri Mar 26 16:16:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1458874 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=lE4A6Fwj; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4F6RtF02slz9sR4 for ; Sat, 27 Mar 2021 03:16:20 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DBDA43857C63; Fri, 26 Mar 2021 16:16:18 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DBDA43857C63 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1616775378; bh=+5nQ9zpcst1PXzD7xdMb5V13jtoA2TLwp7rjwwCaLtE=; h=To:Subject:References:Date:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=lE4A6FwjYck2aqiTcCc2n+E68/lL7nXNKzPVjF1ia21TuKMXctpZY5ykYkcep1PQM oUlPV32D7yo9egGJAMFWTZMeJ//fMQioIL4SrEO0E3jeBwrRQoH/INwCsc7tpkUQxz 9iyjZ09t8jbE/NgYMtfTDyI58k0k4QOu5SSolBwM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 27D07385800F for ; Fri, 26 Mar 2021 16:16:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 27D07385800F Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C76751476 for ; Fri, 26 Mar 2021 09:16:15 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6F6A43F792 for ; Fri, 26 Mar 2021 09:16:15 -0700 (PDT) To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: [PATCH 06/13] aarch64: Add a CPU-specific cost table for Neoverse V1 References: Date: Fri, 26 Mar 2021 16:16:14 +0000 In-Reply-To: (Richard Sandiford's message of "Fri, 26 Mar 2021 16:12:42 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_NUMSUBJECT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Richard Sandiford via Gcc-patches From: Richard Sandiford Reply-To: Richard Sandiford Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" This patch adds dedicated vector costs for Neoverse V1. Previously we just used the Cortex-A57 costs, which isn't ideal given that Cortex-A57 doesn't support SVE. gcc/ * config/aarch64/aarch64.c (neoversev1_advsimd_vector_cost) (neoversev1_sve_vector_cost): New cost structures. (neoversev1_vector_cost): Likewise. (neoversev1_tunings): Use them. Enable use_new_vector_costs. --- gcc/config/aarch64/aarch64.c | 95 +++++++++++++++++++++++++++++++++++- 1 file changed, 93 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 7f727413d01..2e9853e4c9b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1619,12 +1619,102 @@ static const struct tune_params neoversen1_tunings = &generic_prefetch_tune }; +static const advsimd_vec_cost neoversev1_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 4, /* ld2_st2_permute_cost */ + 4, /* ld3_st3_permute_cost */ + 5, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost neoversev1_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 4, /* ld2_st2_permute_cost */ + 7, /* ld3_st3_permute_cost */ + 8, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 31 scalar ADDs could + complete in ~9 cycles and would have a cost of 31. [SU]ADDV + completes in 14 cycles, so give it a cost of 31 + 5. */ + 36, /* reduc_i8_cost */ + /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7. */ + 22, /* reduc_i16_cost */ + /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7. */ + 14, /* reduc_i32_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8. */ + 11, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 15 scalar FADDs could + complete in ~9 cycles and would have a cost of 30. FADDV + completes in 13 cycles, so give it a cost of 30 + 4. */ + 34, /* reduc_f16_cost */ + /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5. */ + 19, /* reduc_f32_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5. */ + 11, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* See the comment above the Advanced SIMD versions. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 19, /* fadda_f16_cost */ + 11, /* fadda_f32_cost */ + 8, /* fadda_f64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +/* Neoverse V1 costs for vector insn classes. */ +static const struct cpu_vector_cost neoversev1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversev1_advsimd_vector_cost, /* advsimd */ + &neoversev1_sve_vector_cost /* sve */ +}; + static const struct tune_params neoversev1_tunings = { &cortexa76_extra_costs, &generic_addrcost_table, &generic_regmove_cost, - &cortexa57_vector_cost, + &neoversev1_vector_cost, &generic_branch_cost, &generic_approx_modes, SVE_256, /* sve_width */ @@ -1641,7 +1731,8 @@ static const struct tune_params neoversev1_tunings = 2, /* min_div_recip_mul_df. */ 0, /* max_case_values. */ tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags. */ + (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS), /* tune_flags. */ &generic_prefetch_tune };