From patchwork Tue Jul 5 15:00:15 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 644816
From: Wilco Dijkstra
To: GCC Patches
CC: nd, James Greenhalgh
Subject: [PATCH][AArch64] Improve Cortex-A53 integer scheduler
Date: Tue, 5 Jul 2016 15:00:15 +0000

This patch improves the accuracy of the Cortex-A53 integer scheduler,
resulting in performance gains across a wide range of benchmarks.

OK for commit?

ChangeLog:
2016-07-05  Wilco Dijkstra

	* config/arm/cortex-a53.md: Use final_presence_set for in-order.
	(cortex_a53_shift): Add mov_shift.
	(cortex_a53_shift_reg): Add new reservation for register shifts.
	(cortex_a53_alu): Remove bfm.
	(cortex_a53_alu_shift): Add bfm, remove mov_shift.
	(cortex_a53_alu_extr): Add new reservation for EXTR.
	(bypasses): Improve bypass modelling.
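
For readers less familiar with define_bypass, the following is a small
illustrative Python model (not GCC code) of how the reworked bypasses change
the latency seen between a producer and a consumer reservation. The default
latencies and bypass entries are copied from the patch, but only the
unguarded ALU/shift/multiply bypasses are included. The names `matches`,
`effective_latency`, `DEFAULT_LATENCY` and `BYPASSES` are hypothetical, and
the sketch assumes that the glob patterns in define_bypass behave like
Python's `fnmatchcase`.

```python
from fnmatch import fnmatchcase

# Default result latencies, taken from the define_insn_reservation
# entries as they stand after the patch.
DEFAULT_LATENCY = {
    "cortex_a53_shift": 2,
    "cortex_a53_shift_reg": 2,
    "cortex_a53_alu": 3,
    "cortex_a53_alu_shift": 3,
    "cortex_a53_alu_shift_reg": 3,
    "cortex_a53_alu_extr": 3,
    "cortex_a53_mul": 4,
}

# (latency, producer patterns, consumer patterns) for a subset of the new
# bypasses.  Bypasses with a guard function are left out (assumption: a
# guard-free model is enough for illustration).
BYPASSES = [
    (0, "cortex_a53_shift*", "cortex_a53_alu"),
    (1, "cortex_a53_shift*", "cortex_a53_shift*,cortex_a53_alu_*"),
    (1, "cortex_a53_alu*", "cortex_a53_alu"),
    (2, "cortex_a53_alu*", "cortex_a53_alu_*,cortex_a53_shift*"),
    (2, "cortex_a53_mul", "cortex_a53_alu"),
    (3, "cortex_a53_mul", "cortex_a53_alu_*,cortex_a53_shift*"),
]


def matches(name, patterns):
    """Return True if name matches any comma-separated glob pattern."""
    return any(fnmatchcase(name, p.strip()) for p in patterns.split(","))


def effective_latency(producer, consumer):
    """Latency from producer to consumer: bypass value if one applies,
    otherwise the producer's default reservation latency."""
    for latency, outs, ins in BYPASSES:
        if matches(producer, outs) and matches(consumer, ins):
            return latency
    return DEFAULT_LATENCY[producer]
```

In this model, effective_latency("cortex_a53_shift", "cortex_a53_alu") is 0
and effective_latency("cortex_a53_mul", "cortex_a53_alu_extr") is 3, while a
multiply feeding a multiply falls back to the default of 4 because the
guarded accumulator-forwarding bypass is not modelled.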
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index fc60bc26c7caf7e94064d7f292b877b12f333fca..70c0f4daabe0ccb8e32808f1af51f5460e087a18 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -30,6 +30,7 @@
 
 (define_cpu_unit "cortex_a53_slot0" "cortex_a53")
 (define_cpu_unit "cortex_a53_slot1" "cortex_a53")
+(final_presence_set "cortex_a53_slot1" "cortex_a53_slot0")
 
 (define_reservation "cortex_a53_slot_any"
                     "cortex_a53_slot0\
@@ -71,41 +72,43 @@
 
 (define_insn_reservation "cortex_a53_shift" 2
   (and (eq_attr "tune" "cortexa53")
-       (eq_attr "type" "adr,shift_imm,shift_reg,mov_imm,mvn_imm"))
+       (eq_attr "type" "adr,shift_imm,mov_imm,mvn_imm,mov_shift"))
   "cortex_a53_slot_any")
 
-(define_insn_reservation "cortex_a53_alu_rotate_imm" 2
+(define_insn_reservation "cortex_a53_shift_reg" 2
   (and (eq_attr "tune" "cortexa53")
-       (eq_attr "type" "rotate_imm"))
-  "(cortex_a53_slot1)
-   | (cortex_a53_single_issue)")
+       (eq_attr "type" "shift_reg,mov_shift_reg"))
+  "cortex_a53_slot_any+cortex_a53_hazard")
 
 (define_insn_reservation "cortex_a53_alu" 3
   (and (eq_attr "tune" "cortexa53")
        (eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,
                         alu_sreg,alus_sreg,logic_reg,logics_reg,
                         adc_imm,adcs_imm,adc_reg,adcs_reg,
-                        bfm,csel,clz,rbit,rev,alu_dsp_reg,
-                        mov_reg,mvn_reg,
-                        mrs,multiple,no_insn"))
+                        csel,clz,rbit,rev,alu_dsp_reg,
+                        mov_reg,mvn_reg,mrs,multiple,no_insn"))
   "cortex_a53_slot_any")
 
 (define_insn_reservation "cortex_a53_alu_shift" 3
   (and (eq_attr "tune" "cortexa53")
        (eq_attr "type" "alu_shift_imm,alus_shift_imm,
                         crc,logic_shift_imm,logics_shift_imm,
-                        alu_ext,alus_ext,
-                        extend,mov_shift,mvn_shift"))
+                        alu_ext,alus_ext,bfm,extend,mvn_shift"))
   "cortex_a53_slot_any")
 
 (define_insn_reservation "cortex_a53_alu_shift_reg" 3
   (and (eq_attr "tune" "cortexa53")
        (eq_attr "type" "alu_shift_reg,alus_shift_reg,
                         logic_shift_reg,logics_shift_reg,
-                        mov_shift_reg,mvn_shift_reg"))
+                        mvn_shift_reg"))
   "cortex_a53_slot_any+cortex_a53_hazard")
 
-(define_insn_reservation "cortex_a53_mul" 3
+(define_insn_reservation "cortex_a53_alu_extr" 3
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "type" "rotate_imm"))
+  "cortex_a53_slot1|cortex_a53_single_issue")
+
+(define_insn_reservation "cortex_a53_mul" 4
   (and (eq_attr "tune" "cortexa53")
        (ior (eq_attr "mul32" "yes")
             (eq_attr "mul64" "yes")))
@@ -189,49 +192,43 @@
 (define_insn_reservation "cortex_a53_branch" 0
   (and (eq_attr "tune" "cortexa53")
        (eq_attr "type" "branch,call"))
-  "cortex_a53_slot_any,cortex_a53_branch")
+  "cortex_a53_slot_any+cortex_a53_branch")
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;; General-purpose register bypasses
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-;; Model bypasses for unshifted operands to ALU instructions.
+;; Model bypasses for ALU to ALU instructions.
 
-(define_bypass 1 "cortex_a53_shift"
-                 "cortex_a53_shift")
+(define_bypass 0 "cortex_a53_shift*"
+                 "cortex_a53_alu")
 
-(define_bypass 1 "cortex_a53_alu,
-                  cortex_a53_alu_shift*,
-                  cortex_a53_alu_rotate_imm,
-                  cortex_a53_shift"
+(define_bypass 1 "cortex_a53_shift*"
+                 "cortex_a53_shift*,cortex_a53_alu_*")
+
+(define_bypass 1 "cortex_a53_alu*"
                  "cortex_a53_alu")
 
-(define_bypass 2 "cortex_a53_alu,
-                  cortex_a53_alu_shift*"
+(define_bypass 1 "cortex_a53_alu*"
                  "cortex_a53_alu_shift*"
                  "aarch_forward_to_shift_is_not_shifted_reg")
 
-;; In our model, we allow any general-purpose register operation to
-;; bypass to the accumulator operand of an integer MADD-like operation.
+(define_bypass 2 "cortex_a53_alu*"
+                 "cortex_a53_alu_*,cortex_a53_shift*")
 
-(define_bypass 1 "cortex_a53_alu*,
-                  cortex_a53_load*,
-                  cortex_a53_mul"
+;; Model a bypass from MUL/MLA to MLA instructions.
+
+(define_bypass 1 "cortex_a53_mul"
                  "cortex_a53_mul"
                  "aarch_accumulator_forwarding")
 
-;; Model a bypass from MLA/MUL to many ALU instructions.
+;; Model a bypass from MUL/MLA to ALU instructions.
 
 (define_bypass 2 "cortex_a53_mul"
-                 "cortex_a53_alu,
-                  cortex_a53_alu_shift*")
-
-;; We get neater schedules by allowing an MLA/MUL to feed an
-;; early load address dependency to a load.
+                 "cortex_a53_alu")
 
-(define_bypass 2 "cortex_a53_mul"
-                 "cortex_a53_load*"
-                 "arm_early_load_addr_dep")
+(define_bypass 3 "cortex_a53_mul"
+                 "cortex_a53_alu_*,cortex_a53_shift*")
 
 ;; Model bypasses for loads which are to be consumed by the ALU.
 
@@ -239,47 +236,37 @@
                  "cortex_a53_alu")
 
 (define_bypass 3 "cortex_a53_load1"
-                 "cortex_a53_alu_shift*")
+                 "cortex_a53_alu_*,cortex_a53_shift*")
+
+(define_bypass 3 "cortex_a53_load2"
+                 "cortex_a53_alu")
 
 ;; Model a bypass for ALU instructions feeding stores.
 
-(define_bypass 1 "cortex_a53_alu*"
-                 "cortex_a53_store1,
-                  cortex_a53_store2,
-                  cortex_a53_store3plus"
+(define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
+                 "cortex_a53_store*"
                  "arm_no_early_store_addr_dep")
 
 ;; Model a bypass for load and multiply instructions feeding stores.
 
-(define_bypass 2 "cortex_a53_mul,
-                  cortex_a53_load1,
-                  cortex_a53_load2,
-                  cortex_a53_load3plus"
-                 "cortex_a53_store1,
-                  cortex_a53_store2,
-                  cortex_a53_store3plus"
+(define_bypass 1 "cortex_a53_mul,
+                  cortex_a53_load*"
+                 "cortex_a53_store*"
                  "arm_no_early_store_addr_dep")
 
 ;; Model a GP->FP register move as similar to stores.
 
-(define_bypass 1 "cortex_a53_alu*"
+(define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
                  "cortex_a53_r2f")
 
-(define_bypass 2 "cortex_a53_mul,
-                  cortex_a53_load1,
-                  cortex_a53_load2,
-                  cortex_a53_load3plus"
+(define_bypass 1 "cortex_a53_mul,
+                  cortex_a53_load*"
                  "cortex_a53_r2f")
 
-;; Shifts feeding Load/Store addresses may not be ready in time.
+;; Model flag forwarding to branches.
 
-(define_bypass 3 "cortex_a53_shift"
-                 "cortex_a53_load*"
-                 "arm_early_load_addr_dep")
-
-(define_bypass 3 "cortex_a53_shift"
-                 "cortex_a53_store*"
-                 "arm_early_store_addr_dep")
+(define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
+                 "cortex_a53_branch")
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;; Floating-point/Advanced SIMD.