From patchwork Tue Jul 28 13:12:36 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiong Wang X-Patchwork-Id: 501216 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 59E8A140AF3 for ; Tue, 28 Jul 2015 23:27:39 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=oExTlmgv; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=H7mh0MfjlElEQx4Ut51LEEMeSBD7l6daExmYIYyeDwFD/OK2FNeDz HBmd9QxKAC7tdkxfb8sw3V04Oax38YqN9LuA9wu9NqCuW2t7Bb5UqFbaFrtQkMC5 kqVJaWdJoZM7erYH3MFcYDSHM6i35ShkAbahSvcRQ4YESsgSipYYGM= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=UpJq9AEvgpI+oZZeYXBynDw9Edc=; b=oExTlmgvEblCMSAcu1bd kmVQuJknNYaUeoslBmoBaYhvY7C7Te+IYN+oWRDRuXZPYQQn3kDhe4DcvnMLq/vY tANSSbCPX+UiiixnY5WjeuqRyNgOTKQc8GIVMQNgyPqS6Bxhk4Y7P6ZLuBUNSuV9 zTah8BtC0724LrM90rpTHPc= Received: (qmail 110595 invoked by alias); 28 Jul 2015 13:27:30 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 110585 invoked by uid 89); 28 Jul 2015 13:27:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Jul 2015 13:27:27 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-37-Hhl6Nfx8R1SohaWkbhegQw-1; Tue, 28 Jul 2015 14:27:23 +0100 Received: from e104437-lin ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 28 Jul 2015 14:27:22 +0100 From: Jiong Wang To: gcc-patches Subject: [AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt Date: Tue, 28 Jul 2015 14:12:36 +0100 Message-ID: MIME-Version: 1.0 X-MC-Unique: Hhl6Nfx8R1SohaWkbhegQw-1 The instruction sequences for preparing argument for TLS descriptor runtime resolver and the later function call to resolver can actually be hoisted out of the loop. Currently we can't because we have exposed the hard register X0 as destination of "set". While GCC's RTL data flow infrastructure will skip or do very conservative assumption when hard register involved in and thus some loop IV opportunities are missed. This patch add another "tlsdesc_small_pseudo_" pattern, and avoid expose x0 to gcc generic code. Generally, we define a new register class FIXED_R0 which only contains register 0, so the instruction sequences generated from the new add pattern is the same as tlsdesc_small_, while the operand 0 is wrapped as pseudo register that RTL IV opt can handle it. Ideally, we should allow operand 0 to be any pseudo register, but then we can't model the override of x0 caused by the function call which is hidded by the UNSPEC. So here, we restricting operand 0 to be x0, the override of x0 can be reflected to the gcc. OK for trunk? 2015-07-28 Ramana Radhakrishnan Jiong Wang gcc/ * config/aarch64/aarch64.d (tlsdesc_small_pseudo_): New pattern. * config/aarch64/aarch64.h (reg_class): New enumeration FIXED_REG0. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. * config/aarch64/aarch64.c (aarch64_class_max_nregs): Likewise. (aarch64_register_move_cost): Likewise. (aarch64_load_symref_appropriately): Invoke the new added pattern if possible. * config/aarch64/constraints.md (Uc0): New constraint. gcc/testsuite. * gcc.target/aarch64/tlsdesc_hoist.c: New testcase. diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 3851564..fb4834a 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -454,6 +454,7 @@ extern unsigned long aarch64_isa_flags; enum reg_class { NO_REGS, + FIXED_REG0, CALLER_SAVE_REGS, GENERAL_REGS, STACK_REG, @@ -469,6 +470,7 @@ enum reg_class #define REG_CLASS_NAMES \ { \ "NO_REGS", \ + "FIXED_REG0" \ "CALLER_SAVE_REGS", \ "GENERAL_REGS", \ "STACK_REG", \ @@ -481,6 +483,7 @@ enum reg_class #define REG_CLASS_CONTENTS \ { \ { 0x00000000, 0x00000000, 0x00000000 }, /* NO_REGS */ \ + { 0x00000001, 0x00000000, 0x00000000 }, /* FIXED_REG0 */ \ { 0x0007ffff, 0x00000000, 0x00000000 }, /* CALLER_SAVE_REGS */ \ { 0x7fffffff, 0x00000000, 0x00000003 }, /* GENERAL_REGS */ \ { 0x80000000, 0x00000000, 0x00000000 }, /* STACK_REG */ \ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ef07e05..f1f2cab 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1038,22 +1038,39 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, { machine_mode mode = GET_MODE (dest); rtx x0 = gen_rtx_REG (mode, R0_REGNUM); + rtx offset; rtx tp; gcc_assert (mode == Pmode || mode == ptr_mode); - /* In ILP32, the got entry is always of SImode size. Unlike - small GOT, the dest is fixed at reg 0. */ - if (TARGET_ILP32) - emit_insn (gen_tlsdesc_small_si (imm)); + if (can_create_pseudo_p ()) + { + rtx reg = gen_reg_rtx (mode); + + if (TARGET_ILP32) + emit_insn (gen_tlsdesc_small_pseudo_si (reg, imm)); + else + emit_insn (gen_tlsdesc_small_pseudo_di (reg, imm)); + + offset = reg; + } else - emit_insn (gen_tlsdesc_small_di (imm)); + { + /* In ILP32, the got entry is always of SImode size. Unlike + small GOT, the dest is fixed at reg 0. */ + if (TARGET_ILP32) + emit_insn (gen_tlsdesc_small_si (imm)); + else + emit_insn (gen_tlsdesc_small_di (imm)); + + offset = x0; + } tp = aarch64_load_tp (NULL); if (mode != Pmode) tp = gen_lowpart (mode, tp); - emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, x0))); + emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, offset))); set_unique_reg_note (get_last_insn (), REG_EQUIV, imm); return; } @@ -5099,6 +5116,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode) aarch64_vector_mode_p (mode) ? (GET_MODE_SIZE (mode) + UNITS_PER_VREG - 1) / UNITS_PER_VREG : (GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD; + case FIXED_REG0: case STACK_REG: return 1; @@ -6948,10 +6966,10 @@ aarch64_register_move_cost (machine_mode mode, = aarch64_tune_params.regmove_cost; /* Caller save and pointer regs are equivalent to GENERAL_REGS. */ - if (to == CALLER_SAVE_REGS || to == POINTER_REGS) + if (to == CALLER_SAVE_REGS || to == POINTER_REGS || to == FIXED_REG0) to = GENERAL_REGS; - if (from == CALLER_SAVE_REGS || from == POINTER_REGS) + if (from == CALLER_SAVE_REGS || from == POINTER_REGS || from == FIXED_REG0) from = GENERAL_REGS; /* Moving between GPR and stack cost is the same as GP2GP. */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index c681cf1..3fcfe55 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4496,6 +4496,25 @@ [(set_attr "type" "call") (set_attr "length" "16")]) +;; The same as tlsdesc_small_ with hard register hiding. +;; The first operand is actually x0, while we wrap it under a delicated +;; register class so that before register allocation, it's seen as pseudo +;; register. The reason for doing this is we don't expose hard register X0 +;; as the destination of set as it will cause trouble for RTL loop iv. +;; RTL loop iv will abort ongoing optimization once it finds there is hard reg +;; as destination of set. +(define_insn "tlsdesc_small_pseudo_" + [(set (match_operand:PTR 0 "register_operand" "=Uc0") + (unspec:PTR [(match_operand 1 "aarch64_valid_symref" "S")] + UNSPEC_TLSDESC)) + (clobber (reg:DI LR_REGNUM)) + (clobber (reg:CC CC_REGNUM)) + (clobber (match_scratch:DI 2 "=r"))] + "TARGET_TLS_DESC" + "adrp\\t0, %A1\;ldr\\t%2, [%0, #%L1]\;add\\t%0, %0, %L1\;.tlsdesccall\\t%1\;blr\\t%2" + [(set_attr "type" "call") + (set_attr "length" "16")]) + (define_insn "stack_tie" [(set (mem:BLK (scratch)) (unspec:BLK [(match_operand:DI 0 "register_operand" "rk") diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index 5b189ea..b09d10d 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -24,6 +24,9 @@ (define_register_constraint "Ucs" "CALLER_SAVE_REGS" "@internal The caller save registers.") +(define_register_constraint "Uc0" "FIXED_REG0" + "@internal Represent X0/W0.") + (define_register_constraint "w" "FP_REGS" "Floating point and SIMD vector registers.") diff --git a/gcc/testsuite/gcc.target/aarch64/tlsdesc_hoist.c b/gcc/testsuite/gcc.target/aarch64/tlsdesc_hoist.c new file mode 100644 index 0000000..f9a6d7f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/tlsdesc_hoist.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-options "-O2 -fpic -fdump-rtl-loop2_invariant" } */ + +int cal (int, int); +__thread int tls_data; + +int +foo (int bound) +{ + int i = 0; + int sum = 0; + + for (i; i < bound; i++) + sum = cal (sum, tls_data); + + return sum; +} + +/* Insn sequences for TLS descriptor should be hoisted out of the loop. */ +/* { dg-final { scan-rtl-dump "Decided" "loop2_invariant" } } */