From patchwork Fri Nov 9 10:47:42 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 995430
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-489492-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=foss.arm.com
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="J1seCHEF"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42rxgv6bxbz9sB7 for ; Fri, 9 Nov 2018 21:47:55 +1100 (AEDT)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; q=dns; s=default; b=kUYhOm47E/Z216b8t8Gm7NugkDhZBwSagyXg73DON/z q39cQ2+ICoHb/A3FNsltyi1b2IebzDJZI7G9/u1k4srNZ+5h74DHs77NWpHYYOHu ccFYhnCd26422jcpxLcrC8oJVe47Yq0+MDUIDkcD0zdaxuYzidLDsJG7mM5rWOLE =
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; s=default; bh=byOGvkUbT8owCQ4nsQ+/Ff9bj2I=; b=J1seCHEFVhXq+H3KF tbCozTSgSqoGIM9e2rA0xjHt2ZoRS+jx6BocajH7XxGMk7fLE+Q9IGs7g/WWgXVj PUSLAxR1TIMC/3saTuUeV4clkk0WSrFB/AEwfNz8Yzq2pqAkrXii+PTyRge925JV sa7Ehun2yyI3BI3N5hvAfUjPu4=
Received: (qmail 8955 invoked by alias); 9 Nov 2018 10:47:48 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 7944 invoked by uid 89); 9 Nov 2018 10:47:47 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-25.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, KAM_LOTSOFHASH, KAM_NUMSUBJECT autolearn=ham version=3.3.2 spammy=Advanced, vla, VLA
X-HELO: foss.arm.com
Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 09 Nov 2018 10:47:46 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 53647A78; Fri, 9 Nov 2018 02:47:44 -0800 (PST)
Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 636523F718; Fri, 9 Nov 2018 02:47:43 -0800 (PST)
Message-ID: <5BE565CE.5000709@foss.arm.com>
Date: Fri, 09 Nov 2018 10:47:42 +0000
From: Kyrill Tkachov
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org"
CC: Marcus Shawcroft, "Richard Earnshaw (lists)", James Greenhalgh, Richard Sandiford
Subject: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

Hi all,

In the testcase added by this patch (gcc.target/aarch64/sve/unroll-1.c) the
codegen for VLA SVE is worse than it could be due to complete unrolling:

fully_peel_me:
        mov     x1, 5
        ptrue   p1.d, all
        whilelo p0.d, xzr, x1
        ld1d    z0.d, p0/z, [x0]
        fadd    z0.d, z0.d, z0.d
        st1d    z0.d, p0, [x0]
        cntd    x2
        addvl   x3, x0, #1
        whilelo p0.d, x2, x1
        beq     .L1
        ld1d    z0.d, p0/z, [x0, #1, mul vl]
        fadd    z0.d, z0.d, z0.d
        st1d    z0.d, p0, [x3]
        cntw    x2
        incb    x0, all, mul #2
        whilelo p0.d, x2, x1
        beq     .L1
        ld1d    z0.d, p0/z, [x0]
        fadd    z0.d, z0.d, z0.d
        st1d    z0.d, p0, [x0]
.L1:
        ret

In this case, due to the vector-length-agnostic (VLA) nature of SVE, the
compiler doesn't know the exact loop iteration count. For such loops we don't
want to unroll if we don't end up eliminating the branches, as this just
bloats code size and hurts icache performance.

This patch introduces a new unroll-known-loop-iterations-only param that
disables cunroll when the exact loop iteration count is unknown (niter is
SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but the
param also helps some Advanced SIMD cases, where loops with an unknown
iteration count are no longer unrolled when doing so doesn't eliminate the
branches.

So for the above testcase we now generate:

fully_peel_me:
        mov     x2, 5
        mov     x3, x2
        mov     x1, 0
        whilelo p0.d, xzr, x2
        ptrue   p1.d, all
.L2:
        ld1d    z0.d, p0/z, [x0, x1, lsl 3]
        fadd    z0.d, z0.d, z0.d
        st1d    z0.d, p0, [x0, x1, lsl 3]
        incd    x1
        whilelo p0.d, x1, x3
        bne     .L2
        ret

Still not perfect, but it's preferable to the original code.

The new param is enabled by default on aarch64 but disabled for other targets,
leaving their behaviour unchanged (until other target maintainers experiment
with it and set it, if appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu.
Benchmarked on SPEC2017 on a Cortex-A57; there are no differences in
performance.

Ok for trunk?
Thanks,
Kyrill

2018-11-09  Kyrylo Tkachov

    * params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
    * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
    disable unrolling on unknown iteration count.
    * config/aarch64/aarch64.c (aarch64_override_options_internal): Set
    PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
    * doc/invoke.texi (--param unroll-known-loop-iterations-only): Document.

2018-11-09  Kyrylo Tkachov

    * gcc.target/aarch64/sve/unroll-1.c: New test.

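To illustrate why the iteration count is not a compile-time constant under VLA, here is a rough scalar C model of the predicated loop for fully_peel_me. It is only a sketch: the function name, its signature and the `lanes` parameter (standing in for the number of doubles per vector, i.e. what cntd returns at run time) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of the predicated SVE loop in fully_peel_me.
   `lanes` is a hypothetical stand-in for the number of doubles per vector,
   which is only known at run time for VLA code.
   Returns the number of vector iterations that were executed.  */
unsigned
fully_peel_me_model (double *x, size_t n, size_t lanes)
{
  assert (lanes > 0);
  unsigned iterations = 0;
  for (size_t base = 0; base < n; base += lanes)
    {
      /* Models the whilelo predicate: only lanes with base + lane < n
         are active.  */
      for (size_t lane = 0; lane < lanes && base + lane < n; lane++)
        x[base + lane] = x[base + lane] + x[base + lane];
      iterations++;
    }
  return iterations;
}
```

For n = 5 the model runs 3 iterations with 2 lanes (128-bit vectors), 2 iterations with 4 lanes and 1 iteration with 8 or more lanes. That appears consistent with the three copies of the loop body, each guarded by its own branch, in the unrolled code above.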
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f0e54eda80656829528c018357dde2e1e87f6ebd..34d08a075221fd4c098e9b5e8fabd8fe3948d285 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10993,6 +10993,12 @@ aarch64_override_options_internal (struct gcc_options *opts)
                          opts->x_param_values,
                          global_options_set.x_param_values);
 
+  /* Don't unroll loops where the exact iteration count is not known at
+     compile-time.  */
+  maybe_set_param_value (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY, 1,
+                         opts->x_param_values,
+                         global_options_set.x_param_values);
+
   /* If the user hasn't changed it via configure then set the default to 64 KB
      for the backend.  */
   maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 802cc642453aef2d2c516bcbda22246252ec87c1..74e2aeda27d718264188761cf522d6c9f8025e07 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10732,6 +10732,9 @@ The maximum number of branches on the hot path through the peeled sequence.
 @item max-completely-peeled-insns
 The maximum number of insns of a completely peeled loop.
 
+@item unroll-known-loop-iterations-only
+Only completely unroll loops where the iteration count is known.
+
 @item max-completely-peel-times
 The maximum number of iterations of a loop to be suitable for complete peeling.
 
diff --git a/gcc/params.def b/gcc/params.def
index 4a5f2042dac72bb457488ac8bc35d09df94c929c..07946552232058cee41303e81ed694f7f0bb615e 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -344,6 +344,11 @@ DEFPARAM(PARAM_MAX_UNROLL_ITERATIONS,
         "The maximum depth of a loop nest we completely peel.",
         8, 0, 0)
 
+DEFPARAM(PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY,
+         "unroll-known-loop-iterations-only",
+         "Only completely unroll loops where the iteration count is known",
+         0, 0, 1)
+
 /* The maximum number of insns of an unswitched loop.  */
 DEFPARAM(PARAM_MAX_UNSWITCH_INSNS,
         "max-unswitch-insns",
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c b/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7f53d20cbf8e18a4389b86c037f56f024bac22a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* Check that simple loop is not fully unrolled.  */
+
+void
+fully_peel_me (double *x)
+{
+  for (int i = 0; i < 5; i++)
+    x[i] = x[i] * 2;
+}
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[.+]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[.+\]\n} 1 } } */
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index eeae2a8c54af14e58970d1797c92ecc86ac0523c..a67800fe8807ba003c05c3d8bdd820cc8df93e57 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -883,6 +883,17 @@ try_unroll_loop_completely (struct loop *loop,
                          loop->num);
               return false;
             }
+          else if (TREE_CODE (niter) == SCEV_NOT_KNOWN
+                   && PARAM_VALUE (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY)
+                      == 1)
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Not unrolling loop %d: "
+                         "exact number of iterations not known "
+                         "(--param unroll-known-loop-iterations-only).\n",
+                         loop->num);
+              return false;
+            }
         }
 
       initialize_original_copy_tables ();
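
For readers who want the new condition in try_unroll_loop_completely in isolation, the following standalone C sketch models it. The enum, struct and function names are hypothetical stand-ins invented for illustration; GCC itself tests TREE_CODE (niter) == SCEV_NOT_KNOWN and reads the param via PARAM_VALUE, as in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the niter analysis result: either an exact
   constant count, or unknown (modelling SCEV_NOT_KNOWN).  */
enum niter_kind { NITER_KNOWN, NITER_UNKNOWN };

struct loop_model
{
  enum niter_kind kind;
  unsigned long niter;  /* Meaningful only when kind == NITER_KNOWN.  */
};

/* Returns true when the new check would refuse complete unrolling.
   `param_value` models --param unroll-known-loop-iterations-only,
   which the patch restricts to 0 or 1.  */
bool
rejected_by_known_iterations_only (const struct loop_model *loop,
                                   int param_value)
{
  assert (param_value == 0 || param_value == 1);
  return loop->kind == NITER_UNKNOWN && param_value == 1;
}
```

With the param at its generic default of 0 the check never fires, so targets other than aarch64 keep their current behaviour. With the param at 1, only loops whose exact count is unknown are rejected; loops with a known count still go through the existing size and profitability limits.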