From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [committed][AArch64] Optimise aarch64_add_offset for SVE VL constants
Date: Thu, 15 Aug 2019 09:50:54 +0100

aarch64_add_offset contains code to decompose all SVE VL-based constants
into native operations.  The worst-case fallback is to load the number
of SVE elements into a register and use a general multiplication.
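
For example (a hand-written sketch rather than actual compiler output,
with arbitrary register choices), adding 11 * VG to x0, where VG is the
number of 64-bit granules in a vector, previously ended up as a full
multiplication along the lines of:

	cntd	x1		// x1 = VG (CNTD counts 64-bit elements)
	mov	x2, #11		// load the multiplier into a register
	mul	x1, x1, x2	// x1 = 11 * VG
	add	x0, x0, x1	// apply the offset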

This patch improves that fallback by reusing expand_mult if
can_create_pseudo_p, rather than emitting a MULT pattern directly.

In order to increase the chances of being able to use a simple
add-and-shift, the patch also tries to compute VG * the lowest set bit
of the multiplier, rather than always using CNTD as the basis for the
multiplication path.

This is tested by the ACLE patches but is really an independent
improvement.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274519.

Richard


2019-08-15  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_add_offset): In the fallback
	multiplication case, try to compute VG * (lowest set bit) directly
	rather than always basing the multiplication on VG.  Use
	expand_mult for the multiplication if we can.

gcc/testsuite/
	* gcc.target/aarch64/sve/loop_add_4.c: Expect 10 INCWs and
	INCDs rather than 8.

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2019-08-15 09:47:20.180358297 +0100
+++ gcc/config/aarch64/aarch64.c	2019-08-15 09:49:13.659521097 +0100
@@ -73,6 +73,7 @@ #define INCLUDE_STRING
 #include "selftest-rtl.h"
 #include "rtx-vector-builder.h"
 #include "intl.h"
+#include "expmed.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -3465,20 +3466,36 @@ aarch64_add_offset (scalar_int_mode mode
 	}
       else
 	{
-	  /* Use CNTD, then multiply it by FACTOR.  */
-	  val = gen_int_mode (poly_int64 (2, 2), mode);
+	  /* Base the factor on LOW_BIT if we can calculate LOW_BIT
+	     directly, since that should increase the chances of being
+	     able to use a shift and add sequence.  If LOW_BIT itself
+	     is out of range, just use CNTD.  */
+	  if (low_bit <= 16 * 8)
+	    factor /= low_bit;
+	  else
+	    low_bit = 1;
+
+	  val = gen_int_mode (poly_int64 (low_bit * 2, low_bit * 2), mode);
 	  val = aarch64_force_temporary (mode, temp1, val);
 
-	  /* Go back to using a negative multiplication factor if we have
-	     no register from which to subtract.  */
-	  if (code == MINUS && src == const0_rtx)
+	  if (can_create_pseudo_p ())
+	    {
+	      rtx coeff1 = gen_int_mode (factor, mode);
+	      val = expand_mult (mode, val, coeff1, NULL_RTX, false, true);
+	    }
+	  else
 	    {
-	      factor = -factor;
-	      code = PLUS;
+	      /* Go back to using a negative multiplication factor if we have
+		 no register from which to subtract.  */
+	      if (code == MINUS && src == const0_rtx)
+		{
+		  factor = -factor;
+		  code = PLUS;
+		}
+	      rtx coeff1 = gen_int_mode (factor, mode);
+	      coeff1 = aarch64_force_temporary (mode, temp2, coeff1);
+	      val = gen_rtx_MULT (mode, val, coeff1);
 	    }
-	  rtx coeff1 = gen_int_mode (factor, mode);
-	  coeff1 = aarch64_force_temporary (mode, temp2, coeff1);
-	  val = gen_rtx_MULT (mode, val, coeff1);
 	}
 
       if (shift > 0)
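
As a hand-picked example of the new path (again a sketch, not compiler
output): for an offset of 80 * VG, the lowest set bit of the factor is
16, so 16 * VG can be computed directly and expand_mult is then free to
turn the residual multiplication by 5 into a shift-and-add:

	cntb	x1, all, mul #2		// x1 = 16 * VG (CNTB counts bytes, i.e. 8 * VG)
	add	x1, x1, x1, lsl #2	// x1 = 80 * VG (val * 5 as shift-and-add)
	add	x0, x0, x1		// apply the offset

instead of loading 80 into a register and using a general MUL.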

Index: gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c	2019-03-08 18:14:29.780994734 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c	2019-08-15 09:49:13.659521097 +0100
@@ -68,7 +68,8 @@ TEST_ALL (LOOP)
 /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, w[0-9]+\n} 3 } } */
 /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */
 /* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */
-/* { dg-final { scan-assembler-times {\tincw\tx[0-9]+\n} 8 } } */
+/* 2 for the calculations of -17 and 17.  */
+/* { dg-final { scan-assembler-times {\tincw\tx[0-9]+\n} 10 } } */
 /* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #16\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #15\n} 1 } } */
 
@@ -85,7 +86,8 @@ TEST_ALL (LOOP)
 /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, x[0-9]+\n} 3 } } */
 /* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */
 /* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */
-/* { dg-final { scan-assembler-times {\tincd\tx[0-9]+\n} 8 } } */
+/* 2 for the calculations of -17 and 17.  */
+/* { dg-final { scan-assembler-times {\tincd\tx[0-9]+\n} 10 } } */
 /* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d, all, mul #16\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d, all, mul #15\n} 1 } } */
 