Message ID: 5506D77B.5060909@linaro.org
State: New
Resending, now that I've figured out how to make gmail send text email
instead of html.

>>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>>> extra_cost->vect.alu

This won't work, because extra_cost->vect.alu is COSTS_N_INSNS (1), which
means the total is COSTS_N_INSNS (2).  The lower-subreg pass decides
whether to split based on cost >= (word_move_cost * size / word_mode_size).
Vectors are twice the size of word mode, and word moves cost
COSTS_N_INSNS (1).  Setting the vector move cost to COSTS_N_INSNS (2)
means we have COSTS_N_INSNS (2) >= COSTS_N_INSNS (2), and vector moves
are split, which is bad for vector register allocation.  This calculation
happens in compute_costs in lower-subreg.c.

> How about the attached patch?  It is similar to what is currently done
> for scalar register move.

I like this approach of using the vector register size instead of word
size when we have a vector mode.

Jim
On 16/03/15 13:15, Kugan wrote:
> On 16/03/15 23:32, Kugan wrote:
>>>> lower-subreg.c:compute_costs() only cares about the cost of a
>>>> (set (reg) (const_int)) move, but I think the intention, at least
>>>> for now, is to return extra_cost->vect.alu for all the vector
>>>> operations.
>>>
>>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>>> extra_cost->vect.alu
>>
>> Thanks Kyrill for the review.
>>
>>>> Regression tested on aarch64-linux-gnu with no new regressions.
>>>> Is this OK for trunk?
>>>
>>> Are you sure it's a (set (reg) (const_int)) that's being costed
>>> here?  I thought for moves into vector registers it would be a
>>> (set (reg) (const_vector)), which we don't handle in our rtx costs
>>> currently.  I think the correct approach would be to extend the
>>> aarch64_rtx_costs switch statement to handle the CONST_VECTOR case.
>>> I believe you can use aarch64_simd_valid_immediate to check whether
>>> x is a valid immediate for a simd instruction and give it a cost of
>>> extra_cost->vect.alu.  The logic should be similar to the CONST_INT
>>> case.
>>
>> Sorry about the (set (reg) (const_int)) above.  But the actual RTL
>> that is being split at 220r.subreg2 is
>>
>> (insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
>>         (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
>>      /home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
>>      800 {*aarch64_simd_movv4sf}
>>      (nil))
>>
>> And also, if we return an RTX cost above COSTS_N_INSNS (1), it will
>> be split and it doesn't recover from there.  Therefore we need
>> something like the below to prevent that happening.
>>
> Hi Kyrill,
>
> How about the attached patch?  It is similar to what is currently done
> for scalar register move.

Hi Kugan,

yeah, I think this is a better approach, though I can't approve.

Kyrill

>
> Thanks,
> Kugan
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..b9db3ac 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
       /* Fall through.  */
     case REG:
+      if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+	{
+	  /* The cost is 1 per register copied.  */
+	  int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+			  / GET_MODE_SIZE (V4SImode);
+	  *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	}
       /* const0_rtx is in general free, but we will use an
	  instruction to set a register to 0.  */
-      if (REG_P (op1) || op1 == const0_rtx)
-	{
+      else if (REG_P (op1) || op1 == const0_rtx)
+	{
	  /* The cost is 1 per register copied.  */
	  int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
			  / UNITS_PER_WORD;