diff mbox

[AArch64,PR65375] Fix RTX cost for vector SET

Message ID 55066BCC.4010900@linaro.org
State New
Headers show

Commit Message

Kugan Vivekanandarajah March 16, 2015, 5:36 a.m. UTC
AArch64 RTX cost for vector SET is causing PR65375. Lower subreg is
using this rtx_cost to compute the cost of moves, and splitting anything
larger than word size, 64-bits in this case. The aarch64 rtx_costs is
returning 2 * COST_N_INSNS(1) for vector moves, so they get split.
Attach patch fixes this.

With the patch the testcase in the PR:

#include <arm_neon.h>
void hello_vst2(float* fout, float *fin)
{
  float32x4x2_t a;
  a = vld2q_f32 (fin);
  vst2q_f32 (fout, a);
}

Changes to:

 hello_vst2:
-	ld2	{v0.4s - v1.4s}, [x1]
-	sub	sp, sp, #32
-	umov	x1, v0.d[0]
-	umov	x2, v0.d[1]
-	str	q1, [sp, 16]
-	mov	x5, x1
-	stp	x5, x2, [sp]
-	ld1	{v0.16b - v1.16b}, [sp]
+	ld2	{v2.4s - v3.4s}, [x1]
+	orr	v0.16b, v2.16b, v2.16b
+	orr	v1.16b, v3.16b, v3.16b
 	st2	{v0.4s - v1.4s}, [x0]
-	add	sp, sp, 32
 	ret


lower-subreg.c:compute_costs() only cares about the cost of a (set (reg)
(const_int )) move but I think the intention, at least for now, is to
return extra_cost->vect.alu for all the vector operations.

Regression tested on aarch64-linux-gnu with no new regression. Is this
OK for trunk?

Thanks,
Kugan


gcc/ChangeLog:

2015-03-16  Kugan Vivekanandarajah  <kuganv@linaro.org>
            Jim Wilson  <jim.wilson@linaro.org>

	PR target/65375
	* config/aarch64/aarch64.c (aarch64_rtx_costs): Return
	extra_cost->vect.alu for SET.

Comments

Kyrylo Tkachov March 16, 2015, 10:02 a.m. UTC | #1
On 16/03/15 05:36, Kugan wrote:

Hi Kugan,

> AArch64 RTX cost for vector SET is causing PR65375. Lower subreg is
> using this rtx_cost to compute the cost of moves, and splitting anything
> larger than word size, 64-bits in this case. The aarch64 rtx_costs is
> returning 2 * COST_N_INSNS(1) for vector moves, so they get split.
> Attach patch fixes this.
>
> With the patch the testcase in the PR:
>
> #include <arm_neon.h>
> void hello_vst2(float* fout, float *fin)
> {
>    float32x4x2_t a;
>    a = vld2q_f32 (fin);
>    vst2q_f32 (fout, a);
> }
>
> Changes to:
>
>   hello_vst2:
> -	ld2	{v0.4s - v1.4s}, [x1]
> -	sub	sp, sp, #32
> -	umov	x1, v0.d[0]
> -	umov	x2, v0.d[1]
> -	str	q1, [sp, 16]
> -	mov	x5, x1
> -	stp	x5, x2, [sp]
> -	ld1	{v0.16b - v1.16b}, [sp]
> +	ld2	{v2.4s - v3.4s}, [x1]
> +	orr	v0.16b, v2.16b, v2.16b
> +	orr	v1.16b, v3.16b, v3.16b
>   	st2	{v0.4s - v1.4s}, [x0]
> -	add	sp, sp, 32
>   	ret
>
>
> lower-subreg.c:compute_costs() only cares about the cost of a (set (reg)
> (const_int )) move but I think the intention, at least for now, is to
> return extra_cost->vect.alu for all the vector operations.

Almost, what we want at the moment is COSTS_N_INSNS (1) + 
extra_cost->vect.alu
>
> Regression tested on aarch64-linux-gnu with no new regression. Is this
> OK for trunk?

Are you sure it's a (set (reg) (const_int)) that's being costed here? I 
thought for moves into vecto registers it would be a (set (reg) 
(const_vector)) which we don't handle in our rtx costs currently. I 
think the correct approach would be to extend the aarch64_rtx_costs 
switch statement to handle the CONST_VECT case. I believe you can use 
aarch64_simd_valid_immediate to check whether x is a valid immediate for 
a simd instruction and give it a cost of extra_cost->vect.alu. The logic 
should be similar to the CONST_INT case.

Thanks,
Kyrill
>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2015-03-16  Kugan Vivekanandarajah  <kuganv@linaro.org>
>              Jim Wilson  <jim.wilson@linaro.org>
>
> 	PR target/65375
> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Return
> 	extra_cost->vect.alu for SET.
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..db69979 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5517,6 +5517,13 @@  aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
       op0 = SET_DEST (x);
       op1 = SET_SRC (x);
 
+      /* Sets don't have a mode, so we must recompute this here.  */
+      if (VECTOR_MODE_P (GET_MODE (op0)))
+	{
+	  *cost += extra_cost->vect.alu;
+	  return true;
+	}
+
       switch (GET_CODE (op0))
 	{
 	case MEM: