diff mbox

[AArch64] Implement popcount pattern

Message ID CO2PR07MB2694BD5587FAA6A364FF2068834F0@CO2PR07MB2694.namprd07.prod.outlook.com
State New
Headers show

Commit Message

Hurugalawadi, Naveen Feb. 3, 2017, 6:57 a.m. UTC
Hi Andrew,

Thanks for clearing the confusion.

> I don't understand this comment and how it relates to your updated patch

foo, foo1 and foo2 generates calls to "popcountdi2" which should have
been "popcountsi2" for foo1. When Kyrill commented on using the
popcountsi2; I was confused :).

Hence, the testcase generally checks for the absence of call to "popcount"
and the presence of "cnt" instruction instead.

>> Now of course what should change still is the argument 
>> types to foo1/foo2 

The arguments to foo1 and foo2 are modified as required.

Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.

Please let us know if its okay for stage 1?

Thanks,
Naveen

Comments

Kyrill Tkachov Feb. 3, 2017, 10:28 a.m. UTC | #1
On 03/02/17 06:57, Hurugalawadi, Naveen wrote:
> Hi Andrew,
>
> Thanks for clearing the confusion.
>
>> I don't understand this comment and how it relates to your updated patch
> foo, foo1 and foo2 generates calls to "popcountdi2" which should have
> been "popcountsi2" for foo1. When Kyrill commented on using the
> popcountsi2; I was confused :).
>
> Hence, the testcase generally checks for the absence of call to "popcount"
> and the presence of "cnt" instruction instead.
>

Sorry for the confusion, I should have been more explicit that I was talking
about the optab.

>>> Now of course what should change still is the argument
>>> types to foo1/foo2
> The arguments to foo1 and foo2 are modified as required.
>
> Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.
>
> Please let us know if its okay for stage 1?

This looks good to me, but you'll need James (or another aarch64 maintainer) to approve.

Thanks,
Kyrill

> Thanks,
> Naveen
James Greenhalgh Feb. 4, 2017, 4:03 a.m. UTC | #2
On Fri, Feb 03, 2017 at 06:57:54AM +0000, Hurugalawadi, Naveen wrote:
> Hi Andrew,
> 
> Thanks for clearing the confusion.
> 
> > I don't understand this comment and how it relates to your updated patch
> 
> foo, foo1 and foo2 generates calls to "popcountdi2" which should have
> been "popcountsi2" for foo1. When Kyrill commented on using the
> popcountsi2; I was confused :).
> 
> Hence, the testcase generally checks for the absence of call to "popcount"
> and the presence of "cnt" instruction instead.
> 
> >> Now of course what should change still is the argument 
> >> types to foo1/foo2 
> 
> The arguments to foo1 and foo2 are modified as required.
> 
> Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.
> 
> Please let us know if its okay for stage 1?

This looks OK to me now. The change is self contained, looks safe, and
has only been held up for the testcase issue.

But, please give Richard/Marcus a reasonable time to object (say, end of day
Tuesday) before committing.

Thanks,
James

> 
> Thanks,
> Naveen

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a693a3b..684a833 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3778,6 +3778,39 @@
>    }
>  )
>  
> +;; Pop count be done via the "CNT" instruction in AdvSIMD.
> +;;
> +;; MOV	v.1d, x0
> +;; CNT	v1.8b, v.8b
> +;; ADDV b2, v1.8b
> +;; MOV	w0, v2.b[0]
> +
> +(define_expand "popcount<mode>2"
> +  [(match_operand:GPI 0 "register_operand")
> +   (match_operand:GPI 1 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx v = gen_reg_rtx (V8QImode);
> +  rtx v1 = gen_reg_rtx (V8QImode);
> +  rtx r = gen_reg_rtx (QImode);
> +  rtx in = operands[1];
> +  rtx out = operands[0];
> +  if(<MODE>mode == SImode)
> +    {
> +      rtx tmp;
> +      tmp = gen_reg_rtx (DImode);
> +      /* If we have SImode, zero extend to DImode, pop count does
> +         not change if we have extra zeros. */
> +      emit_insn (gen_zero_extendsidi2 (tmp, in));
> +      in = tmp;
> +    }
> +  emit_move_insn (v, gen_lowpart (V8QImode, in));
> +  emit_insn (gen_popcountv8qi2 (v1, v));
> +  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
> +  emit_insn (gen_zero_extendqi<mode>2 (out, r));
> +  DONE;
> +})
> +
>  (define_insn "clrsb<mode>2"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>          (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> new file mode 100644
> index 0000000..7e95796
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int x)
> +{
> +  return __builtin_popcount (x);
> +}
> +
> +long
> +foo1 (long x)
> +{
> +  return __builtin_popcountl (x);
> +}
> +
> +long long
> +foo2 (long long x)
> +{
> +  return __builtin_popcountll (x);
> +}
> +
> +/* { dg-final { scan-assembler-not "popcount" } } */
> +/* { dg-final { scan-assembler-times "cnt\t" 3 } } */
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a693a3b..684a833 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3778,6 +3778,39 @@ 
   }
 )
 
+;; Pop count be done via the "CNT" instruction in AdvSIMD.
+;;
+;; MOV	v.1d, x0
+;; CNT	v1.8b, v.8b
+;; ADDV b2, v1.8b
+;; MOV	w0, v2.b[0]
+
+(define_expand "popcount<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(<MODE>mode == SImode)
+    {
+      rtx tmp;
+      tmp = gen_reg_rtx (DImode);
+      /* If we have SImode, zero extend to DImode, pop count does
+         not change if we have extra zeros. */
+      emit_insn (gen_zero_extendsidi2 (tmp, in));
+      in = tmp;
+    }
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi<mode>2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
         (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
new file mode 100644
index 0000000..7e95796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int x)
+{
+  return __builtin_popcount (x);
+}
+
+long
+foo1 (long x)
+{
+  return __builtin_popcountl (x);
+}
+
+long long
+foo2 (long long x)
+{
+  return __builtin_popcountll (x);
+}
+
+/* { dg-final { scan-assembler-not "popcount" } } */
+/* { dg-final { scan-assembler-times "cnt\t" 3 } } */