Vectorizing abs(char/short/int) on x86.

As there are some issues with abs() type conversions, I removed the
related content from the patch but only kept the SSE2 support for
abs(int).

For the define_expand I added as below, the else body is there to
avoid fall-through transformations to ABS operation in optabs.c.
Otherwise ABS will be converted to other operations even that we have
corresponding instructions from SSSE3.


(define_expand "abs<mode>2"
  [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand")
(abs:VI124_AVX2_48_AVX512F
 (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))]
  "TARGET_SSE2"
{
  if (!TARGET_SSSE3)
    ix86_expand_sse2_abs (operands[0], force_reg (<MODE>mode, operands[1]));
  else
    emit_insn (gen_rtx_SET (VOIDmode, operands[0],
   gen_rtx_ABS (<MODE>mode, operands[1])));
  DONE;
})


The patch is attached here. Please give me your comments.


thanks,
Cong







On Thu, Oct 24, 2013 at 5:09 PM, Cong Hou <congh@google.com> wrote:
> On Wed, Oct 23, 2013 at 11:18 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Wed, Oct 23, 2013 at 09:40:21PM -0700, Cong Hou wrote:
>>> On Wed, Oct 23, 2013 at 8:52 AM, Joseph S. Myers
>>> <joseph@codesourcery.com> wrote:
>>> > On Tue, 22 Oct 2013, Cong Hou wrote:
>>> >
>>> >> For abs(char/short), type conversions are needed as the current abs()
>>> >> function/operation does not accept argument of char/short type.
>>> >> Therefore when we want to get the absolute value of a char_val using
>>> >> abs (char_val), it will be converted into abs ((int) char_val). It
>>> >> then can be vectorized, but the generated code is not efficient as
>>> >> lots of packings and unpackings are envolved. But if we convert
>>> >> (char) abs ((int) char_val) to abs (char_val), the vectorizer will be
>>> >> able to generate better code. Same for short.
>>> >
>>> > ABS_EXPR has undefined overflow behavior.  Thus, abs ((int) -128) is
>>> > defined (and we also define the subsequent conversion of +128 to signed
>>> > char, which ISO C makes implementation-defined not undefined), and
>>> > converting to an ABS_EXPR on char would wrongly make it undefined.  For
>>> > such a transformation to be valid (in the absence of VRP saying that -128
>>> > isn't a possible value) you'd need a GIMPLE representation for
>>> > ABS_EXPR<overflow:wrap>, as distinct from ABS_EXPR<overflow:undefined>.
>>> > You don't have the option there is for some arithmetic operations of
>>> > converting to a corresponding operation on unsigned types.
>>> >
>>>
>>> Yes, you are right. The method I use can guarantee wrapping on
>>> overflow (either shift-xor-sub or max(x, -x)). Can I just add the
>>> condition if (flag_wrapv) before the conversion I made to prevent the
>>> undefined behavior on overflow?
>>
>> What HW insns you expand to is one thing, but if some GCC pass assumes that
>> ABS_EXPR always returns non-negative value (many do, look e.g. at
>> tree_unary_nonnegative_warnv_p, extract_range_from_unary_expr_1,
>> simplify_const_relational_operation, etc., you'd need to grep for all
>> ABS_EXPR/ABS occurrences) and optimizes code based on that fact, you get
>> wrong code because (char) abs((char) -128) is well defined.
>> If we change ABS_EXPR/ABS definition that it is well defined on the most
>> negative value of the typ (resp. mode), then we loose all those
>> optimizations, if we do that only for the char/short types, it would be
>> quite weird, though we could keep the benefits, but at the RTL level we'd
>> need to treat that way all the modes equal to short's mode and smaller (so,
>> for sizeof(short) == sizeof(int) target even int's mode).
>
>
> I checked those functions and they all consider the possibility of
> overflow. For example, tree_unary_nonnegative_warnv_p only returns
> true for ABS_EXPR on integers if overflow is undefined. If the
> consequence of overflow is wrapping, I think converting (char)
> abs((int)-128) to abs(-128) (-128 has char type) is safe. Can we do it
> by checking flag_wrapv?
>
> I could also first remove the abs conversion content from this patch
> but only keep the content of expanding abs() for i386. I will submit
> it later.
>
>
>>
>> The other possibility is not to create the ABS_EXPRs of char/short anywhere,
>> solve the vectorization issues either through tree-vect-patterns.c or
>> as part of the vectorization type demotion/promotions, see the recent
>> discussions for that, you'd represent the short/char abs for the vectorized
>> loop say using the shift-xor-sub or builtin etc. and if you want to do the
>> same thing for scalar code, you'd just have combiner try to match some
>> sequence.
>
>
> Yes, I could do it through tree-vect-patterns.c, if the abs conversion
> is prohibited. Currently the only reason I need the abs conversion is
> for vectorization.
>
> Vectorization type demotion/promotions is interesting, but I am afraid
> we will face the same problem there.
>
>
> Thank you for your comment!
>
>
> Cong
>
>
>>
>>         Jakub

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..84c7ab5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-22  Cong Hou  <congh@google.com>
+
+	PR target/58762
+	* config/i386/i386-protos.h (ix86_expand_sse2_abs): New function.
+	* config/i386/i386.c (ix86_expand_sse2_abs): New function.
+	* config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int).
+
 2013-10-14  David Malcolm  <dmalcolm@redhat.com>
 
 	* dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 3ab2f3a..ca31224 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
+extern void ix86_expand_sse2_abs (rtx, rtx);
 
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 02cbbbd..71905fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
 		       gen_rtx_MULT (mode, op1, op2));
 }
 
+void
+ix86_expand_sse2_abs (rtx op0, rtx op1)
+{
+  enum machine_mode mode = GET_MODE (op0);
+  rtx tmp0, tmp1;
+
+  switch (mode)
+    {
+      /* For 32-bit signed integer X, the best way to calculate the absolute
+	 value of X is (((signed) X >> (W-1)) ^ X) - ((signed) X >> (W-1)).  */
+      case V4SImode:
+	tmp0 = expand_simple_binop (mode, ASHIFTRT, op1,
+				    GEN_INT (GET_MODE_BITSIZE
+						 (GET_MODE_INNER (mode)) - 1),
+				    NULL, 0, OPTAB_DIRECT);
+	if (tmp0)
+	  tmp1 = expand_simple_binop (mode, XOR, op1, tmp0,
+				      NULL, 0, OPTAB_DIRECT);
+	if (tmp0 && tmp1)
+	  expand_simple_binop (mode, MINUS, tmp1, tmp0,
+			       op0, 0, OPTAB_DIRECT);
+	break;
+
+      /* For 16-bit signed integer X, the best way to calculate the absolute
+	 value of X is max (X, -X), as SSE2 provides the PMAXSW insn.  */
+      case V8HImode:
+	tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+	if (tmp0)
+	  expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0,
+			       OPTAB_DIRECT);
+	break;
+
+      /* For 8-bit signed integer X, the best way to calculate the absolute
+	 value of X is min ((unsigned char) X, (unsigned char) (-X)),
+	 as SSE2 provides the PMINUB insn.  */
+      case V16QImode:
+	tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+	if (tmp0)
+	  expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0,
+			       OPTAB_DIRECT);
+	break;
+
+      default:
+	break;
+    }
+}
+
 /* Expand an insert into a vector register through pinsr insn.
    Return true if successful.  */
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c3f6c94..b85ded4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8721,7 +8721,7 @@
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
    (set_attr "mode" "DI")])
 
-(define_insn "abs<mode>2"
+(define_insn "*abs<mode>2"
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v")
 	(abs:VI124_AVX2_48_AVX512F
 	  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand" "vm")))]
@@ -8733,6 +8733,20 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "abs<mode>2"
+  [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand")
+	(abs:VI124_AVX2_48_AVX512F
+	  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))]
+  "TARGET_SSE2"
+{
+  if (!TARGET_SSSE3)
+    ix86_expand_sse2_abs (operands[0], force_reg (<MODE>mode, operands[1]));
+  else
+    emit_insn (gen_rtx_SET (VOIDmode, operands[0],
+			    gen_rtx_ABS (<MODE>mode, operands[1])));
+  DONE;
+})
+
 (define_insn "abs<mode>2"
   [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
 	(abs:MMXMODEI
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 075d071..cf5b942 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2013-10-22  Cong Hou  <congh@google.com>
+
+	PR target/58762
+	* gcc.dg/vect/pr58762.c: New test.
+
 2013-10-14  Tobias Burnus  <burnus@net-b.de>
 
 	PR fortran/58658
diff --git a/gcc/testsuite/gcc.dg/vect/pr58762.c b/gcc/testsuite/gcc.dg/vect/pr58762.c
new file mode 100644
index 0000000..8b114a7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr58762.c
@@ -0,0 +1,28 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+void test1 (char* a, char* b)
+{
+  int i;
+  for (i = 0; i < 10000; ++i)
+    a[i] = abs (b[i]);
+}
+
+void test2 (short* a, short* b)
+{
+  int i;
+  for (i = 0; i < 10000; ++i)
+    a[i] = abs (b[i]);
+}
+
+void test3 (int* a, int* b)
+{
+  int i;
+  for (i = 0; i < 10000; ++i)
+    a[i] = abs (b[i]);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect"
+       { target i?86-*-* x86_64-*-* ia64-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */


Vectorizing abs(char/short/int) on x86.

Commit Message

Patch