diff mbox

[i386,13/8,AVX-512] Fix argument order for perm and recp intrinsics.

Message ID CAFULd4YsyhC4D5CWcz6Z+kSZxFXBWs+HC31o=Sf2D-8P=zcmiA@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak Feb. 13, 2014, 5:25 p.m. UTC
On Thu, Feb 13, 2014 at 1:55 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>> I've noticed that _mm512_permutexvar_epi[64|32] intrinsics
>>> have wrong arguments order. As per [1] first argument is index.
>>> For vmpermps/vpermpd intrinsics are fine, but I've changed tests
>>> to call CALC with same arg order as intrinsic. here is the same
>>> problem (wrong argument order) with vrcp14s[d|s].
>>> Also avx512er-vrcp28ss-2.c test called wrong intrinsic.
>>>
>>> [1]  http://software.intel.com/sites/landingpage/IntrinsicsGuide/
>>>
>>> gcc/
>>>         * config/i386/avx512fintrin.h (_mm512_maskz_permutexvar_epi64): Swap
>>>         arguments order in builtin.
>>>         (_mm512_permutexvar_epi64): Ditto.
>>>         (_mm512_mask_permutexvar_epi64): Ditto
>>>         (_mm512_maskz_permutexvar_epi32): Ditto
>>>         (_mm512_permutexvar_epi32): Ditto
>>>         (_mm512_mask_permutexvar_epi32): Ditto
>>>         * config/i386/sse.md (srcp14<mode>): Swap operands.
>>>
>>> gcc/testsuite/
>>>         * gcc.target/i386/avx512er-vrcp28ss-2.c: Call rigth intrinsic.
>>>         * gcc.target/i386/avx512f-vpermd-2.c: Fix reference calculations.
>>>         * gcc.target/i386/avx512f-vpermpd-2.c: Ditto.
>>>         * gcc.target/i386/avx512f-vpermps-2.c: Ditto.
>>>         * gcc.target/i386/avx512f-vpermq-var-2.c: Ditto.
>>>         * gcc.target/i386/avx512f-vrcp14sd-2.c: Ditto.
>>>         * gcc.target/i386/avx512f-vrcp14ss-2.c: Ditto.
>>>
>>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>>> index a04b289..d3b2dc5 100644
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -1456,12 +1456,12 @@
>>>    [(set (match_operand:VF_128 0 "register_operand" "=v")
>>>         (vec_merge:VF_128
>>>           (unspec:VF_128
>>> -           [(match_operand:VF_128 1 "nonimmediate_operand" "vm")]
>>> +           [(match_operand:VF_128 2 "nonimmediate_operand" "vm")]
>>>             UNSPEC_RCP14)
>>> -         (match_operand:VF_128 2 "register_operand" "v")
>>> +         (match_operand:VF_128 1 "register_operand" "v")
>>>           (const_int 1)))]
>>>    "TARGET_AVX512F"
>>> -  "vrcp14<ssescalarmodesuffix>\t{%1, %2, %0|%0, %2, %1}"
>>> +  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>>
>> Please don't change srcp pattern, it should be defined similar to
>> vrcpss (aka sse_vmrcpv4sf). You need to switch operand order
>> elsewhere.
>
> No, you are correct. Operands should be swapped as in your patch.

Eh, sorry that after some more thinking, I have to again revert this decision.

The srcp pattern should remain as is, and you should swap operands in
avx512fintrin.h instead:

--cut here--
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -12849,7 +12849,7 @@
          (match_operand:VF_128 2 "register_operand" "v")
          (const_int 1)))]
   "TARGET_AVX512ER"
-  "vrsqrt28<ssescalarmodesuffix>\t{<round_saeonly_op3>%2, %1, %0|%0,
%1, %2<round_saeonly_op3>}"
+  "vrsqrt28<ssescalarmodesuffix>\t{<round_saeonly_op3>%1, %2, %0|%0,
%2, %1<round_saeonly_op3>}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])

Intrinsics should swap their operands accordingly.

Uros.
diff mbox

Patch

Index: avx512fintrin.h
===================================================================
--- avx512fintrin.h     (revision 207762)
+++ avx512fintrin.h     (working copy)
@@ -1470,8 +1470,8 @@ 
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_rcp14_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __A,
-                                          (__v2df) __B);
+  return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __B,
+                                          (__v2df) __A);
 }

 extern __inline __m128
@@ -1478,8 +1478,8 @@ 
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_rcp14_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __A,
-                                         (__v4sf) __B);
+  return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __B,
+                                         (__v4sf) __A);
 }

 extern __inline __m512d
--cut here--

vec_merge RSQRT and RCP are unops of type "sse". To correctly
determine "memory" attribute, "sse" types look at operand1 only, so
this is the reason that the pattern is defined in this way.

There is similar problem with vec_merge rcp28 and rsqrt28 patterns.
operands 1 and 2 are swapped in the mnemonic, since only the last
operands allow memory:

Index: sse.md
===================================================================
--- sse.md      (revision 207764)
+++ sse.md      (working copy)
@@ -12825,7 +12825,7 @@ 
          (match_operand:VF_128 2 "register_operand" "v")
          (const_int 1)))]
   "TARGET_AVX512ER"
-  "vrcp28<ssescalarmodesuffix>\t{<round_saeonly_op3>%2, %1, %0|%0,
%1, %2<round_saeonly_op3>}"
+  "vrcp28<ssescalarmodesuffix>\t{<round_saeonly_op3>%1, %2, %0|%0,
%2, %1<round_saeonly_op3>}"
   [(set_attr "length_immediate" "1")