Message ID | 20190210001947.27278-15-hjl.tools@gmail.com |
---|---|
State | New |
Headers | show |
Series | V3: Emulate MMX intrinsics with SSE | expand |
On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote: > Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE. > > PR target/89021 > * config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation. > (sse_cvttps2pi): Likewise. It looks to me that this description is wrong. We don't have V4SF modes here, but V2SF, so we have to fake 64bit load in case of MMX. The cvtps2dq will access memory in true 128bit width, so this is wrong. We have to fix the description to not fake wide mode. Uros. > --- > gcc/config/i386/sse.md | 30 ++++++++++++++++++------------ > 1 file changed, 18 insertions(+), 12 deletions(-) > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 7d2c0367911..ad31ac3d9e6 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -4668,26 +4668,32 @@ > (set_attr "mode" "V4SF")]) > > (define_insn "sse_cvtps2pi" > - [(set (match_operand:V2SI 0 "register_operand" "=y") > + [(set (match_operand:V2SI 0 "register_operand" "=y,Yv") > (vec_select:V2SI > - (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")] > + (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")] > UNSPEC_FIX_NOTRUNC) > (parallel [(const_int 0) (const_int 1)])))] > - "TARGET_SSE" > - "cvtps2pi\t{%1, %0|%0, %q1}" > - [(set_attr "type" "ssecvt") > - (set_attr "unit" "mmx") > + "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE" > + "@ > + cvtps2pi\t{%1, %0|%0, %q1} > + %vcvtps2dq\t{%1, %0|%0, %1}" > + [(set_attr "mmx_isa" "native,x64") > + (set_attr "type" "ssecvt") > + (set_attr "unit" "mmx,*") > (set_attr "mode" "DI")]) > > (define_insn "sse_cvttps2pi" > - [(set (match_operand:V2SI 0 "register_operand" "=y") > + [(set (match_operand:V2SI 0 "register_operand" "=y,Yv") > (vec_select:V2SI > - (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm")) > + (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")) > (parallel [(const_int 0) (const_int 1)])))] > - "TARGET_SSE" > - "cvttps2pi\t{%1, %0|%0, %q1}" > - [(set_attr "type" "ssecvt") > - (set_attr "unit" "mmx") > + "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE" > + "@ > + cvttps2pi\t{%1, %0|%0, %q1} > + %vcvttps2dq\t{%1, %0|%0, %1}" > + [(set_attr "mmx_isa" "native,x64") > + (set_attr "type" "ssecvt") > + (set_attr "unit" "mmx,*") > (set_attr "prefix_rep" "0") > (set_attr "mode" "SF")]) > > -- > 2.20.1 > >
On Sun, Feb 10, 2019 at 2:48 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote: > > Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE. > > > > PR target/89021 > > * config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation. > > (sse_cvttps2pi): Likewise. > > It looks to me that this description is wrong. We don't have V4SF > modes here, but V2SF, so we have to fake 64bit load in case of MMX. > The cvtps2dq will access memory in true 128bit width, so this is > wrong. > > We have to fix the description to not fake wide mode. > What do you propose to implement __m64 _mm_cvtps_pi32 (__m128 __A); We also have (define_insn "sse2_cvtps2pd<mask_name>" [(set (match_operand:V2DF 0 "register_operand" "=v") (float_extend:V2DF (vec_select:V2SF (match_operand:V4SF 1 "vector_operand" "vm") (parallel [(const_int 0) (const_int 1)]))))] "TARGET_SSE2 && <mask_avx512vl_condition>" "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}" These aren't new problems introduced by my MMX work.
On Mon, Feb 11, 2019 at 8:08 PM H.J. Lu <hjl.tools@gmail.com> wrote: > > On Sun, Feb 10, 2019 at 2:48 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote: > > > Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE. > > > > > > PR target/89021 > > > * config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation. > > > (sse_cvttps2pi): Likewise. > > > > It looks to me that this description is wrong. We don't have V4SF > > modes here, but V2SF, so we have to fake 64bit load in case of MMX. > > The cvtps2dq will access memory in true 128bit width, so this is > > wrong. > > > > We have to fix the description to not fake wide mode. > > > > What do you propose to implement > > __m64 _mm_cvtps_pi32 (__m128 __A); Hm... In your original patch, we *do* have V4SF memory access, but the original insn accesses it in __m64 mode. This should be OK, but then accessing this memory in __m128 mode should also be OK. So, on a more detailed look, the original patch looks OK to me. Luckily, a false alarm... > > We also have > > (define_insn "sse2_cvtps2pd<mask_name>" > [(set (match_operand:V2DF 0 "register_operand" "=v") > (float_extend:V2DF > (vec_select:V2SF > (match_operand:V4SF 1 "vector_operand" "vm") > (parallel [(const_int 0) (const_int 1)]))))] > "TARGET_SSE2 && <mask_avx512vl_condition>" > "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}" > > These aren't new problems introduced by my MMX work. This one is not problematic, since the instruction accesses memory in __m64 mode, which is narrower that V4SFmode. Uros.
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 7d2c0367911..ad31ac3d9e6 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4668,26 +4668,32 @@ (set_attr "mode" "V4SF")]) (define_insn "sse_cvtps2pi" - [(set (match_operand:V2SI 0 "register_operand" "=y") + [(set (match_operand:V2SI 0 "register_operand" "=y,Yv") (vec_select:V2SI - (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")] + (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")] UNSPEC_FIX_NOTRUNC) (parallel [(const_int 0) (const_int 1)])))] - "TARGET_SSE" - "cvtps2pi\t{%1, %0|%0, %q1}" - [(set_attr "type" "ssecvt") - (set_attr "unit" "mmx") + "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE" + "@ + cvtps2pi\t{%1, %0|%0, %q1} + %vcvtps2dq\t{%1, %0|%0, %1}" + [(set_attr "mmx_isa" "native,x64") + (set_attr "type" "ssecvt") + (set_attr "unit" "mmx,*") (set_attr "mode" "DI")]) (define_insn "sse_cvttps2pi" - [(set (match_operand:V2SI 0 "register_operand" "=y") + [(set (match_operand:V2SI 0 "register_operand" "=y,Yv") (vec_select:V2SI - (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm")) + (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")) (parallel [(const_int 0) (const_int 1)])))] - "TARGET_SSE" - "cvttps2pi\t{%1, %0|%0, %q1}" - [(set_attr "type" "ssecvt") - (set_attr "unit" "mmx") + "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE" + "@ + cvttps2pi\t{%1, %0|%0, %q1} + %vcvttps2dq\t{%1, %0|%0, %1}" + [(set_attr "mmx_isa" "native,x64") + (set_attr "type" "ssecvt") + (set_attr "unit" "mmx,*") (set_attr "prefix_rep" "0") (set_attr "mode" "SF")])