diff mbox series

[PR92658] Add missing vector truncmn2 expanders for avx512f

Message ID CAMZc-bxFhcDaV2QH4Cny3XX=o2zmMx-m+C7TwTYFq3u27-Ke_g@mail.gmail.com
State New
Headers show
Series [PR92658] Add missing vector truncmn2 expanders for avx512f | expand

Commit Message

Hongtao Liu May 20, 2020, 8:35 a.m. UTC
Hi:
  Bootstrap is ok, regression test on i386/x86-64 backend is ok.

gcc/ChangeLog:
        PR target/92658
        * config/i386/sse.md
        (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2,
        trunc<ssedoublemodelower><mode>2): New expander.

gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr92658-avx512f.c: New test.
        * gcc.target/i386/pr92658-avx512vl.c: Ditto.
        * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.

Comments

Uros Bizjak May 20, 2020, 3:42 p.m. UTC | #1
On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
>
> gcc/ChangeLog:
>         PR target/92658
>         * config/i386/sse.md
>         (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2,
>         trunc<ssedoublemodelower><mode>2): New expander.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/pr92658-avx512f.c: New test.
>         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
>         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.

There are more conversions to be added. There are:

V2DImode to V2QImode, V2HImode, V2SImode
V4DImode to V4QImode, V4HImode, V4SImode
V8DImode to V8QImode, V8HImode, V8SImode

V4SImode to V4QImode, V4HImode
V8SImode to V8QImode, V8HImode
V16SImode to V16QImode, V16HImode

V8HImode to V8QImode
V16HImode to V16QImode
V32HImode to V32QImode

Uros.
Hongtao Liu May 21, 2020, 5:35 a.m. UTC | #2
On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Hi:
> >   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
> >
> > gcc/ChangeLog:
> >         PR target/92658
> >         * config/i386/sse.md
> >         (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2,
> >         trunc<ssedoublemodelower><mode>2): New expander.
> >
> > gcc/testsuite/ChangeLog:
> >         * gcc.target/i386/pr92658-avx512f.c: New test.
> >         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> >         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
>
> There are more conversions to be added. There are:
>
> V2DImode to V2QImode, V2HImode, V2SImode
> V4DImode to V4QImode, V4HImode, V4SImode
> V8DImode to V8QImode, V8HImode, V8SImode
>
> V4SImode to V4QImode, V4HImode
> V8SImode to V8QImode, V8HImode
> V16SImode to V16QImode, V16HImode
>
> V8HImode to V8QImode
> V16HImode to V16QImode
> V32HImode to V32QImode
>
All of them are added

Vectorization failure: (Add xfail in testcase for them since they need
generic part)
V2DImode to V2QImode, V2HImode
V4DImode to V4QImode, V4HImode
V8DImode to V8QImode

V4SImode to V4QImode, V4HImode
V8SImode to V8QImode

V8HImode to V8QImode

Vectorization success:
V2DImode to V2SImode (under TARGET_MMX_WITH_SSE)
V4DImode to V4SImode
V8DImode to V8HImode, V8SImode

V8SImode to V8HImode
V16SImode to V16QImode, V16HImode

V32HImode to V32QImode
V16HImode to V16HImode.


> Uros.

Update patch.
Regression test on i386/x86-64 backend is ok, bootstrap is ok.

gcc/ChangeLog:
        PR target/92658
        * config/i386/sse.md
        (trunc<pmov_src_lower><mode>2): New expander
        (truncv32hiv32qi2): Ditto.
        (trunc<ssedoublemodelower><mode>2): Ditto.
        (trunc<mode><pmov_dst_3>2): Ditto.
        (trunc<mode><pmov_dst_mode_4>2): Ditto.
        (truncv2div2si2): Ditto.
        (truncv8div8qi2): Ditto.
        (avx512f_<code>v8div16qi2): Renaming
        from *avx512f_<code>v8div16qi2.
        (avx512vl_<code>v2div2si): Renaming
        from *avx512vl_<code>v2div2si2.
        (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming
        from *avx512vl_<code><mode>v<ssescalarnum>qi2.

gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr92658-avx512f.c: New test.
        * gcc.target/i386/pr92658-avx512vl.c: Ditto.
        * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
Uros Bizjak May 21, 2020, 1:18 p.m. UTC | #3
On Thu, May 21, 2020 at 7:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > Hi:
> > >   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
> > >
> > > gcc/ChangeLog:
> > >         PR target/92658
> > >         * config/i386/sse.md
> > >         (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2,
> > >         trunc<ssedoublemodelower><mode>2): New expander.
> > >
> > > gcc/testsuite/ChangeLog:
> > >         * gcc.target/i386/pr92658-avx512f.c: New test.
> > >         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > >         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> >
> > There are more conversions to be added. There are:
> >
> > V2DImode to V2QImode, V2HImode, V2SImode
> > V4DImode to V4QImode, V4HImode, V4SImode
> > V8DImode to V8QImode, V8HImode, V8SImode
> >
> > V4SImode to V4QImode, V4HImode
> > V8SImode to V8QImode, V8HImode
> > V16SImode to V16QImode, V16HImode
> >
> > V8HImode to V8QImode
> > V16HImode to V16QImode
> > V32HImode to V32QImode
> >
> All of them are added
>
> Vectorization failure: (Add xfail in testcase for them since they need
> generic part)
> V2DImode to V2QImode, V2HImode
> V4DImode to V4QImode, V4HImode
> V8DImode to V8QImode
>
> V4SImode to V4QImode, V4HImode
> V8SImode to V8QImode
>
> V8HImode to V8QImode
>
> Vectorization success:
> V2DImode to V2SImode (under TARGET_MMX_WITH_SSE)
> V4DImode to V4SImode
> V8DImode to V8HImode, V8SImode
>
> V8SImode to V8HImode
> V16SImode to V16QImode, V16HImode
>
> V32HImode to V32QImode
> V16HImode to V16HImode.
>
>
> > Uros.
>
> Update patch.
> Regression test on i386/x86-64 backend is ok, bootstrap is ok.
>
> gcc/ChangeLog:
>         PR target/92658
>         * config/i386/sse.md
>         (trunc<pmov_src_lower><mode>2): New expander
>         (truncv32hiv32qi2): Ditto.
>         (trunc<ssedoublemodelower><mode>2): Ditto.
>         (trunc<mode><pmov_dst_3>2): Ditto.
>         (trunc<mode><pmov_dst_mode_4>2): Ditto.
>         (truncv2div2si2): Ditto.
>         (truncv8div8qi2): Ditto.
>         (avx512f_<code>v8div16qi2): Renaming
>         from *avx512f_<code>v8div16qi2.
>         (avx512vl_<code>v2div2si): Renaming
>         from *avx512vl_<code>v2div2si2.
>         (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming
>         from *avx512vl_<code><mode>v<ssescalarnum>qi2.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/pr92658-avx512f.c: New test.
>         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
>         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.


+  rtx op = simplify_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0);
+  operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0);

You should use simplify_gen_subreg, without null op fixup:

operands[0] = simplify_gen_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0);

+  "TARGET_MMX_WITH_SSE && TARGET_AVX512VL"

Do you really need TARGET_MMX_WITH_SSE?  Narrow modes are active even
without this flag.

Uros.
Hongtao Liu May 22, 2020, 4:54 a.m. UTC | #4
On Thu, May 21, 2020 at 7:18 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, May 21, 2020 at 7:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > Hi:
> > > >   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
> > > >
> > > > gcc/ChangeLog:
> > > >         PR target/92658
> > > >         * config/i386/sse.md
> > > >         (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2,
> > > >         trunc<ssedoublemodelower><mode>2): New expander.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >         * gcc.target/i386/pr92658-avx512f.c: New test.
> > > >         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > > >         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> > >
> > > There are more conversions to be added. There are:
> > >
> > > V2DImode to V2QImode, V2HImode, V2SImode
> > > V4DImode to V4QImode, V4HImode, V4SImode
> > > V8DImode to V8QImode, V8HImode, V8SImode
> > >
> > > V4SImode to V4QImode, V4HImode
> > > V8SImode to V8QImode, V8HImode
> > > V16SImode to V16QImode, V16HImode
> > >
> > > V8HImode to V8QImode
> > > V16HImode to V16QImode
> > > V32HImode to V32QImode
> > >
> > All of them are added
> >
> > Vectorization failure: (Add xfail in testcase for them since they need
> > generic part)
> > V2DImode to V2QImode, V2HImode
> > V4DImode to V4QImode, V4HImode
> > V8DImode to V8QImode
> >
> > V4SImode to V4QImode, V4HImode
> > V8SImode to V8QImode
> >
> > V8HImode to V8QImode
> >
> > Vectorization success:
> > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE)
> > V4DImode to V4SImode
> > V8DImode to V8HImode, V8SImode
> >
> > V8SImode to V8HImode
> > V16SImode to V16QImode, V16HImode
> >
> > V32HImode to V32QImode
> > V16HImode to V16HImode.
> >
> >
> > > Uros.
> >
> > Update patch.
> > Regression test on i386/x86-64 backend is ok, bootstrap is ok.
> >
> > gcc/ChangeLog:
> >         PR target/92658
> >         * config/i386/sse.md
> >         (trunc<pmov_src_lower><mode>2): New expander
> >         (truncv32hiv32qi2): Ditto.
> >         (trunc<ssedoublemodelower><mode>2): Ditto.
> >         (trunc<mode><pmov_dst_3>2): Ditto.
> >         (trunc<mode><pmov_dst_mode_4>2): Ditto.
> >         (truncv2div2si2): Ditto.
> >         (truncv8div8qi2): Ditto.
> >         (avx512f_<code>v8div16qi2): Renaming
> >         from *avx512f_<code>v8div16qi2.
> >         (avx512vl_<code>v2div2si): Renaming
> >         from *avx512vl_<code>v2div2si2.
> >         (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming
> >         from *avx512vl_<code><mode>v<ssescalarnum>qi2.
> >
> > gcc/testsuite/ChangeLog:
> >         * gcc.target/i386/pr92658-avx512f.c: New test.
> >         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> >         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
>
>
> +  rtx op = simplify_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0);
> +  operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0);
>
> You should use simplify_gen_subreg, without null op fixup:
>
> operands[0] = simplify_gen_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0);
>
Changed.
> +  "TARGET_MMX_WITH_SSE && TARGET_AVX512VL"
>
> Do you really need TARGET_MMX_WITH_SSE?  Narrow modes are active even
> without this flag.
>
Changed.

> Uros.

Update patch.
Uros Bizjak May 22, 2020, 6:41 a.m. UTC | #5
On Fri, May 22, 2020 at 6:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, May 21, 2020 at 7:18 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, May 21, 2020 at 7:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > Hi:
> > > > >   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >         PR target/92658
> > > > >         * config/i386/sse.md
> > > > >         (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2,
> > > > >         trunc<ssedoublemodelower><mode>2): New expander.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >         * gcc.target/i386/pr92658-avx512f.c: New test.
> > > > >         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > > > >         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> > > >
> > > > There are more conversions to be added. There are:
> > > >
> > > > V2DImode to V2QImode, V2HImode, V2SImode
> > > > V4DImode to V4QImode, V4HImode, V4SImode
> > > > V8DImode to V8QImode, V8HImode, V8SImode
> > > >
> > > > V4SImode to V4QImode, V4HImode
> > > > V8SImode to V8QImode, V8HImode
> > > > V16SImode to V16QImode, V16HImode
> > > >
> > > > V8HImode to V8QImode
> > > > V16HImode to V16QImode
> > > > V32HImode to V32QImode
> > > >
> > > All of them are added
> > >
> > > Vectorization failure: (Add xfail in testcase for them since they need
> > > generic part)
> > > V2DImode to V2QImode, V2HImode
> > > V4DImode to V4QImode, V4HImode
> > > V8DImode to V8QImode
> > >
> > > V4SImode to V4QImode, V4HImode
> > > V8SImode to V8QImode
> > >
> > > V8HImode to V8QImode
> > >
> > > Vectorization success:
> > > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE)
> > > V4DImode to V4SImode
> > > V8DImode to V8HImode, V8SImode
> > >
> > > V8SImode to V8HImode
> > > V16SImode to V16QImode, V16HImode
> > >
> > > V32HImode to V32QImode
> > > V16HImode to V16HImode.
> > >
> > >
> > > > Uros.
> > >
> > > Update patch.
> > > Regression test on i386/x86-64 backend is ok, bootstrap is ok.
> > >
> > > gcc/ChangeLog:
> > >         PR target/92658
> > >         * config/i386/sse.md
> > >         (trunc<pmov_src_lower><mode>2): New expander
> > >         (truncv32hiv32qi2): Ditto.
> > >         (trunc<ssedoublemodelower><mode>2): Ditto.
> > >         (trunc<mode><pmov_dst_3>2): Ditto.
> > >         (trunc<mode><pmov_dst_mode_4>2): Ditto.
> > >         (truncv2div2si2): Ditto.
> > >         (truncv8div8qi2): Ditto.
> > >         (avx512f_<code>v8div16qi2): Renaming
> > >         from *avx512f_<code>v8div16qi2.
> > >         (avx512vl_<code>v2div2si): Renaming
> > >         from *avx512vl_<code>v2div2si2.
> > >         (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming
> > >         from *avx512vl_<code><mode>v<ssescalarnum>qi2.
> > >
> > > gcc/testsuite/ChangeLog:
> > >         * gcc.target/i386/pr92658-avx512f.c: New test.
> > >         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > >         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> >
> >
> > +  rtx op = simplify_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0);
> > +  operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0);
> >
> > You should use simplify_gen_subreg, without null op fixup:
> >
> > operands[0] = simplify_gen_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0);
> >
> Changed.
> > +  "TARGET_MMX_WITH_SSE && TARGET_AVX512VL"
> >
> > Do you really need TARGET_MMX_WITH_SSE?  Narrow modes are active even
> > without this flag.
> >
> Changed.

+(define_expand "truncv8div8qi2"
+  [(set (match_operand:V8QI 0 "register_operand")
+    (truncate:V8QI
+        (match_operand:V8DI 1 "register_operand")))]
+  "TARGET_AVX512F && TARGET_MMX_WITH_SSE"
+{
+  operands[0] = simplify_gen_subreg (V16QImode, operands[0], V8QImode, 0);
+  emit_insn (gen_avx512f_truncatev8div16qi2 (operands[0], operands[1]));
+  DONE;
+})

You left one here.

+/* { dg-final { scan-assembler-times "vpmovqd" 2 { target { ! ia32 } } } } */

Target selector shouldn't be needed here.

The patch is OK with the above changes.

On a related note, it looks that pmov stores are modelled in a wrong
way. For example, this pattern;

(define_insn "*avx512f_<code>v8div16qi2_store"
  [(set (match_operand:V16QI 0 "memory_operand" "=m")
    (vec_concat:V16QI
      (any_truncate:V8QI
        (match_operand:V8DI 1 "register_operand" "v"))
      (vec_select:V8QI
        (match_dup 0)
        (parallel [(const_int 8) (const_int 9)
               (const_int 10) (const_int 11)
               (const_int 12) (const_int 13)
               (const_int 14) (const_int 15)]))))]

models the store in 128bit mode, but according to ISA, it stores in 16bit mode.

Uros.
Hongtao Liu May 22, 2020, 9:52 a.m. UTC | #6
On Fri, May 22, 2020 at 2:41 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, May 22, 2020 at 6:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, May 21, 2020 at 7:18 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, May 21, 2020 at 7:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > > >
> > > > > > Hi:
> > > > > >   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >         PR target/92658
> > > > > >         * config/i386/sse.md
> > > > > >         (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2,
> > > > > >         trunc<ssedoublemodelower><mode>2): New expander.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >         * gcc.target/i386/pr92658-avx512f.c: New test.
> > > > > >         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > > > > >         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> > > > >
> > > > > There are more conversions to be added. There are:
> > > > >
> > > > > V2DImode to V2QImode, V2HImode, V2SImode
> > > > > V4DImode to V4QImode, V4HImode, V4SImode
> > > > > V8DImode to V8QImode, V8HImode, V8SImode
> > > > >
> > > > > V4SImode to V4QImode, V4HImode
> > > > > V8SImode to V8QImode, V8HImode
> > > > > V16SImode to V16QImode, V16HImode
> > > > >
> > > > > V8HImode to V8QImode
> > > > > V16HImode to V16QImode
> > > > > V32HImode to V32QImode
> > > > >
> > > > All of them are added
> > > >
> > > > Vectorization failure: (Add xfail in testcase for them since they need
> > > > generic part)
> > > > V2DImode to V2QImode, V2HImode
> > > > V4DImode to V4QImode, V4HImode
> > > > V8DImode to V8QImode
> > > >
> > > > V4SImode to V4QImode, V4HImode
> > > > V8SImode to V8QImode
> > > >
> > > > V8HImode to V8QImode
> > > >
> > > > Vectorization success:
> > > > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE)
> > > > V4DImode to V4SImode
> > > > V8DImode to V8HImode, V8SImode
> > > >
> > > > V8SImode to V8HImode
> > > > V16SImode to V16QImode, V16HImode
> > > >
> > > > V32HImode to V32QImode
> > > > V16HImode to V16HImode.
> > > >
> > > >
> > > > > Uros.
> > > >
> > > > Update patch.
> > > > Regression test on i386/x86-64 backend is ok, bootstrap is ok.
> > > >
> > > > gcc/ChangeLog:
> > > >         PR target/92658
> > > >         * config/i386/sse.md
> > > >         (trunc<pmov_src_lower><mode>2): New expander
> > > >         (truncv32hiv32qi2): Ditto.
> > > >         (trunc<ssedoublemodelower><mode>2): Ditto.
> > > >         (trunc<mode><pmov_dst_3>2): Ditto.
> > > >         (trunc<mode><pmov_dst_mode_4>2): Ditto.
> > > >         (truncv2div2si2): Ditto.
> > > >         (truncv8div8qi2): Ditto.
> > > >         (avx512f_<code>v8div16qi2): Renaming
> > > >         from *avx512f_<code>v8div16qi2.
> > > >         (avx512vl_<code>v2div2si): Renaming
> > > >         from *avx512vl_<code>v2div2si2.
> > > >         (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming
> > > >         from *avx512vl_<code><mode>v<ssescalarnum>qi2.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >         * gcc.target/i386/pr92658-avx512f.c: New test.
> > > >         * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > > >         * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> > >
> > >
> > > +  rtx op = simplify_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0);
> > > +  operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0);
> > >
> > > You should use simplify_gen_subreg, without null op fixup:
> > >
> > > operands[0] = simplify_gen_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0);
> > >
> > Changed.
> > > +  "TARGET_MMX_WITH_SSE && TARGET_AVX512VL"
> > >
> > > Do you really need TARGET_MMX_WITH_SSE?  Narrow modes are active even
> > > without this flag.
> > >
> > Changed.
>
> +(define_expand "truncv8div8qi2"
> +  [(set (match_operand:V8QI 0 "register_operand")
> +    (truncate:V8QI
> +        (match_operand:V8DI 1 "register_operand")))]
> +  "TARGET_AVX512F && TARGET_MMX_WITH_SSE"
> +{
> +  operands[0] = simplify_gen_subreg (V16QImode, operands[0], V8QImode, 0);
> +  emit_insn (gen_avx512f_truncatev8div16qi2 (operands[0], operands[1]));
> +  DONE;
> +})
>
> You left one here.
>
> +/* { dg-final { scan-assembler-times "vpmovqd" 2 { target { ! ia32 } } } } */
>
> Target selector shouldn't be needed here.
>
> The patch is OK with the above changes.
>
Changed.
> On a related note, it looks that pmov stores are modelled in a wrong
> way. For example, this pattern;
>
> (define_insn "*avx512f_<code>v8div16qi2_store"
>   [(set (match_operand:V16QI 0 "memory_operand" "=m")
>     (vec_concat:V16QI
>       (any_truncate:V8QI
>         (match_operand:V8DI 1 "register_operand" "v"))
>       (vec_select:V8QI
>         (match_dup 0)
>         (parallel [(const_int 8) (const_int 9)
>                (const_int 10) (const_int 11)
>                (const_int 12) (const_int 13)
>                (const_int 14) (const_int 15)]))))]
>
> models the store in 128bit mode, but according to ISA, it stores in 16bit mode.
>
according to ISA, it stores in 64bit mode
vpmovqb xmm1/m64 {k1}{z}, zmm2.

memory_operand is 128bit but upper 64bit is not changed which means it
store only lower 64bits, just same meaning to ISA.

> Uros.
Uros Bizjak May 22, 2020, 11:04 a.m. UTC | #7
On Fri, May 22, 2020 at 11:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > On a related note, it looks that pmov stores are modelled in a wrong
> > way. For example, this pattern;
> >
> > (define_insn "*avx512f_<code>v8div16qi2_store"
> >   [(set (match_operand:V16QI 0 "memory_operand" "=m")
> >     (vec_concat:V16QI
> >       (any_truncate:V8QI
> >         (match_operand:V8DI 1 "register_operand" "v"))
> >       (vec_select:V8QI
> >         (match_dup 0)
> >         (parallel [(const_int 8) (const_int 9)
> >                (const_int 10) (const_int 11)
> >                (const_int 12) (const_int 13)
> >                (const_int 14) (const_int 15)]))))]
> >
> > models the store in 128bit mode, but according to ISA, it stores in 16bit mode.
> >
> according to ISA, it stores in 64bit mode
> vpmovqb xmm1/m64 {k1}{z}, zmm2.
>
> memory_operand is 128bit but upper 64bit is not changed which means it
> store only lower 64bits, just same meaning to ISA.

Sorry, I somehow mixed insn patterns. This is the right example:

(define_insn "*avx512vl_<code>v2div2qi2_store"
  [(set (match_operand:V16QI 0 "memory_operand" "=m")
    (vec_concat:V16QI
      (any_truncate:V2QI
          (match_operand:V2DI 1 "register_operand" "v"))
      (vec_select:V14QI
        (match_dup 0)
        (parallel [(const_int 2) (const_int 3)
                   (const_int 4) (const_int 5)
                   (const_int 6) (const_int 7)
                   (const_int 8) (const_int 9)
                   (const_int 10) (const_int 11)
                   (const_int 12) (const_int 13)
                   (const_int 14) (const_int 15)]))))]
  "TARGET_AVX512VL"
  "vpmov<trunsuffix>qb\t{%1, %0|%w0, %1}"
  [(set_attr "type" "ssemov")
   (set_attr "memory" "store")
   (set_attr "prefix" "evex")
   (set_attr "mode" "TI")])

The isa says:

EVEX.128.F3.0F38.W0 32 /r VPMOVQB xmm1/m16 {k1}{z}, xmm2

However, the pattern says that V16QImode is stored to a memory. Due to
this, insn template needs %w modifier for intel dialect, which is the
sign that something is wrong with the pattern.

These conversions should be reimplemented as having
nonimmedate_operand output operand and memory operand should be split
to a separate insn using a pre-reload splitter. Please see how sse4_1
conversions handle their input operands.

Uros.
diff mbox series

Patch

From 3133cde9eb5e187c2e55b74d5c207810717c17a0 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 20 May 2020 15:53:14 +0800
Subject: [PATCH] Add missing vector truncmn2 expanders [PR92658]

2020-0520  Hongtao.liu  <hongtao.liu@intel.com>

gcc/ChangeLog:
	PR target/92658
	* config/i386/sse.md
	(trunc<pmov_src_lower><mode>2, truncv32hiv32qi2,
	trunc<ssedoublemodelower><mode>2): New expander.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/pr92658-avx512f.c: New test.
	* gcc.target/i386/pr92658-avx512vl.c: Ditto.
	* gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
---
 gcc/config/i386/sse.md                        | 18 ++++
 .../gcc.target/i386/pr92658-avx512bw-trunc.c  | 73 +++++++++++++++
 .../gcc.target/i386/pr92658-avx512f.c         | 93 +++++++++++++++++++
 .../gcc.target/i386/pr92658-avx512vl.c        | 37 ++++++++
 4 files changed, 221 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92658-avx512bw-trunc.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92658-avx512f.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9bf4361384a..b0194c62b02 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10491,6 +10491,12 @@ 
 (define_mode_attr pmov_suff_1
   [(V16QI "db") (V16HI "dw") (V8SI "qd") (V8HI "qw")])
 
+(define_expand "trunc<pmov_src_lower><mode>2"
+  [(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand")
+	(truncate:PMOV_DST_MODE_1
+	  (match_operand:<pmov_src_mode> 1 "register_operand")))]
+  "TARGET_AVX512F")
+
 (define_insn "*avx512f_<code><pmov_src_lower><mode>2"
   [(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand" "=v,m")
 	(any_truncate:PMOV_DST_MODE_1
@@ -10525,6 +10531,12 @@ 
       (match_operand:<avx512fmaskmode> 2 "register_operand")))]
   "TARGET_AVX512F")
 
+(define_expand "truncv32hiv32qi2"
+  [(set (match_operand:V32QI 0 "nonimmediate_operand")
+	(truncate:V32QI
+	  (match_operand:V32HI 1 "register_operand")))]
+  "TARGET_AVX512BW")
+
 (define_insn "avx512bw_<code>v32hiv32qi2"
   [(set (match_operand:V32QI 0 "nonimmediate_operand" "=v,m")
 	(any_truncate:V32QI
@@ -10564,6 +10576,12 @@ 
 (define_mode_attr pmov_suff_2
   [(V16QI "wb") (V8HI "dw") (V4SI "qd")])
 
+(define_expand "trunc<ssedoublemodelower><mode>2"
+  [(set (match_operand:PMOV_DST_MODE_2 0 "nonimmediate_operand")
+	(truncate:PMOV_DST_MODE_2
+	  (match_operand:<ssedoublemode> 1 "register_operand")))]
+  "TARGET_AVX512VL")
+
 (define_insn "*avx512vl_<code><ssedoublemodelower><mode>2"
   [(set (match_operand:PMOV_DST_MODE_2 0 "nonimmediate_operand" "=v,m")
 	(any_truncate:PMOV_DST_MODE_2
diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512bw-trunc.c b/gcc/testsuite/gcc.target/i386/pr92658-avx512bw-trunc.c
new file mode 100644
index 00000000000..b37535aeb38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512bw-trunc.c
@@ -0,0 +1,73 @@ 
+/* PR target/92658 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -mavx512bw -mavx512vl" } */
+
+typedef unsigned char v16qi __attribute__((vector_size (16)));
+typedef unsigned char v32qi __attribute__((vector_size (32)));
+typedef unsigned short v16hi __attribute__((vector_size (32)));
+typedef unsigned short v32hi __attribute__((vector_size (64)));
+
+
+void
+truncwb_512 (v32qi * dst, v32hi * __restrict src)
+{
+  unsigned char tem[8];
+  tem[0] = (*src)[0];
+  tem[1] = (*src)[1];
+  tem[2] = (*src)[2];
+  tem[3] = (*src)[3];
+  tem[4] = (*src)[4];
+  tem[5] = (*src)[5];
+  tem[6] = (*src)[6];
+  tem[7] = (*src)[7];
+  tem[8] = (*src)[8];
+  tem[9] = (*src)[9];
+  tem[10] = (*src)[10];
+  tem[11] = (*src)[11];
+  tem[12] = (*src)[12];
+  tem[13] = (*src)[13];
+  tem[14] = (*src)[14];
+  tem[15] = (*src)[15];
+  tem[16] = (*src)[16];
+  tem[17] = (*src)[17];
+  tem[18] = (*src)[18];
+  tem[19] = (*src)[19];
+  tem[20] = (*src)[20];
+  tem[21] = (*src)[21];
+  tem[22] = (*src)[22];
+  tem[23] = (*src)[23];
+  tem[24] = (*src)[24];
+  tem[25] = (*src)[25];
+  tem[26] = (*src)[26];
+  tem[27] = (*src)[27];
+  tem[28] = (*src)[28];
+  tem[29] = (*src)[29];
+  tem[30] = (*src)[30];
+  tem[31] = (*src)[31];
+  dst[0] = *(v32qi *) tem;
+}
+
+void
+truncwb_256 (v16qi * dst, v16hi * __restrict src)
+{
+  unsigned char tem[8];
+  tem[0] = (*src)[0];
+  tem[1] = (*src)[1];
+  tem[2] = (*src)[2];
+  tem[3] = (*src)[3];
+  tem[4] = (*src)[4];
+  tem[5] = (*src)[5];
+  tem[6] = (*src)[6];
+  tem[7] = (*src)[7];
+  tem[8] = (*src)[8];
+  tem[9] = (*src)[9];
+  tem[10] = (*src)[10];
+  tem[11] = (*src)[11];
+  tem[12] = (*src)[12];
+  tem[13] = (*src)[13];
+  tem[14] = (*src)[14];
+  tem[15] = (*src)[15];
+  dst[0] = *(v16qi *) tem;
+}
+
+/* { dg-final { scan-assembler-times "vpmovwb" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c b/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c
new file mode 100644
index 00000000000..c59eebfd550
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c
@@ -0,0 +1,93 @@ 
+/* PR target/92658 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -mavx512f" } */
+
+typedef unsigned char v8qi __attribute__((vector_size (8)));
+typedef unsigned char v16qi __attribute__((vector_size (16)));
+typedef unsigned short v8hi __attribute__((vector_size (16)));
+typedef unsigned short v16hi __attribute__((vector_size (32)));
+typedef unsigned int v8si __attribute__((vector_size (32)));
+typedef unsigned int v16si __attribute__((vector_size (64)));
+typedef unsigned long long v8di __attribute__((vector_size (64)));
+
+void
+truncqd (v8si * dst, v8di * __restrict src)
+{
+  unsigned tem[8];
+  tem[0] = (*src)[0];
+  tem[1] = (*src)[1];
+  tem[2] = (*src)[2];
+  tem[3] = (*src)[3];
+  tem[4] = (*src)[4];
+  tem[5] = (*src)[5];
+  tem[6] = (*src)[6];
+  tem[7] = (*src)[7];
+  dst[0] = *(v8si *) tem;
+}
+
+void
+truncqw (v8hi * dst, v8di * __restrict src)
+{
+  unsigned short tem[8];
+  tem[0] = (*src)[0];
+  tem[1] = (*src)[1];
+  tem[2] = (*src)[2];
+  tem[3] = (*src)[3];
+  tem[4] = (*src)[4];
+  tem[5] = (*src)[5];
+  tem[6] = (*src)[6];
+  tem[7] = (*src)[7];
+  dst[0] = *(v8hi *) tem;
+}
+
+void
+truncdw (v16hi * dst, v16si * __restrict src)
+{
+  unsigned short tem[8];
+  tem[0] = (*src)[0];
+  tem[1] = (*src)[1];
+  tem[2] = (*src)[2];
+  tem[3] = (*src)[3];
+  tem[4] = (*src)[4];
+  tem[5] = (*src)[5];
+  tem[6] = (*src)[6];
+  tem[7] = (*src)[7];
+  tem[8] = (*src)[8];
+  tem[9] = (*src)[9];
+  tem[10] = (*src)[10];
+  tem[11] = (*src)[11];
+  tem[12] = (*src)[12];
+  tem[13] = (*src)[13];
+  tem[14] = (*src)[14];
+  tem[15] = (*src)[15];
+  dst[0] = *(v16hi *) tem;
+}
+
+
+void
+truncdb (v16qi * dst, v16si * __restrict src)
+{
+  unsigned char tem[8];
+  tem[0] = (*src)[0];
+  tem[1] = (*src)[1];
+  tem[2] = (*src)[2];
+  tem[3] = (*src)[3];
+  tem[4] = (*src)[4];
+  tem[5] = (*src)[5];
+  tem[6] = (*src)[6];
+  tem[7] = (*src)[7];
+  tem[8] = (*src)[8];
+  tem[9] = (*src)[9];
+  tem[10] = (*src)[10];
+  tem[11] = (*src)[11];
+  tem[12] = (*src)[12];
+  tem[13] = (*src)[13];
+  tem[14] = (*src)[14];
+  tem[15] = (*src)[15];
+  dst[0] = *(v16qi *) tem;
+}
+
+/* { dg-final { scan-assembler-times "vpmovqd" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovqw" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovdw" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovdb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
new file mode 100644
index 00000000000..c6f748ca458
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
@@ -0,0 +1,37 @@ 
+/* PR target/92658 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -mavx512f -mavx512vl" } */
+
+typedef unsigned short v8hi __attribute__((vector_size (16)));
+typedef unsigned int v4si __attribute__((vector_size (16)));
+typedef unsigned int v8si __attribute__((vector_size (32)));
+typedef unsigned long long v4di __attribute__((vector_size (32)));
+
+void
+truncqd_256 (v4si * dst, v4di * __restrict src)
+{
+  unsigned tem[4];
+  tem[0] = (*src)[0];
+  tem[1] = (*src)[1];
+  tem[2] = (*src)[2];
+  tem[3] = (*src)[3];
+  dst[0] = *(v4si *) tem;
+}
+
+void
+truncdw_256 (v8hi * dst, v8si * __restrict src)
+{
+  unsigned short tem[8];
+  tem[0] = (*src)[0];
+  tem[1] = (*src)[1];
+  tem[2] = (*src)[2];
+  tem[3] = (*src)[3];
+  tem[4] = (*src)[4];
+  tem[5] = (*src)[5];
+  tem[6] = (*src)[6];
+  tem[7] = (*src)[7];
+  dst[0] = *(v8hi *) tem;
+}
+
+/* { dg-final { scan-assembler-times "vpmovqd" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovdw" 1 } } */
-- 
2.18.1