Message ID | CAMZc-bxFhcDaV2QH4Cny3XX=o2zmMx-m+C7TwTYFq3u27-Ke_g@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [PR92658] Add missing vector truncmn2 expanders for avx512f | expand |
On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > Hi: > Bootstrap is ok, regression test on i386/x86-64 backend is ok. > > gcc/ChangeLog: > PR target/92658 > * config/i386/sse.md > (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2, > trunc<ssedoublemodelower><mode>2): New expander. > > gcc/testsuite/ChangeLog: > * gcc.target/i386/pr92658-avx512f.c: New test. > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. There are more conversions to be added. There are: V2DImode to V2QImode, V2HImode, V2SImode V4DImode to V4QImode, V4HImode, V4SImode V8DImode to V8QImode, V8HImode, V8SImode V4SImode to V4QImode, V4HImode V8SImode to V8QImode, V8HImode V16SImode to V16QImode, V16HImode V8HImode to V8QImode V16HImode to V16QImode V32HImode to V32QImode Uros.
On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > Hi: > > Bootstrap is ok, regression test on i386/x86-64 backend is ok. > > > > gcc/ChangeLog: > > PR target/92658 > > * config/i386/sse.md > > (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2, > > trunc<ssedoublemodelower><mode>2): New expander. > > > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr92658-avx512f.c: New test. > > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. > > There are more conversions to be added. There are: > > V2DImode to V2QImode, V2HImode, V2SImode > V4DImode to V4QImode, V4HImode, V4SImode > V8DImode to V8QImode, V8HImode, V8SImode > > V4SImode to V4QImode, V4HImode > V8SImode to V8QImode, V8HImode > V16SImode to V16QImode, V16HImode > > V8HImode to V8QImode > V16HImode to V16QImode > V32HImode to V32QImode > All of them are added Vectorization failure: (Add xfail in testcase for them since they need generic part) V2DImode to V2QImode, V2HImode V4DImode to V4QImode, V4HImode V8DImode to V8QImode V4SImode to V4QImode, V4HImode V8SImode to V8QImode V8HImode to V8QImode Vectorization success: V2DImode to V2SImode (under TARGET_MMX_WITH_SSE) V4DImode to V4SImode V8DImode to V8HImode, V8SImode V8SImode to V8HImode V16SImode to V16QImode, V16HImode V32HImode to V32QImode V16HImode to V16HImode. > Uros. Update patch. Regression test on i386/x86-64 backend is ok, bootstrap is ok. gcc/ChangeLog: PR target/92658 * config/i386/sse.md (trunc<pmov_src_lower><mode>2): New expander (truncv32hiv32qi2): Ditto. (trunc<ssedoublemodelower><mode>2): Ditto. (trunc<mode><pmov_dst_3>2): Ditto. (trunc<mode><pmov_dst_mode_4>2): Ditto. (truncv2div2si2): Ditto. (truncv8div8qi2): Ditto. (avx512f_<code>v8div16qi2): Renaming from *avx512f_<code>v8div16qi2. (avx512vl_<code>v2div2si): Renaming from *avx512vl_<code>v2div2si2. (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming from *avx512vl_<code><mode>v<ssescalarnum>qi2. gcc/testsuite/ChangeLog: * gcc.target/i386/pr92658-avx512f.c: New test. * gcc.target/i386/pr92658-avx512vl.c: Ditto. * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
On Thu, May 21, 2020 at 7:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > Hi: > > > Bootstrap is ok, regression test on i386/x86-64 backend is ok. > > > > > > gcc/ChangeLog: > > > PR target/92658 > > > * config/i386/sse.md > > > (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2, > > > trunc<ssedoublemodelower><mode>2): New expander. > > > > > > gcc/testsuite/ChangeLog: > > > * gcc.target/i386/pr92658-avx512f.c: New test. > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. > > > > There are more conversions to be added. There are: > > > > V2DImode to V2QImode, V2HImode, V2SImode > > V4DImode to V4QImode, V4HImode, V4SImode > > V8DImode to V8QImode, V8HImode, V8SImode > > > > V4SImode to V4QImode, V4HImode > > V8SImode to V8QImode, V8HImode > > V16SImode to V16QImode, V16HImode > > > > V8HImode to V8QImode > > V16HImode to V16QImode > > V32HImode to V32QImode > > > All of them are added > > Vectorization failure: (Add xfail in testcase for them since they need > generic part) > V2DImode to V2QImode, V2HImode > V4DImode to V4QImode, V4HImode > V8DImode to V8QImode > > V4SImode to V4QImode, V4HImode > V8SImode to V8QImode > > V8HImode to V8QImode > > Vectorization success: > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE) > V4DImode to V4SImode > V8DImode to V8HImode, V8SImode > > V8SImode to V8HImode > V16SImode to V16QImode, V16HImode > > V32HImode to V32QImode > V16HImode to V16HImode. > > > > Uros. > > Update patch. > Regression test on i386/x86-64 backend is ok, bootstrap is ok. > > gcc/ChangeLog: > PR target/92658 > * config/i386/sse.md > (trunc<pmov_src_lower><mode>2): New expander > (truncv32hiv32qi2): Ditto. > (trunc<ssedoublemodelower><mode>2): Ditto. > (trunc<mode><pmov_dst_3>2): Ditto. > (trunc<mode><pmov_dst_mode_4>2): Ditto. > (truncv2div2si2): Ditto. > (truncv8div8qi2): Ditto. > (avx512f_<code>v8div16qi2): Renaming > from *avx512f_<code>v8div16qi2. > (avx512vl_<code>v2div2si): Renaming > from *avx512vl_<code>v2div2si2. > (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming > from *avx512vl_<code><mode>v<ssescalarnum>qi2. > > gcc/testsuite/ChangeLog: > * gcc.target/i386/pr92658-avx512f.c: New test. > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. + rtx op = simplify_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0); + operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0); You should use simplify_gen_subreg, without null op fixup: operands[0] = simplify_gen_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0); + "TARGET_MMX_WITH_SSE && TARGET_AVX512VL" Do you really need TARGET_MMX_WITH_SSE? Narrow modes are active even without this flag. Uros.
On Thu, May 21, 2020 at 7:18 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Thu, May 21, 2020 at 7:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > Hi: > > > > Bootstrap is ok, regression test on i386/x86-64 backend is ok. > > > > > > > > gcc/ChangeLog: > > > > PR target/92658 > > > > * config/i386/sse.md > > > > (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2, > > > > trunc<ssedoublemodelower><mode>2): New expander. > > > > > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr92658-avx512f.c: New test. > > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. > > > > > > There are more conversions to be added. There are: > > > > > > V2DImode to V2QImode, V2HImode, V2SImode > > > V4DImode to V4QImode, V4HImode, V4SImode > > > V8DImode to V8QImode, V8HImode, V8SImode > > > > > > V4SImode to V4QImode, V4HImode > > > V8SImode to V8QImode, V8HImode > > > V16SImode to V16QImode, V16HImode > > > > > > V8HImode to V8QImode > > > V16HImode to V16QImode > > > V32HImode to V32QImode > > > > > All of them are added > > > > Vectorization failure: (Add xfail in testcase for them since they need > > generic part) > > V2DImode to V2QImode, V2HImode > > V4DImode to V4QImode, V4HImode > > V8DImode to V8QImode > > > > V4SImode to V4QImode, V4HImode > > V8SImode to V8QImode > > > > V8HImode to V8QImode > > > > Vectorization success: > > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE) > > V4DImode to V4SImode > > V8DImode to V8HImode, V8SImode > > > > V8SImode to V8HImode > > V16SImode to V16QImode, V16HImode > > > > V32HImode to V32QImode > > V16HImode to V16HImode. > > > > > > > Uros. > > > > Update patch. > > Regression test on i386/x86-64 backend is ok, bootstrap is ok. > > > > gcc/ChangeLog: > > PR target/92658 > > * config/i386/sse.md > > (trunc<pmov_src_lower><mode>2): New expander > > (truncv32hiv32qi2): Ditto. > > (trunc<ssedoublemodelower><mode>2): Ditto. > > (trunc<mode><pmov_dst_3>2): Ditto. > > (trunc<mode><pmov_dst_mode_4>2): Ditto. > > (truncv2div2si2): Ditto. > > (truncv8div8qi2): Ditto. > > (avx512f_<code>v8div16qi2): Renaming > > from *avx512f_<code>v8div16qi2. > > (avx512vl_<code>v2div2si): Renaming > > from *avx512vl_<code>v2div2si2. > > (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming > > from *avx512vl_<code><mode>v<ssescalarnum>qi2. > > > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr92658-avx512f.c: New test. > > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. > > > + rtx op = simplify_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0); > + operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0); > > You should use simplify_gen_subreg, without null op fixup: > > operands[0] = simplify_gen_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0); > Changed. > + "TARGET_MMX_WITH_SSE && TARGET_AVX512VL" > > Do you really need TARGET_MMX_WITH_SSE? Narrow modes are active even > without this flag. > Changed. > Uros. Update patch.
On Fri, May 22, 2020 at 6:55 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Thu, May 21, 2020 at 7:18 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On Thu, May 21, 2020 at 7:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > > > Hi: > > > > > Bootstrap is ok, regression test on i386/x86-64 backend is ok. > > > > > > > > > > gcc/ChangeLog: > > > > > PR target/92658 > > > > > * config/i386/sse.md > > > > > (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2, > > > > > trunc<ssedoublemodelower><mode>2): New expander. > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > * gcc.target/i386/pr92658-avx512f.c: New test. > > > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > > > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. > > > > > > > > There are more conversions to be added. There are: > > > > > > > > V2DImode to V2QImode, V2HImode, V2SImode > > > > V4DImode to V4QImode, V4HImode, V4SImode > > > > V8DImode to V8QImode, V8HImode, V8SImode > > > > > > > > V4SImode to V4QImode, V4HImode > > > > V8SImode to V8QImode, V8HImode > > > > V16SImode to V16QImode, V16HImode > > > > > > > > V8HImode to V8QImode > > > > V16HImode to V16QImode > > > > V32HImode to V32QImode > > > > > > > All of them are added > > > > > > Vectorization failure: (Add xfail in testcase for them since they need > > > generic part) > > > V2DImode to V2QImode, V2HImode > > > V4DImode to V4QImode, V4HImode > > > V8DImode to V8QImode > > > > > > V4SImode to V4QImode, V4HImode > > > V8SImode to V8QImode > > > > > > V8HImode to V8QImode > > > > > > Vectorization success: > > > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE) > > > V4DImode to V4SImode > > > V8DImode to V8HImode, V8SImode > > > > > > V8SImode to V8HImode > > > V16SImode to V16QImode, V16HImode > > > > > > V32HImode to V32QImode > > > V16HImode to V16HImode. > > > > > > > > > > Uros. > > > > > > Update patch. > > > Regression test on i386/x86-64 backend is ok, bootstrap is ok. > > > > > > gcc/ChangeLog: > > > PR target/92658 > > > * config/i386/sse.md > > > (trunc<pmov_src_lower><mode>2): New expander > > > (truncv32hiv32qi2): Ditto. > > > (trunc<ssedoublemodelower><mode>2): Ditto. > > > (trunc<mode><pmov_dst_3>2): Ditto. > > > (trunc<mode><pmov_dst_mode_4>2): Ditto. > > > (truncv2div2si2): Ditto. > > > (truncv8div8qi2): Ditto. > > > (avx512f_<code>v8div16qi2): Renaming > > > from *avx512f_<code>v8div16qi2. > > > (avx512vl_<code>v2div2si): Renaming > > > from *avx512vl_<code>v2div2si2. > > > (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming > > > from *avx512vl_<code><mode>v<ssescalarnum>qi2. > > > > > > gcc/testsuite/ChangeLog: > > > * gcc.target/i386/pr92658-avx512f.c: New test. > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. > > > > > > + rtx op = simplify_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0); > > + operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0); > > > > You should use simplify_gen_subreg, without null op fixup: > > > > operands[0] = simplify_gen_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0); > > > Changed. > > + "TARGET_MMX_WITH_SSE && TARGET_AVX512VL" > > > > Do you really need TARGET_MMX_WITH_SSE? Narrow modes are active even > > without this flag. > > > Changed. +(define_expand "truncv8div8qi2" + [(set (match_operand:V8QI 0 "register_operand") + (truncate:V8QI + (match_operand:V8DI 1 "register_operand")))] + "TARGET_AVX512F && TARGET_MMX_WITH_SSE" +{ + operands[0] = simplify_gen_subreg (V16QImode, operands[0], V8QImode, 0); + emit_insn (gen_avx512f_truncatev8div16qi2 (operands[0], operands[1])); + DONE; +}) You left one here. +/* { dg-final { scan-assembler-times "vpmovqd" 2 { target { ! ia32 } } } } */ Target selector shouldn't be needed here. The patch is OK with the above changes. On a related note, it looks that pmov stores are modelled in a wrong way. For example, this pattern; (define_insn "*avx512f_<code>v8div16qi2_store" [(set (match_operand:V16QI 0 "memory_operand" "=m") (vec_concat:V16QI (any_truncate:V8QI (match_operand:V8DI 1 "register_operand" "v")) (vec_select:V8QI (match_dup 0) (parallel [(const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)]))))] models the store in 128bit mode, but according to ISA, it stores in 16bit mode. Uros.
On Fri, May 22, 2020 at 2:41 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Fri, May 22, 2020 at 6:55 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > On Thu, May 21, 2020 at 7:18 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Thu, May 21, 2020 at 7:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > > > > > Hi: > > > > > > Bootstrap is ok, regression test on i386/x86-64 backend is ok. > > > > > > > > > > > > gcc/ChangeLog: > > > > > > PR target/92658 > > > > > > * config/i386/sse.md > > > > > > (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2, > > > > > > trunc<ssedoublemodelower><mode>2): New expander. > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/i386/pr92658-avx512f.c: New test. > > > > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > > > > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. > > > > > > > > > > There are more conversions to be added. There are: > > > > > > > > > > V2DImode to V2QImode, V2HImode, V2SImode > > > > > V4DImode to V4QImode, V4HImode, V4SImode > > > > > V8DImode to V8QImode, V8HImode, V8SImode > > > > > > > > > > V4SImode to V4QImode, V4HImode > > > > > V8SImode to V8QImode, V8HImode > > > > > V16SImode to V16QImode, V16HImode > > > > > > > > > > V8HImode to V8QImode > > > > > V16HImode to V16QImode > > > > > V32HImode to V32QImode > > > > > > > > > All of them are added > > > > > > > > Vectorization failure: (Add xfail in testcase for them since they need > > > > generic part) > > > > V2DImode to V2QImode, V2HImode > > > > V4DImode to V4QImode, V4HImode > > > > V8DImode to V8QImode > > > > > > > > V4SImode to V4QImode, V4HImode > > > > V8SImode to V8QImode > > > > > > > > V8HImode to V8QImode > > > > > > > > Vectorization success: > > > > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE) > > > > V4DImode to V4SImode > > > > V8DImode to V8HImode, V8SImode > > > > > > > > V8SImode to V8HImode > > > > V16SImode to V16QImode, V16HImode > > > > > > > > V32HImode to V32QImode > > > > V16HImode to V16HImode. > > > > > > > > > > > > > Uros. > > > > > > > > Update patch. > > > > Regression test on i386/x86-64 backend is ok, bootstrap is ok. > > > > > > > > gcc/ChangeLog: > > > > PR target/92658 > > > > * config/i386/sse.md > > > > (trunc<pmov_src_lower><mode>2): New expander > > > > (truncv32hiv32qi2): Ditto. > > > > (trunc<ssedoublemodelower><mode>2): Ditto. > > > > (trunc<mode><pmov_dst_3>2): Ditto. > > > > (trunc<mode><pmov_dst_mode_4>2): Ditto. > > > > (truncv2div2si2): Ditto. > > > > (truncv8div8qi2): Ditto. > > > > (avx512f_<code>v8div16qi2): Renaming > > > > from *avx512f_<code>v8div16qi2. > > > > (avx512vl_<code>v2div2si): Renaming > > > > from *avx512vl_<code>v2div2si2. > > > > (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming > > > > from *avx512vl_<code><mode>v<ssescalarnum>qi2. > > > > > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr92658-avx512f.c: New test. > > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto. > > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. > > > > > > > > > + rtx op = simplify_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0); > > > + operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0); > > > > > > You should use simplify_gen_subreg, without null op fixup: > > > > > > operands[0] = simplify_gen_subreg (V16QImode, operands[0], <pmov_dst_3>mode, 0); > > > > > Changed. > > > + "TARGET_MMX_WITH_SSE && TARGET_AVX512VL" > > > > > > Do you really need TARGET_MMX_WITH_SSE? Narrow modes are active even > > > without this flag. > > > > > Changed. > > +(define_expand "truncv8div8qi2" > + [(set (match_operand:V8QI 0 "register_operand") > + (truncate:V8QI > + (match_operand:V8DI 1 "register_operand")))] > + "TARGET_AVX512F && TARGET_MMX_WITH_SSE" > +{ > + operands[0] = simplify_gen_subreg (V16QImode, operands[0], V8QImode, 0); > + emit_insn (gen_avx512f_truncatev8div16qi2 (operands[0], operands[1])); > + DONE; > +}) > > You left one here. > > +/* { dg-final { scan-assembler-times "vpmovqd" 2 { target { ! ia32 } } } } */ > > Target selector shouldn't be needed here. > > The patch is OK with the above changes. > Changed. > On a related note, it looks that pmov stores are modelled in a wrong > way. For example, this pattern; > > (define_insn "*avx512f_<code>v8div16qi2_store" > [(set (match_operand:V16QI 0 "memory_operand" "=m") > (vec_concat:V16QI > (any_truncate:V8QI > (match_operand:V8DI 1 "register_operand" "v")) > (vec_select:V8QI > (match_dup 0) > (parallel [(const_int 8) (const_int 9) > (const_int 10) (const_int 11) > (const_int 12) (const_int 13) > (const_int 14) (const_int 15)]))))] > > models the store in 128bit mode, but according to ISA, it stores in 16bit mode. > according to ISA, it stores in 64bit mode vpmovqb xmm1/m64 {k1}{z}, zmm2. memory_operand is 128bit but upper 64bit is not changed which means it store only lower 64bits, just same meaning to ISA. > Uros.
On Fri, May 22, 2020 at 11:52 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On a related note, it looks that pmov stores are modelled in a wrong > > way. For example, this pattern; > > > > (define_insn "*avx512f_<code>v8div16qi2_store" > > [(set (match_operand:V16QI 0 "memory_operand" "=m") > > (vec_concat:V16QI > > (any_truncate:V8QI > > (match_operand:V8DI 1 "register_operand" "v")) > > (vec_select:V8QI > > (match_dup 0) > > (parallel [(const_int 8) (const_int 9) > > (const_int 10) (const_int 11) > > (const_int 12) (const_int 13) > > (const_int 14) (const_int 15)]))))] > > > > models the store in 128bit mode, but according to ISA, it stores in 16bit mode. > > > according to ISA, it stores in 64bit mode > vpmovqb xmm1/m64 {k1}{z}, zmm2. > > memory_operand is 128bit but upper 64bit is not changed which means it > store only lower 64bits, just same meaning to ISA. Sorry, I somehow mixed insn patterns. This is the right example: (define_insn "*avx512vl_<code>v2div2qi2_store" [(set (match_operand:V16QI 0 "memory_operand" "=m") (vec_concat:V16QI (any_truncate:V2QI (match_operand:V2DI 1 "register_operand" "v")) (vec_select:V14QI (match_dup 0) (parallel [(const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7) (const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)]))))] "TARGET_AVX512VL" "vpmov<trunsuffix>qb\t{%1, %0|%w0, %1}" [(set_attr "type" "ssemov") (set_attr "memory" "store") (set_attr "prefix" "evex") (set_attr "mode" "TI")]) The isa says: EVEX.128.F3.0F38.W0 32 /r VPMOVQB xmm1/m16 {k1}{z}, xmm2 However, the pattern says that V16QImode is stored to a memory. Due to this, insn template needs %w modifier for intel dialect, which is the sign that something is wrong with the pattern. These conversions should be reimplemented as having nonimmedate_operand output operand and memory operand should be split to a separate insn using a pre-reload splitter. Please see how sse4_1 conversions handle their input operands. Uros.
From 3133cde9eb5e187c2e55b74d5c207810717c17a0 Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Wed, 20 May 2020 15:53:14 +0800 Subject: [PATCH] Add missing vector truncmn2 expanders [PR92658] 2020-0520 Hongtao.liu <hongtao.liu@intel.com> gcc/ChangeLog: PR target/92658 * config/i386/sse.md (trunc<pmov_src_lower><mode>2, truncv32hiv32qi2, trunc<ssedoublemodelower><mode>2): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/pr92658-avx512f.c: New test. * gcc.target/i386/pr92658-avx512vl.c: Ditto. * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto. --- gcc/config/i386/sse.md | 18 ++++ .../gcc.target/i386/pr92658-avx512bw-trunc.c | 73 +++++++++++++++ .../gcc.target/i386/pr92658-avx512f.c | 93 +++++++++++++++++++ .../gcc.target/i386/pr92658-avx512vl.c | 37 ++++++++ 4 files changed, 221 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr92658-avx512bw-trunc.c create mode 100644 gcc/testsuite/gcc.target/i386/pr92658-avx512f.c create mode 100644 gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9bf4361384a..b0194c62b02 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -10491,6 +10491,12 @@ (define_mode_attr pmov_suff_1 [(V16QI "db") (V16HI "dw") (V8SI "qd") (V8HI "qw")]) +(define_expand "trunc<pmov_src_lower><mode>2" + [(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand") + (truncate:PMOV_DST_MODE_1 + (match_operand:<pmov_src_mode> 1 "register_operand")))] + "TARGET_AVX512F") + (define_insn "*avx512f_<code><pmov_src_lower><mode>2" [(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand" "=v,m") (any_truncate:PMOV_DST_MODE_1 @@ -10525,6 +10531,12 @@ (match_operand:<avx512fmaskmode> 2 "register_operand")))] "TARGET_AVX512F") +(define_expand "truncv32hiv32qi2" + [(set (match_operand:V32QI 0 "nonimmediate_operand") + (truncate:V32QI + (match_operand:V32HI 1 "register_operand")))] + "TARGET_AVX512BW") + (define_insn "avx512bw_<code>v32hiv32qi2" [(set (match_operand:V32QI 0 "nonimmediate_operand" "=v,m") (any_truncate:V32QI @@ -10564,6 +10576,12 @@ (define_mode_attr pmov_suff_2 [(V16QI "wb") (V8HI "dw") (V4SI "qd")]) +(define_expand "trunc<ssedoublemodelower><mode>2" + [(set (match_operand:PMOV_DST_MODE_2 0 "nonimmediate_operand") + (truncate:PMOV_DST_MODE_2 + (match_operand:<ssedoublemode> 1 "register_operand")))] + "TARGET_AVX512VL") + (define_insn "*avx512vl_<code><ssedoublemodelower><mode>2" [(set (match_operand:PMOV_DST_MODE_2 0 "nonimmediate_operand" "=v,m") (any_truncate:PMOV_DST_MODE_2 diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512bw-trunc.c b/gcc/testsuite/gcc.target/i386/pr92658-avx512bw-trunc.c new file mode 100644 index 00000000000..b37535aeb38 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512bw-trunc.c @@ -0,0 +1,73 @@ +/* PR target/92658 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -mavx512bw -mavx512vl" } */ + +typedef unsigned char v16qi __attribute__((vector_size (16))); +typedef unsigned char v32qi __attribute__((vector_size (32))); +typedef unsigned short v16hi __attribute__((vector_size (32))); +typedef unsigned short v32hi __attribute__((vector_size (64))); + + +void +truncwb_512 (v32qi * dst, v32hi * __restrict src) +{ + unsigned char tem[8]; + tem[0] = (*src)[0]; + tem[1] = (*src)[1]; + tem[2] = (*src)[2]; + tem[3] = (*src)[3]; + tem[4] = (*src)[4]; + tem[5] = (*src)[5]; + tem[6] = (*src)[6]; + tem[7] = (*src)[7]; + tem[8] = (*src)[8]; + tem[9] = (*src)[9]; + tem[10] = (*src)[10]; + tem[11] = (*src)[11]; + tem[12] = (*src)[12]; + tem[13] = (*src)[13]; + tem[14] = (*src)[14]; + tem[15] = (*src)[15]; + tem[16] = (*src)[16]; + tem[17] = (*src)[17]; + tem[18] = (*src)[18]; + tem[19] = (*src)[19]; + tem[20] = (*src)[20]; + tem[21] = (*src)[21]; + tem[22] = (*src)[22]; + tem[23] = (*src)[23]; + tem[24] = (*src)[24]; + tem[25] = (*src)[25]; + tem[26] = (*src)[26]; + tem[27] = (*src)[27]; + tem[28] = (*src)[28]; + tem[29] = (*src)[29]; + tem[30] = (*src)[30]; + tem[31] = (*src)[31]; + dst[0] = *(v32qi *) tem; +} + +void +truncwb_256 (v16qi * dst, v16hi * __restrict src) +{ + unsigned char tem[8]; + tem[0] = (*src)[0]; + tem[1] = (*src)[1]; + tem[2] = (*src)[2]; + tem[3] = (*src)[3]; + tem[4] = (*src)[4]; + tem[5] = (*src)[5]; + tem[6] = (*src)[6]; + tem[7] = (*src)[7]; + tem[8] = (*src)[8]; + tem[9] = (*src)[9]; + tem[10] = (*src)[10]; + tem[11] = (*src)[11]; + tem[12] = (*src)[12]; + tem[13] = (*src)[13]; + tem[14] = (*src)[14]; + tem[15] = (*src)[15]; + dst[0] = *(v16qi *) tem; +} + +/* { dg-final { scan-assembler-times "vpmovwb" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c b/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c new file mode 100644 index 00000000000..c59eebfd550 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c @@ -0,0 +1,93 @@ +/* PR target/92658 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -mavx512f" } */ + +typedef unsigned char v8qi __attribute__((vector_size (8))); +typedef unsigned char v16qi __attribute__((vector_size (16))); +typedef unsigned short v8hi __attribute__((vector_size (16))); +typedef unsigned short v16hi __attribute__((vector_size (32))); +typedef unsigned int v8si __attribute__((vector_size (32))); +typedef unsigned int v16si __attribute__((vector_size (64))); +typedef unsigned long long v8di __attribute__((vector_size (64))); + +void +truncqd (v8si * dst, v8di * __restrict src) +{ + unsigned tem[8]; + tem[0] = (*src)[0]; + tem[1] = (*src)[1]; + tem[2] = (*src)[2]; + tem[3] = (*src)[3]; + tem[4] = (*src)[4]; + tem[5] = (*src)[5]; + tem[6] = (*src)[6]; + tem[7] = (*src)[7]; + dst[0] = *(v8si *) tem; +} + +void +truncqw (v8hi * dst, v8di * __restrict src) +{ + unsigned short tem[8]; + tem[0] = (*src)[0]; + tem[1] = (*src)[1]; + tem[2] = (*src)[2]; + tem[3] = (*src)[3]; + tem[4] = (*src)[4]; + tem[5] = (*src)[5]; + tem[6] = (*src)[6]; + tem[7] = (*src)[7]; + dst[0] = *(v8hi *) tem; +} + +void +truncdw (v16hi * dst, v16si * __restrict src) +{ + unsigned short tem[8]; + tem[0] = (*src)[0]; + tem[1] = (*src)[1]; + tem[2] = (*src)[2]; + tem[3] = (*src)[3]; + tem[4] = (*src)[4]; + tem[5] = (*src)[5]; + tem[6] = (*src)[6]; + tem[7] = (*src)[7]; + tem[8] = (*src)[8]; + tem[9] = (*src)[9]; + tem[10] = (*src)[10]; + tem[11] = (*src)[11]; + tem[12] = (*src)[12]; + tem[13] = (*src)[13]; + tem[14] = (*src)[14]; + tem[15] = (*src)[15]; + dst[0] = *(v16hi *) tem; +} + + +void +truncdb (v16qi * dst, v16si * __restrict src) +{ + unsigned char tem[8]; + tem[0] = (*src)[0]; + tem[1] = (*src)[1]; + tem[2] = (*src)[2]; + tem[3] = (*src)[3]; + tem[4] = (*src)[4]; + tem[5] = (*src)[5]; + tem[6] = (*src)[6]; + tem[7] = (*src)[7]; + tem[8] = (*src)[8]; + tem[9] = (*src)[9]; + tem[10] = (*src)[10]; + tem[11] = (*src)[11]; + tem[12] = (*src)[12]; + tem[13] = (*src)[13]; + tem[14] = (*src)[14]; + tem[15] = (*src)[15]; + dst[0] = *(v16qi *) tem; +} + +/* { dg-final { scan-assembler-times "vpmovqd" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovqw" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovdw" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovdb" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c new file mode 100644 index 00000000000..c6f748ca458 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c @@ -0,0 +1,37 @@ +/* PR target/92658 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -mavx512f -mavx512vl" } */ + +typedef unsigned short v8hi __attribute__((vector_size (16))); +typedef unsigned int v4si __attribute__((vector_size (16))); +typedef unsigned int v8si __attribute__((vector_size (32))); +typedef unsigned long long v4di __attribute__((vector_size (32))); + +void +truncqd_256 (v4si * dst, v4di * __restrict src) +{ + unsigned tem[4]; + tem[0] = (*src)[0]; + tem[1] = (*src)[1]; + tem[2] = (*src)[2]; + tem[3] = (*src)[3]; + dst[0] = *(v4si *) tem; +} + +void +truncdw_256 (v8hi * dst, v8si * __restrict src) +{ + unsigned short tem[8]; + tem[0] = (*src)[0]; + tem[1] = (*src)[1]; + tem[2] = (*src)[2]; + tem[3] = (*src)[3]; + tem[4] = (*src)[4]; + tem[5] = (*src)[5]; + tem[6] = (*src)[6]; + tem[7] = (*src)[7]; + dst[0] = *(v8hi *) tem; +} + +/* { dg-final { scan-assembler-times "vpmovqd" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovdw" 1 } } */ -- 2.18.1