Message ID | 20200628141906.418009-1-hjl.tools@gmail.com |
---|---|
State | New |
Series | x86: Enable FMA in rsqrt<mode>2 expander |

Hello HJ,

On 28 Jun 07:19, H.J. Lu via Gcc-patches wrote:
> Enable FMA in rsqrt<mode>2 expander and fold rsqrtv16sf2 expander into
> rsqrt<mode>2 expander which expands to UNSPEC_RSQRT28 for TARGET_AVX512ER.
> Although it doesn't show performance change in our workloads, FMA can
> improve other workloads.
>
> gcc/
>
>       PR target/88713
>       * config/i386/i386-expand.c (ix86_emit_swsqrtsf): Enable FMA.
>       * config/i386/sse.md (VF_AVX512VL_VF1_128_256): New.
>       (rsqrt<mode>2): Replace VF1_128_256 with VF_AVX512VL_VF1_128_256.
>       (rsqrtv16sf2): Removed.
>
> gcc/testsuite/
>
>       PR target/88713
>       * gcc.target/i386/pr88713-1.c: New test.
>       * gcc.target/i386/pr88713-2.c: Likewise.

So, you've introduced new rsqrt expanders for DF vectors and relaxed the
condition for V16SF. What I didn't get is why you changed the unspec
type from RSQRT to RSQRT28 for the V16SF expander?

--
K

On Tue, Jul 7, 2020 at 8:56 AM Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>
> So, you've introduced new rsqrt expanders for DF vectors and relaxed the
> condition for V16SF. What I didn't get is why you changed the unspec
> type from RSQRT to RSQRT28 for the V16SF expander?
>

The UNSPEC in a define_expand is meaningless when the pattern is fully
expanded by ix86_emit_swsqrtsf. I believe that the UNSPEC in the
rsqrt<mode>2 expander can be removed.

On 07 Jul 09:06, H.J. Lu wrote:
> The UNSPEC in a define_expand is meaningless when the pattern is fully
> expanded by ix86_emit_swsqrtsf. I believe that the UNSPEC in the
> rsqrt<mode>2 expander can be removed.

Agree.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88713-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Ofast -mno-avx512f -mfma" } */

I guess -O2 is useless here (and in the -2.c test).

Otherwise LGTM.

--
K

On Thu, Jul 9, 2020 at 5:04 AM Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>
> Agree.

I will leave UNSPEC alone here.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr88713-1.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Ofast -mno-avx512f -mfma" } */
>
> I guess -O2 is useless here (and in the -2.c test).

Fixed.

> Otherwise LGTM.
>

This is the patch I am checking in. Thanks.

On Thu, Jul 9, 2020 at 6:35 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> This is the patch I am checking in.
>

This patch is needed for

FAIL: gcc.target/i386/avx512er-vrsqrt28ps-3.c (internal compiler error)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-3.c (test for excess errors)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-4.c (internal compiler error)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-4.c (test for excess errors)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-5.c (internal compiler error)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-5.c (test for excess errors)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-6.c (internal compiler error)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-6.c (test for excess errors)

On Thu, Jul 9, 2020 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> This patch is needed for
>
> FAIL: gcc.target/i386/avx512er-vrsqrt28ps-3.c (internal compiler error)
> FAIL: gcc.target/i386/avx512er-vrsqrt28ps-3.c (test for excess errors)
> FAIL: gcc.target/i386/avx512er-vrsqrt28ps-4.c (internal compiler error)
> FAIL: gcc.target/i386/avx512er-vrsqrt28ps-4.c (test for excess errors)
> FAIL: gcc.target/i386/avx512er-vrsqrt28ps-5.c (internal compiler error)
> FAIL: gcc.target/i386/avx512er-vrsqrt28ps-5.c (test for excess errors)
> FAIL: gcc.target/i386/avx512er-vrsqrt28ps-6.c (internal compiler error)
> FAIL: gcc.target/i386/avx512er-vrsqrt28ps-6.c (test for excess errors)
>

This fixed:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96144

On Thu, Jul 09, 2020 at 03:02:35PM -0700, H.J. Lu via Gcc-patches wrote:
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -15540,7 +15540,11 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a)));

-  if (TARGET_FMA || TARGET_AVX512F)
+  unsigned vector_size = GET_MODE_SIZE (mode);
+  if (TARGET_FMA
+      || (TARGET_AVX512F && vector_size == 64)
+      || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
+
     emit_insn (gen_rtx_SET (e2,

Why the empty line in there?
Ok for trunk with that fixed.

                             gen_rtx_FMA (mode, e0, x0, mthree)));
   else

On Fri, Jul 10, 2020 at 4:19 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jul 09, 2020 at 03:02:35PM -0700, H.J. Lu via Gcc-patches wrote:
> +      || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
> +
>      emit_insn (gen_rtx_SET (e2,
>
> Why the empty line in there?
> Ok for trunk with that fixed.

This is the patch I am checking in. Thanks.

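As a reading aid, the revised guard in the hunk Jakub reviewed can be restated as a standalone predicate: FMA is used when the FMA extension is enabled, or when the AVX-512 variant that covers the given vector width is enabled. The sketch below is not GCC code. `struct isa_flags` and `can_emit_fma` are hypothetical names that stand in for the `TARGET_FMA`, `TARGET_AVX512F` and `TARGET_AVX512VL` macros and for `GET_MODE_SIZE (mode)`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's TARGET_FMA, TARGET_AVX512F and
   TARGET_AVX512VL; these are assumptions made for illustration.  */
struct isa_flags
{
  bool fma;
  bool avx512f;
  bool avx512vl;
};

/* Restates the reviewed condition: vector_size is the vector width in
   bytes (16, 32 or 64), as GET_MODE_SIZE (mode) would report it.  */
bool
can_emit_fma (struct isa_flags isa, unsigned vector_size)
{
  return isa.fma
         || (isa.avx512f && vector_size == 64)
         || (isa.avx512vl && (vector_size == 32 || vector_size == 16));
}
```

Under this reading, AVX512F alone covers only 64-byte vectors, while 16- and 32-byte vectors need either FMA or AVX512VL, which appears to be the case the original `TARGET_FMA || TARGET_AVX512F` test got wrong.
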
On Thu, Jul 9, 2020 at 6:35 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> This is the patch I am checking in.
>

Since ix86_emit_swsqrtsf shouldn't be called with DF vector modes, rename
VF_AVX512VL_VF1_128_256 to VF1_AVX512ER_128_256 and drop DF vector modes.

On Mon, Jul 13, 2020 at 9:12 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Since ix86_emit_swsqrtsf shouldn't be called with DF vector modes, rename
> VF_AVX512VL_VF1_128_256 to VF1_AVX512ER_128_256 and drop DF vector modes.
>

I will check in this patch to fix the regression next Monday if there
are no objections.

Thanks.

On Thu, Jul 16, 2020 at 7:50 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> I will check in this patch to fix the regression next Monday if there
> are no objections.
>

One more person reported the same issue. I am checking it in now.

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index d81dd73f034..49718b7a41c 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -15535,14 +15535,22 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
         }
     }

+  mthree = force_reg (mode, mthree);
+
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a)));
-  /* e1 = e0 * x0 */
-  emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));

-  /* e2 = e1 - 3. */
-  mthree = force_reg (mode, mthree);
-  emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (mode, e1, mthree)));
+  if (TARGET_FMA || TARGET_AVX512F)
+    emit_insn (gen_rtx_SET (e2,
+                            gen_rtx_FMA (mode, e0, x0, mthree)));
+  else
+    {
+      /* e1 = e0 * x0 */
+      emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));
+
+      /* e2 = e1 - 3. */
+      emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (mode, e1, mthree)));
+    }

   mhalf = force_reg (mode, mhalf);
   if (recip)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 431571a4bc1..d3ad5833e1f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -326,6 +326,12 @@ (define_mode_iterator VF_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])

+;; AVX512VL SF/DF plus 128- and 256-bit SF vector modes
+(define_mode_iterator VF_AVX512VL_VF1_128_256
+  [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX512VL")
+   (V2DF "TARGET_AVX512VL")])
+
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])

@@ -2070,26 +2076,16 @@ (define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
    (set_attr "mode" "<ssescalarmode>")])

 (define_expand "rsqrt<mode>2"
-  [(set (match_operand:VF1_128_256 0 "register_operand")
-        (unspec:VF1_128_256
-          [(match_operand:VF1_128_256 1 "vector_operand")] UNSPEC_RSQRT))]
+  [(set (match_operand:VF_AVX512VL_VF1_128_256 0 "register_operand")
+        (unspec:VF_AVX512VL_VF1_128_256
+          [(match_operand:VF_AVX512VL_VF1_128_256 1 "vector_operand")]
+          UNSPEC_RSQRT))]
   "TARGET_SSE && TARGET_SSE_MATH"
 {
   ix86_emit_swsqrtsf (operands[0], operands[1], <MODE>mode, true);
   DONE;
 })

-(define_expand "rsqrtv16sf2"
-  [(set (match_operand:V16SF 0 "register_operand")
-        (unspec:V16SF
-          [(match_operand:V16SF 1 "vector_operand")]
-          UNSPEC_RSQRT28))]
-  "TARGET_AVX512ER && TARGET_SSE_MATH"
-{
-  ix86_emit_swsqrtsf (operands[0], operands[1], V16SFmode, true);
-  DONE;
-})
-
 (define_insn "<sse>_rsqrt<mode>2"
   [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
         (unspec:VF1_128_256
diff --git a/gcc/testsuite/gcc.target/i386/pr88713-1.c b/gcc/testsuite/gcc.target/i386/pr88713-1.c
new file mode 100644
index 00000000000..2f583b6d1a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88713-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Ofast -mno-avx512f -mfma" } */
+
+extern float sqrtf (float);
+
+void
+rsqrt (float* restrict r, float* restrict a)
+{
+  for (int i = 0; i < 64; i++)
+    r[i] = sqrtf(a[i]);
+}
+
+/* { dg-final { scan-assembler "\tvfmadd\[123\]+ps" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr88713-2.c b/gcc/testsuite/gcc.target/i386/pr88713-2.c
new file mode 100644
index 00000000000..559026df485
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88713-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Ofast -march=skylake-avx512 -mno-fma" } */
+
+#include "pr88713-1.c"
+
+/* { dg-final { scan-assembler "\tvfmadd\[123\]+ps" } } */