diff mbox series

x86: improve fast bfloat->float conversion

Message ID d75a4d5a-8624-aa77-9f29-140767357b58@suse.com
State New
Headers show
Series x86: improve fast bfloat->float conversion | expand

Commit Message

Jan Beulich July 11, 2023, 6:08 a.m. UTC
There's nothing AVX512BW-ish in here, so no reason to use Yw as the
constraints for the AVX alternative. Furthermore by using the 512-bit
form of VPSSLD (in a new alternative) all 32 registers can be used
directly by the insn without AVX512VL needing to be enabled.

Also adjust the originally last alternative's "prefix" attribute to
maybe_evex.

gcc/

	* config/i386/i386.md (extendbfsf2_1): Add new AVX512F
	alternative. Adjust original last alternative's "prefix"
	attribute to maybe_evex.
---
The corresponding expander, "extendbfsf2", looks to have been dead since
its introduction in a1ecc5600464 ("Fix incorrect _mm_cvtsbh_ss"): The
builtin references the insn (extendbfsf2_1), not the expander. Can't the
expander be deleted and the name of the insn then pruned of the _1
suffix? If so, that further raises the question of the significance of
the "!HONOR_NANS (BFmode)" that the expander has, but the insn doesn't
have. Which may instead suggest the builtin was meant to reference the
expander. Yet then I can't see what would the builtin would expand to
when HONOR_NANS (BFmode) it true.

I further wonder whether the nearby "extendhfdf2" expander is really
needed. It doesn't look to specify anything that the corresponding insn
doesn't also specify.

Comments

Li, Pan2 via Gcc-patches July 11, 2023, 6:45 a.m. UTC | #1
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, July 11, 2023 2:08 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao <hongtao.liu@intel.com>; Kirill Yukhin
> <kirill.yukhin@gmail.com>
> Subject: [PATCH] x86: improve fast bfloat->float conversion
> 
> There's nothing AVX512BW-ish in here, so no reason to use Yw as the
> constraints for the AVX alternative. Furthermore by using the 512-bit form of
> VPSSLD (in a new alternative) all 32 registers can be used directly by the insn
> without AVX512VL needing to be enabled.
Yes, the instruction vpslld doesn't need AVX512BW, the patch LGTM.
> 
> Also adjust the originally last alternative's "prefix" attribute to maybe_evex.
> 
> gcc/
> 
> 	* config/i386/i386.md (extendbfsf2_1): Add new AVX512F
> 	alternative. Adjust original last alternative's "prefix"
> 	attribute to maybe_evex.
> ---
> The corresponding expander, "extendbfsf2", looks to have been dead since
> its introduction in a1ecc5600464 ("Fix incorrect _mm_cvtsbh_ss"): The builtin
> references the insn (extendbfsf2_1), not the expander. Can't the expander
> be deleted and the name of the insn then pruned of the _1 suffix? If so, that
> further raises the question of the significance of the "!HONOR_NANS
> (BFmode)" that the expander has, but the insn doesn't have. Which may
> instead suggest the builtin was meant to reference the expander. Yet then I
> can't see what would the builtin would expand to when HONOR_NANS
> (BFmode) it true.

Quote from what Jakub said in [1].
-------
This is not correct.
While using such code for _mm_cvtsbh_ss is fine if it is documented not to
raise exceptions and turn a sNaN into a qNaN, it is not fine for HONOR_NANS
(i.e. when -ffast-math is not on), because a __bf16 -> float conversion
on sNaN should raise invalid exception and turn it into a qNaN.
We could have extendbfsf2 expander that would FAIL; if HONOR_NANS and
emit extendbfsf2_1 otherwise. 
-------
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html
> 
> I further wonder whether the nearby "extendhfdf2" expander is really
> needed. It doesn't look to specify anything that the corresponding insn
> doesn't also specify.
> 
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -5181,21 +5181,27 @@
>  ;; Don't use float_extend since psrlld doesn't raise  ;; exceptions and turn a
> sNaN into a qNaN.
>  (define_insn "extendbfsf2_1"
> -  [(set (match_operand:SF 0 "register_operand"   "=x,Yw")
> +  [(set (match_operand:SF 0 "register_operand"   "=x,Yv,v")
>  	(unspec:SF
> -	  [(match_operand:BF 1 "register_operand" " 0,Yw")]
> +	  [(match_operand:BF 1 "register_operand" " 0,Yv,v")]
>  	  UNSPEC_CVTBFSF))]
>   "TARGET_SSE2"
>   "@
>    pslld\t{$16, %0|%0, 16}
> -  vpslld\t{$16, %1, %0|%0, %1, 16}"
> -  [(set_attr "isa" "noavx,avx")
> +  vpslld\t{$16, %1, %0|%0, %1, 16}
> +  vpslld\t{$16, %g1, %g0|%g0, %g1, 16}"
> +  [(set_attr "isa" "noavx,avx,*")
>     (set_attr "type" "sseishft1")
>     (set_attr "length_immediate" "1")
> -   (set_attr "prefix_data16" "1,*")
> -   (set_attr "prefix" "orig,vex")
> -   (set_attr "mode" "TI")
> -   (set_attr "memory" "none")])
> +   (set_attr "prefix_data16" "1,*,*")
> +   (set_attr "prefix" "orig,maybe_evex,evex")
> +   (set_attr "mode" "TI,TI,XI")
> +   (set_attr "memory" "none")
> +   (set (attr "enabled")
> +     (if_then_else (eq_attr "alternative" "2")
> +       (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
> +		    && !TARGET_PREFER_AVX256")
> +       (const_string "*")))])
> 
>  (define_expand "extend<mode>xf2"
>    [(set (match_operand:XF 0 "nonimmediate_operand")
Jan Beulich July 11, 2023, 7:50 a.m. UTC | #2
On 11.07.2023 08:45, Liu, Hongtao wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, July 11, 2023 2:08 PM
>>
>> There's nothing AVX512BW-ish in here, so no reason to use Yw as the
>> constraints for the AVX alternative. Furthermore by using the 512-bit form of
>> VPSSLD (in a new alternative) all 32 registers can be used directly by the insn
>> without AVX512VL needing to be enabled.
> Yes, the instruction vpslld doesn't need AVX512BW, the patch LGTM.

Thanks.

>> ---
>> The corresponding expander, "extendbfsf2", looks to have been dead since
>> its introduction in a1ecc5600464 ("Fix incorrect _mm_cvtsbh_ss"): The builtin
>> references the insn (extendbfsf2_1), not the expander. Can't the expander
>> be deleted and the name of the insn then pruned of the _1 suffix? If so, that
>> further raises the question of the significance of the "!HONOR_NANS
>> (BFmode)" that the expander has, but the insn doesn't have. Which may
>> instead suggest the builtin was meant to reference the expander. Yet then I
>> can't see what would the builtin would expand to when HONOR_NANS
>> (BFmode) it true.
> 
> Quote from what Jakub said in [1].
> -------
> This is not correct.
> While using such code for _mm_cvtsbh_ss is fine if it is documented not to
> raise exceptions and turn a sNaN into a qNaN, it is not fine for HONOR_NANS
> (i.e. when -ffast-math is not on), because a __bf16 -> float conversion
> on sNaN should raise invalid exception and turn it into a qNaN.
> We could have extendbfsf2 expander that would FAIL; if HONOR_NANS and
> emit extendbfsf2_1 otherwise. 
> -------
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html

I'm not sure I understand: It sounds like what Jakub said matches my
observation, yet then it seems unlikely that the issue wasn't fixed in
over half a year.

Also having the expander FAIL when HONOR_NANS (matching what I was
thinking) still doesn't clarify to me what then would happen to uses of
the builtin. Is there any (common code) fallback for such a case? I
didn't think there would be, in which case wouldn't this result in an
internal compiler error?

Jan
Jakub Jelinek July 11, 2023, 8:36 a.m. UTC | #3
On Tue, Jul 11, 2023 at 09:50:23AM +0200, Jan Beulich via Gcc-patches wrote:
> > Quote from what Jakub said in [1].
> > -------
> > This is not correct.
> > While using such code for _mm_cvtsbh_ss is fine if it is documented not to
> > raise exceptions and turn a sNaN into a qNaN, it is not fine for HONOR_NANS
> > (i.e. when -ffast-math is not on), because a __bf16 -> float conversion
> > on sNaN should raise invalid exception and turn it into a qNaN.
> > We could have extendbfsf2 expander that would FAIL; if HONOR_NANS and
> > emit extendbfsf2_1 otherwise. 
> > -------
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html
> 
> I'm not sure I understand: It sounds like what Jakub said matches my
> observation, yet then it seems unlikely that the issue wasn't fixed in
> over half a year.
> 
> Also having the expander FAIL when HONOR_NANS (matching what I was
> thinking) still doesn't clarify to me what then would happen to uses of
> the builtin. Is there any (common code) fallback for such a case? I
> didn't think there would be, in which case wouldn't this result in an
> internal compiler error?

There is some bfloat specific generic code I've added last year in expr.cc
and optabs.cc:
grep arm_bfloat_ *.cc
expr.cc:		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
expr.cc:		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
expr.cc:      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
expr.cc:	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
optabs.cc:		    == &arm_bfloat_half_format
optabs.cc:		    == &arm_bfloat_half_format
optabs.cc:  if (REAL_MODE_FORMAT (GET_MODE (from)) == &arm_bfloat_half_format
plus if that doesn't trigger there are libcalls for it in libgcc, e.g.
libgcc/soft-fp/extendbfsf2.c
The generic code ensure one doesn't need full set of library functions
like also extendbf{d,x,t,h}f2.c etc. when target has next to bfloat also
IEEE single support.
The generic code also uses shifts when not honoring NaNs (etc.) where
possible, but of course target code can override it if it can do the stuff
better than the generic code, I just wanted some fallback because
on targets which do support bfloat and IEEE single the shifts will be a
common way to extend.

	Jakub
Li, Pan2 via Gcc-patches July 11, 2023, 8:39 a.m. UTC | #4
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, July 11, 2023 3:50 PM
> To: Liu, Hongtao <hongtao.liu@intel.com>
> Cc: Kirill Yukhin <kirill.yukhin@gmail.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] x86: improve fast bfloat->float conversion
> 
> On 11.07.2023 08:45, Liu, Hongtao wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Tuesday, July 11, 2023 2:08 PM
> >>
> >> There's nothing AVX512BW-ish in here, so no reason to use Yw as the
> >> constraints for the AVX alternative. Furthermore by using the 512-bit
> >> form of VPSSLD (in a new alternative) all 32 registers can be used
> >> directly by the insn without AVX512VL needing to be enabled.
> > Yes, the instruction vpslld doesn't need AVX512BW, the patch LGTM.
> 
> Thanks.
> 
> >> ---
> >> The corresponding expander, "extendbfsf2", looks to have been dead
> >> since its introduction in a1ecc5600464 ("Fix incorrect
> >> _mm_cvtsbh_ss"): The builtin references the insn (extendbfsf2_1), not
> >> the expander. Can't the expander be deleted and the name of the insn
> >> then pruned of the _1 suffix? If so, that further raises the question
> >> of the significance of the "!HONOR_NANS (BFmode)" that the expander
> >> has, but the insn doesn't have. Which may instead suggest the builtin
> >> was meant to reference the expander. Yet then I can't see what would
> >> the builtin would expand to when HONOR_NANS
> >> (BFmode) it true.
> >
> > Quote from what Jakub said in [1].
> > -------
> > This is not correct.
> > While using such code for _mm_cvtsbh_ss is fine if it is documented
> > not to raise exceptions and turn a sNaN into a qNaN, it is not fine
> > for HONOR_NANS (i.e. when -ffast-math is not on), because a __bf16 ->
> > float conversion on sNaN should raise invalid exception and turn it into a
> qNaN.
> > We could have extendbfsf2 expander that would FAIL; if HONOR_NANS
> and
> > emit extendbfsf2_1 otherwise.
> > -------
> > [1]
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html
> 
> I'm not sure I understand: It sounds like what Jakub said matches my
> observation, yet then it seems unlikely that the issue wasn't fixed in over half
> a year.
> 
> Also having the expander FAIL when HONOR_NANS (matching what I was
> thinking) still doesn't clarify to me what then would happen to uses of the
> builtin. Is there any (common code) fallback for such a case? I didn't think
> there would be, in which case wouldn't this result in an internal compiler
> error?
For __bf16 -> float or target specific builtins, it should be ok since __bf16 is just an extension type.
 but extendbfsf2 is a standard pattern name which is also used to expand c++23 std::bfloat16_t -> float conversion which is assumed to raise exceptions for sNAN.
Since vpslld won't raise any exception, we need to add HONOR_NANS in the extendbfsf2 pattern.
It's my understanding, for std:bfloat16_t support, it's mentioned in [2].

https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601865.html
> 
> Jan
diff mbox series

Patch

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5181,21 +5181,27 @@ 
 ;; Don't use float_extend since psrlld doesn't raise
 ;; exceptions and turn a sNaN into a qNaN.
 (define_insn "extendbfsf2_1"
-  [(set (match_operand:SF 0 "register_operand"   "=x,Yw")
+  [(set (match_operand:SF 0 "register_operand"   "=x,Yv,v")
 	(unspec:SF
-	  [(match_operand:BF 1 "register_operand" " 0,Yw")]
+	  [(match_operand:BF 1 "register_operand" " 0,Yv,v")]
 	  UNSPEC_CVTBFSF))]
  "TARGET_SSE2"
  "@
   pslld\t{$16, %0|%0, 16}
-  vpslld\t{$16, %1, %0|%0, %1, 16}"
-  [(set_attr "isa" "noavx,avx")
+  vpslld\t{$16, %1, %0|%0, %1, 16}
+  vpslld\t{$16, %g1, %g0|%g0, %g1, 16}"
+  [(set_attr "isa" "noavx,avx,*")
    (set_attr "type" "sseishft1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "mode" "TI")
-   (set_attr "memory" "none")])
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "mode" "TI,TI,XI")
+   (set_attr "memory" "none")
+   (set (attr "enabled")
+     (if_then_else (eq_attr "alternative" "2")
+       (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
+		    && !TARGET_PREFER_AVX256")
+       (const_string "*")))])
 
 (define_expand "extend<mode>xf2"
   [(set (match_operand:XF 0 "nonimmediate_operand")