
x86: _mm512_set1_p[sd]

Message ID 8738idzj52.fsf@x240.local.i-did-not-set--mail-host-address--so-tickle-me
State New

Commit Message

Ulrich Drepper March 20, 2014, 2:41 a.m. UTC
Another set of functions missing are those to set all elements of a
512-bit vector to the same float or double value.  I think the patch
below uses the optimal code sequence for that.  The patch requires the
previous patch introducing _mm*_undefined_*.


2014-03-19  Ulrich Drepper  <drepper@gmail.com>

	* config/i386/avx512fintrin.h: Define _mm512_set1_ps and
	_mm512_set1_pd.
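
For readers unfamiliar with the intrinsic, here is a rough sketch of how _mm512_set1_pd might be used once it is available.  The helper name fill8_pd is made up for this illustration, and the scalar fallback exists only so the snippet also builds on targets without AVX-512F; it is not part of the patch.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* fill8_pd is a hypothetical helper, not part of the patch:
   it writes the same double to eight consecutive elements.  */
static void
fill8_pd (double *dst, double value)
{
#if defined(__AVX512F__)
  /* Broadcast VALUE to all eight lanes of a 512-bit vector.  */
  __m512d v = _mm512_set1_pd (value);
  /* Unaligned store, so DST needs no particular alignment.  */
  _mm512_storeu_pd (dst, v);
#else
  /* Scalar fallback with the same observable result.  */
  for (size_t i = 0; i < 8; ++i)
    dst[i] = value;
#endif
}
```

Calling fill8_pd (buf, 1.5) on an array of eight doubles should leave every element equal to 1.5 on either path.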

Comments

Kirill Yukhin March 24, 2014, 5:50 a.m. UTC | #1
Hello Ulrich,
On 19 Mar 22:41, Ulrich Drepper wrote:
> Another set of functions missing are those to set all elements of a
> 512-bit vector to the same float or double value.  I think the patch
> below uses the optimal code sequence for that.  The patch requires the
> previous patch introducing _mm*_undefined_*.
> 
> 
> 2014-03-19  Ulrich Drepper  <drepper@gmail.com>
> 
> 	* config/i386/avx512fintrin.h: Define _mm512_set1_ps and
> 	_mm512_set1_pd.
Your patch is correct IMHO, but maybe it is worth adding all the missing
`_mm512_set1*' stuff?

According to trunk and [1], we are still missing (besides the ones you
mentioned) the _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts.

[1] - http://software.intel.com/sites/landingpage/IntrinsicsGuide/

--
Thanks, K

Ulrich Drepper March 24, 2014, 11:13 a.m. UTC | #2
On Mon, Mar 24, 2014 at 1:50 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Your patch is correct IMHO, but maybe it is worth adding all the missing
> `_mm512_set1*' stuff?
>
> According to trunk and [1], we are still missing (besides the ones you
> mentioned) the _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts.

Yes, more are missing, but I think those will need new builtins.  The
_ps and _pd don't require additional instructions.

_mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
might have to map to vpbroadcastb.  I haven't seen a way to generate
those instructions if needed and so this work was out of scope for now
due to time constraints.  I agree, they should be added as quickly as
possible to avoid releasing headers with incomplete APIs.
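
As a point of reference for the missing integer variants, the following is a scalar model of the result a 16-bit or 8-bit broadcast into a 512-bit register would be expected to produce.  All type and function names here (model_v32hi, model_v64qi, model_set1_epi16, model_set1_epi8) are hypothetical; this only illustrates the lane counts and says nothing about how GCC would implement the intrinsics or which builtins they would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for a 512-bit register; not GCC types.  */
typedef struct { int16_t lane[32]; } model_v32hi;  /* 32 x 16 bits = 512 */
typedef struct { int8_t lane[64]; } model_v64qi;   /* 64 x 8 bits = 512 */

/* Expected result of a vpbroadcastw-style broadcast: 32 equal words.  */
static model_v32hi
model_set1_epi16 (short a)
{
  model_v32hi v;
  for (size_t i = 0; i < 32; ++i)
    v.lane[i] = (int16_t) a;
  return v;
}

/* Expected result of a vpbroadcastb-style broadcast: 64 equal bytes.  */
static model_v64qi
model_set1_epi8 (char a)
{
  model_v64qi v;
  for (size_t i = 0; i < 64; ++i)
    v.lane[i] = (int8_t) a;
  return v;
}
```

A real implementation could be checked against such a model by storing the vector to memory and comparing it lane by lane.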

What is the verdict on checking these changes in?  Too late for the
next release?

Richard Biener March 24, 2014, 11:27 a.m. UTC | #3
On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper <drepper@gmail.com> wrote:
> On Mon, Mar 24, 2014 at 1:50 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> Your patch is correct IMHO, but maybe it is worth adding all the missing
>> `_mm512_set1*' stuff?
>>
>> According to trunk and [1], we are still missing (besides the ones you
>> mentioned) the _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts.
>
> Yes, more are missing, but I think those will need new builtins.  The
> _ps and _pd don't require additional instructions.
>
> _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
> might have to map to vpbroadcastb.  I haven't seen a way to generate
> those instructions if needed and so this work was out of scope for now
> due to time constraints.  I agree, they should be added as quickly as
> possible to avoid releasing headers with incomplete APIs.
>
> What is the verdict on checking these changes in?  Too late for the
> next release?

This kind of change can also be made for 4.9.1, for example.

Richard.

Patch

diff -u b/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
--- b/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -130,6 +130,28 @@ 
   return __Y;
 }
 
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_pd (double __A)
+{
+  return (__m512d) __builtin_ia32_broadcastsd512 (__extension__
+						  (__v2df) { __A, },
+						  (__v8df)
+						  _mm512_undefined_pd (),
+						  (__mmask8) -1);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ps (float __A)
+{
+  return (__m512) __builtin_ia32_broadcastss512 (__extension__
+						 (__v4sf) { __A, },
+						 (__v16sf)
+						 _mm512_undefined_ps (),
+						 (__mmask16) -1);
+}
+
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_ps (void)