diff mbox series

doc: showcase a "union of vectors" pattern (PR 88698)

Message ID alpine.LNX.2.20.13.1901201255380.13981@monopod.intra.ispras.ru
State New
Headers show
Series doc: showcase a "union of vectors" pattern (PR 88698) | expand

Commit Message

Alexander Monakov Jan. 20, 2019, 10:39 a.m. UTC
Hi,

PR 88698 ("Relax generic vector conversions") is asking to be more permissive
about type compatibility for generic vector types.  While I don't think that's
a good idea per se, from a "compiler user" point of view I also see how writing
code for SSE/AVX using both generic vectors and x86 intrinsics is inconvenient
because all intrinsics use the same type such as __m128i, regardless of whether
they operate on 8/16/32/64-bit signed/unsigned lanes.

My solution to that is to introduce a union type that holds both the type
accepted by intrinsics and generic vector types used by my code. It also has
an interesting effect that each statement spells out the lane type.

Is it appropriate to showcase this approach in our documentation, as done in
this patch?  What to do with the PR, should I leave it open?

	PR c/88698
	* doc/extend.texi (Vector Extensions): Add an example of using vector
	types together with x86 intrinsics.

Comments

Richard Biener Jan. 21, 2019, 8:25 a.m. UTC | #1
On Sun, Jan 20, 2019 at 11:40 AM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> Hi,
>
> PR 88698 ("Relax generic vector conversions") is asking to be more permissive
> about type compatibility for generic vector types.  While I don't think that's
> a good idea per se, from a "compiler user" point of view I also see how writing
> code for SSE/AVX using both generic vectors and x86 intrinsics is inconvenient
> because all intrinsics use the same type such as __m128i, regardless of whether
> they operate on 8/16/32/64-bit signed/unsigned lanes.
>
> My solution to that is to introduce a union type that holds both the type
> accepted by intrinsics and generic vector types used by my code. It also has
> an interesting effect that each statement spells out the lane type.
>
> Is it appropriate to showcase this approach in our documentation, as done in
> this patch?  What to do with the PR, should I leave it open?

I think it's appropriate to have this kind of hints in the documentation.

I'm not sure whether it's permitted to use this in C++ or not (and whether
the C usage is now officially sanctioned other than via a non-normative
footnote) - but at least GCC will guarantee that it works.

Might be interesting to show how to do argument marshalling with the
same union.

Richard.

>         PR c/88698
>         * doc/extend.texi (Vector Extensions): Add an example of using vector
>         types together with x86 intrinsics.
>
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -10632,6 +10632,40 @@ v4si g = __builtin_convertvector (f, v4si); /* g is @{1,-2,3,7@} */
>  v4si h = __builtin_convertvector (c, v4si); /* h is @{1,5,0,10@} */
>  @end smallexample
>
> +@cindex vector types, using with x86 intrinsics
> +Sometimes it is desirable to write code using a mix of generic vector
> +operations (for clarity) and machine-specific vector intrinsics (to
> +access vector instructions that are not exposed via generic built-ins).
> +On x86, intrinsic functions for integer vectors typically use the same
> +vector type @code{__m128i} irrespective of how they interpret the vector,
> +making it necessary to cast their arguments and return values from/to
> +other vector types.  In C, you can make use of a @code{union} type:
> +@c In C++ such type punning via a union is not allowed by the language
> +@smallexample
> +#include <immintrin.h>
> +
> +typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
> +typedef unsigned int  u32x4 __attribute__ ((vector_size (16)));
> +
> +typedef union @{
> +        u8x16   u8;
> +        u32x4   u32;
> +        __m128i mm;
> +@} v128;
> +@end smallexample
> +
> +@noindent
> +for variables that can be used with both built-in operations and x86
> +intrinsics:
> +
> +@smallexample
> +v128 x, y = @{ 0 @};
> +memcpy (&x, ptr, sizeof x);
> +y.u8  += 0x80;
> +x.mm  = _mm_adds_epu8 (x.mm, y.mm);
> +x.u32 &= 0xffffff;
> +@end smallexample
> +
>  @node Offsetof
>  @section Support for @code{offsetof}
>  @findex __builtin_offsetof
Alexander Monakov Jan. 21, 2019, 9:13 a.m. UTC | #2
On Mon, 21 Jan 2019, Richard Biener wrote:

> I'm not sure whether it's permitted to use this in C++ or not (and whether
> the C usage is now officially sanctioned other than via a non-normative
> footnote) - but at least GCC will guarantee that it works.

My impression is that, via the "active member" wording, C++ standard says
clearly this is UB.  I assume on the GCC side the intention is to treat it
like in C (for POD types only?..), but I don't want to go there in the
"Vector Extensions" section.  Hence my patch limits the suggestion to C.

I think in C++ it's a bit less of an issue anyway, because people can use
operator overloading to achieve a similar end result, with some extra legwork.

Do you want me to change the text somehow?

> Might be interesting to show how to do argument marshalling with the
> same union.

Sorry, I don't see what you mean here; can you give an example or elaborate?

Thanks!
Alexander
Richard Biener Jan. 21, 2019, 11:13 a.m. UTC | #3
On Mon, Jan 21, 2019 at 10:14 AM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> On Mon, 21 Jan 2019, Richard Biener wrote:
>
> > I'm not sure whether it's permitted to use this in C++ or not (and whether
> > the C usage is now officially sanctioned other than via a non-normative
> > footnote) - but at least GCC will guarantee that it works.
>
> My impression is that, via the "active member" wording, C++ standard says
> clearly this is UB.  I assume on the GCC side the intention is to treat it
> like in C (for POD types only?..), but I don't want to go there in the
> "Vector Extensions" section.  Hence my patch limits the suggestion to C.
>
> I think in C++ it's a bit less of an issue anyway, because people can use
> operator overloading to achieve a similar end result, with some extra legwork.
>
> Do you want me to change the text somehow?

No, it was just a note from my side.

> > Might be interesting to show how to do argument marshalling with the
> > same union.
>
> Sorry, I don't see what you mean here; can you give an example or elaborate?

when you call a function with v4si signature but want to pass it a result
from a __mm intrinsic you'd need a temporary union to do the marshalling.
I think you can use a union argument in the function itself and you can
elide the temporary then if you use transparent_union.  But I didn't try.

Richard.

> Thanks!
> Alexander
Alexander Monakov Jan. 21, 2019, 5:54 p.m. UTC | #4
On Mon, 21 Jan 2019, Richard Biener wrote:

> > Sorry, I don't see what you mean here; can you give an example or elaborate?
> 
> when you call a function with v4si signature but want to pass it a result
> from a __mm intrinsic you'd need a temporary union to do the marshalling.
> I think you can use a union argument in the function itself and you can
> elide the temporary then if you use transparent_union.  But I didn't try.

Ah, I see now.  I agree transparent_union ought to work, but today both GCC
and Clang will reject such attempt; I've filed PR 88955 for the GCC issue.

So unfortunately such code would still need a cast or an unnamed temporary,
which may be relatively obvious to the reader.

Alexander
Alexander Monakov Jan. 30, 2019, 2:41 p.m. UTC | #5
On Mon, 21 Jan 2019, Alexander Monakov wrote:

> Ah, I see now.  I agree transparent_union ought to work, but today both GCC
> and Clang will reject such attempt; I've filed PR 88955 for the GCC issue.
> 
> So unfortunately such code would still need a cast or an unnamed temporary,
> which may be relatively obvious to the reader.

In the end it may be preferable to have it anyway, so here's a revised version
with an extra line in the example for passing arguments via compound literals.

Checked with 'make html', OK to apply?

	PR c/88698
	* doc/extend.texi (Vector Extensions): Add an example of using vector
	types together with x86 intrinsics.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 95d22ac1e3c..34a76927b12 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10632,6 +10632,47 @@ v4si g = __builtin_convertvector (f, v4si); /* g is @{1,-2,3,7@} */
 v4si h = __builtin_convertvector (c, v4si); /* h is @{1,5,0,10@} */
 @end smallexample
 
+@cindex vector types, using with x86 intrinsics
+Sometimes it is desirable to write code using a mix of generic vector
+operations (for clarity) and machine-specific vector intrinsics (to
+access vector instructions that are not exposed via generic built-ins).
+On x86, intrinsic functions for integer vectors typically use the same
+vector type @code{__m128i} irrespective of how they interpret the vector,
+making it necessary to cast their arguments and return values from/to
+other vector types.  In C, you can make use of a @code{union} type:
+@c In C++ such type punning via a union is not allowed by the language
+@smallexample
+#include <immintrin.h>
+
+typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
+typedef unsigned int  u32x4 __attribute__ ((vector_size (16)));
+
+typedef union @{
+        __m128i mm;
+        u8x16   u8;
+        u32x4   u32;
+@} v128;
+@end smallexample
+
+@noindent
+for variables that can be used with both built-in operators and x86
+intrinsics:
+
+@smallexample
+v128 x, y = @{ 0 @};
+memcpy (&x, ptr, sizeof x);
+y.u8  += 0x80;
+x.mm  = _mm_adds_epu8 (x.mm, y.mm);
+x.u32 &= 0xffffff;
+
+/* Instead of a variable, a compound literal may be used to pass the
+   return value of an intrinsic call to a function expecting the union: */
+v128 foo (v128);
+x = foo ((v128) @{_mm_adds_epu8 (x.mm, y.mm)@});
+@c This could be done implicitly with __attribute__((transparent_union)),
+@c but GCC does not accept it for unions of vector types (PR 88955).
+@end smallexample
+
 @node Offsetof
 @section Support for @code{offsetof}
 @findex __builtin_offsetof
Richard Biener Jan. 31, 2019, 9:28 a.m. UTC | #6
On Wed, Jan 30, 2019 at 3:42 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> On Mon, 21 Jan 2019, Alexander Monakov wrote:
>
> > Ah, I see now.  I agree transparent_union ought to work, but today both GCC
> > and Clang will reject such attempt; I've filed PR 88955 for the GCC issue.
> >
> > So unfortunately such code would still need a cast or an unnamed temporary,
> > which may be relatively obvious to the reader.
>
> In the end it may be preferable to have it anyway, so here's a revised version
> with an extra line in the example for passing arguments via compound literals.
>
> Checked with 'make html', OK to apply?

OK unless somebody objects within a reasonable time frame.

Richard.

>         PR c/88698
>         * doc/extend.texi (Vector Extensions): Add an example of using vector
>         types together with x86 intrinsics.
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 95d22ac1e3c..34a76927b12 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -10632,6 +10632,47 @@ v4si g = __builtin_convertvector (f, v4si); /* g is @{1,-2,3,7@} */
>  v4si h = __builtin_convertvector (c, v4si); /* h is @{1,5,0,10@} */
>  @end smallexample
>
> +@cindex vector types, using with x86 intrinsics
> +Sometimes it is desirable to write code using a mix of generic vector
> +operations (for clarity) and machine-specific vector intrinsics (to
> +access vector instructions that are not exposed via generic built-ins).
> +On x86, intrinsic functions for integer vectors typically use the same
> +vector type @code{__m128i} irrespective of how they interpret the vector,
> +making it necessary to cast their arguments and return values from/to
> +other vector types.  In C, you can make use of a @code{union} type:
> +@c In C++ such type punning via a union is not allowed by the language
> +@smallexample
> +#include <immintrin.h>
> +
> +typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
> +typedef unsigned int  u32x4 __attribute__ ((vector_size (16)));
> +
> +typedef union @{
> +        __m128i mm;
> +        u8x16   u8;
> +        u32x4   u32;
> +@} v128;
> +@end smallexample
> +
> +@noindent
> +for variables that can be used with both built-in operators and x86
> +intrinsics:
> +
> +@smallexample
> +v128 x, y = @{ 0 @};
> +memcpy (&x, ptr, sizeof x);
> +y.u8  += 0x80;
> +x.mm  = _mm_adds_epu8 (x.mm, y.mm);
> +x.u32 &= 0xffffff;
> +
> +/* Instead of a variable, a compound literal may be used to pass the
> +   return value of an intrinsic call to a function expecting the union: */
> +v128 foo (v128);
> +x = foo ((v128) @{_mm_adds_epu8 (x.mm, y.mm)@});
> +@c This could be done implicitly with __attribute__((transparent_union)),
> +@c but GCC does not accept it for unions of vector types (PR 88955).
> +@end smallexample
> +
>  @node Offsetof
>  @section Support for @code{offsetof}
>  @findex __builtin_offsetof
diff mbox series

Patch

--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10632,6 +10632,40 @@  v4si g = __builtin_convertvector (f, v4si); /* g is @{1,-2,3,7@} */
 v4si h = __builtin_convertvector (c, v4si); /* h is @{1,5,0,10@} */
 @end smallexample
 
+@cindex vector types, using with x86 intrinsics
+Sometimes it is desirable to write code using a mix of generic vector
+operations (for clarity) and machine-specific vector intrinsics (to
+access vector instructions that are not exposed via generic built-ins).
+On x86, intrinsic functions for integer vectors typically use the same
+vector type @code{__m128i} irrespective of how they interpret the vector,
+making it necessary to cast their arguments and return values from/to
+other vector types.  In C, you can make use of a @code{union} type:
+@c In C++ such type punning via a union is not allowed by the language
+@smallexample
+#include <immintrin.h>
+
+typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
+typedef unsigned int  u32x4 __attribute__ ((vector_size (16)));
+
+typedef union @{
+        u8x16   u8;
+        u32x4   u32;
+        __m128i mm;
+@} v128;
+@end smallexample
+
+@noindent
+for variables that can be used with both built-in operations and x86
+intrinsics:
+
+@smallexample
+v128 x, y = @{ 0 @};
+memcpy (&x, ptr, sizeof x);
+y.u8  += 0x80;
+x.mm  = _mm_adds_epu8 (x.mm, y.mm);
+x.u32 &= 0xffffff;
+@end smallexample
+
 @node Offsetof
 @section Support for @code{offsetof}
 @findex __builtin_offsetof