Message ID | 5D149658020000780023B7A9@prv1-mh.provo.novell.com |
---|---|
State | New |
Headers | show |
Series | x86: mark "k" and "Yk" constraints as non-internal | expand |
>>> On 27.06.19 at 12:11, wrote: > Without these constraints asm() can't make use of mask registers. Similarly it is entirely unclear to me how to use e.g. v4fmaddps or vp2intersectd in asm(): For the former the respective "Yh" constraint was dropped (oddly enough leaving its comment line in place), and there's nothing I can spot for a mask register pair. Using (in an attempt to find alternative options) typedef float __attribute__((vector_size(256))) v64sf_t; results in "impossible constraint" when using an operand of this type in asm(), while typedef unsigned __attribute__((mode(P2HI))) kpair_t; gives "no data type for mode" even without any actual use. Jan
On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich <JBeulich@suse.com> wrote: > > Without these constraints asm() can't make use of mask registers. asm should be deprecated. We have intrinsics for this purpose. Uros. > gcc/ > 2019-06-27 Jan Beulich <jbeulich@suse.com> > > * config/i386/constraints.md: Remove @internal from "k" and > "Yk". > > --- a/gcc/config/i386/constraints.md > +++ b/gcc/config/i386/constraints.md > @@ -79,10 +79,10 @@ > "Second from top of 80387 floating-point stack (@code{%st(1)}).") > > (define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS" > -"@internal Any mask register that can be used as predicate, i.e. k1-k7.") > +"Any mask register that can be used as predicate, i.e. k1-k7.") > > (define_register_constraint "k" "TARGET_AVX512F ? ALL_MASK_REGS : NO_REGS" > -"@internal Any mask register.") > +"Any mask register.") > > ;; Vector registers (also used for plain floating point nowadays). > (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS" > >
>>> On 27.06.19 at 13:09, <ubizjak@gmail.com> wrote: > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich <JBeulich@suse.com> wrote: >> >> Without these constraints asm() can't make use of mask registers. > > asm should be deprecated. We have intrinsics for this purpose. While maybe not explicitly applicable here, the intrinsics aren't (afaict) providing full flexibility. In particular (just as example) I haven't found a way to use embedded broadcast with the intrinsics, but I can easily do so with asm(). Furthermore there are other reasons to use asm() - things like the Linux kernel are full of it for a reason. And once one has to use asm(), the resulting code typically is easier to follow if one doesn't further intermix it with uses of builtins. And finally, if asm() was indeed meant to be deprecated, how come it pretty recently got extended to allow for "inline"? Jan
On Thu, Jun 27, 2019 at 1:46 PM Jan Beulich <JBeulich@suse.com> wrote: > > >>> On 27.06.19 at 13:09, <ubizjak@gmail.com> wrote: > > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich <JBeulich@suse.com> wrote: > >> > >> Without these constraints asm() can't make use of mask registers. > > > > asm should be deprecated. We have intrinsics for this purpose. > > While maybe not explicitly applicable here, the intrinsics aren't > (afaict) providing full flexibility. In particular (just as example) > I haven't found a way to use embedded broadcast with the > intrinsics, but I can easily do so with asm(). > > Furthermore there are other reasons to use asm() - things like > the Linux kernel are full of it for a reason. And once one has > to use asm(), the resulting code typically is easier to follow if > one doesn't further intermix it with uses of builtins. > > And finally, if asm() was indeed meant to be deprecated, how > come it pretty recently got extended to allow for "inline"? I didn't mean that asm() in general should be deprecated, but for SSE and other vector extensions, where intrinsics are available, intrinsics should be used instead. There was exactly zero requests to use new asm constraints, it looks that people are satisfied with intrinsics approach (which is also future-proof, etc). Uros.
>>> On 27.06.19 at 14:00, <ubizjak@gmail.com> wrote: > On Thu, Jun 27, 2019 at 1:46 PM Jan Beulich <JBeulich@suse.com> wrote: >> >> >>> On 27.06.19 at 13:09, <ubizjak@gmail.com> wrote: >> > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich <JBeulich@suse.com> wrote: >> >> >> >> Without these constraints asm() can't make use of mask registers. >> > >> > asm should be deprecated. We have intrinsics for this purpose. >> >> While maybe not explicitly applicable here, the intrinsics aren't >> (afaict) providing full flexibility. In particular (just as example) >> I haven't found a way to use embedded broadcast with the >> intrinsics, but I can easily do so with asm(). >> >> Furthermore there are other reasons to use asm() - things like >> the Linux kernel are full of it for a reason. And once one has >> to use asm(), the resulting code typically is easier to follow if >> one doesn't further intermix it with uses of builtins. >> >> And finally, if asm() was indeed meant to be deprecated, how >> come it pretty recently got extended to allow for "inline"? > > I didn't mean that asm() in general should be deprecated, but for SSE > and other vector extensions, where intrinsics are available, > intrinsics should be used instead. There was exactly zero requests to > use new asm constraints, it looks that people are satisfied with > intrinsics approach (which is also future-proof, etc). So what about my embedded broadcast example then? "Zero requests" is clearly not exactly right. It simply didn't occur to me (until I noticed the @internal here) that I should raise such a request, rather than just using asm(). Subsequently I did then notice "Yh" going away, complicating things further ... I'd also like to note that the choice of types on some of the builtins makes it rather cumbersome to use them. Especially for scalar operations I've found myself better resorting to asm(). See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/x86_emulator/simd.c (most of the changes submitted (not so) recently have been coming from the work of putting together this and its sibling tests for the Xen Project instruction emulator). Jan
On Thu, Jun 27, 2019 at 2:23 PM Jan Beulich <JBeulich@suse.com> wrote: > > >>> On 27.06.19 at 14:00, <ubizjak@gmail.com> wrote: > > On Thu, Jun 27, 2019 at 1:46 PM Jan Beulich <JBeulich@suse.com> wrote: > >> > >> >>> On 27.06.19 at 13:09, <ubizjak@gmail.com> wrote: > >> > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich <JBeulich@suse.com> wrote: > >> >> > >> >> Without these constraints asm() can't make use of mask registers. > >> > > >> > asm should be deprecated. We have intrinsics for this purpose. > >> > >> While maybe not explicitly applicable here, the intrinsics aren't > >> (afaict) providing full flexibility. In particular (just as example) > >> I haven't found a way to use embedded broadcast with the > >> intrinsics, but I can easily do so with asm(). > >> > >> Furthermore there are other reasons to use asm() - things like > >> the Linux kernel are full of it for a reason. And once one has > >> to use asm(), the resulting code typically is easier to follow if > >> one doesn't further intermix it with uses of builtins. > >> > >> And finally, if asm() was indeed meant to be deprecated, how > >> come it pretty recently got extended to allow for "inline"? > > > > I didn't mean that asm() in general should be deprecated, but for SSE > > and other vector extensions, where intrinsics are available, > > intrinsics should be used instead. There was exactly zero requests to > > use new asm constraints, it looks that people are satisfied with > > intrinsics approach (which is also future-proof, etc). > > So what about my embedded broadcast example then? "Zero > requests" is clearly not exactly right. It simply didn't occur to me > (until I noticed the @internal here) that I should raise such a > request, rather than just using asm(). Subsequently I did then > notice "Yh" going away, complicating things further ... There was some work by HJ a while ago that implemented automatic generation of embedded broadcasts. Perhaps he can answer the question. Uros. > I'd also like to note that the choice of types on some of the builtins > makes it rather cumbersome to use them. Especially for scalar > operations I've found myself better resorting to asm(). See > https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/x86_emulator/simd.c > (most of the changes submitted (not so) recently have been > coming from the work of putting together this and its sibling > tests for the Xen Project instruction emulator). > > Jan > >
On Thu, Jun 27, 2019 at 05:46:00AM -0600, Jan Beulich wrote: > While maybe not explicitly applicable here, the intrinsics aren't > (afaict) providing full flexibility. In particular (just as example) > I haven't found a way to use embedded broadcast with the > intrinsics, but I can easily do so with asm(). > > Furthermore there are other reasons to use asm() - things like > the Linux kernel are full of it for a reason. Some of it for a (good) reason. Yes. > And once one has > to use asm(), the resulting code typically is easier to follow if > one doesn't further intermix it with uses of builtins. If you have to write more than a few lines of assembler, you are usually *much* better off just writing it in an assembler source file (a .s). > And finally, if asm() was indeed meant to be deprecated, how > come it pretty recently got extended to allow for "inline"? That was just a simple extension for a very specific purpose. This does not mean you should use inline assembler if there are better alternatives. I'm not sure Linux using such extremely huge inline assembler statements is a good idea at all, or if there are better ways to do what they want (not that I see any fwiw); but they already did, and this takes the pain away a bit for them, so why not. You should not use inline assembler when there are better way of expressing what you want. Like with builtins, for example. Inline assembler is hard to write correctly, and hard to *keep* correct. It is mixing abstractions (that is its *purpose*), which of course makes it a royal pain to deal with, esp. if overused. Segher
On Thu, Jun 27, 2019 at 5:26 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Thu, Jun 27, 2019 at 2:23 PM Jan Beulich <JBeulich@suse.com> wrote: > > > > >>> On 27.06.19 at 14:00, <ubizjak@gmail.com> wrote: > > > On Thu, Jun 27, 2019 at 1:46 PM Jan Beulich <JBeulich@suse.com> wrote: > > >> > > >> >>> On 27.06.19 at 13:09, <ubizjak@gmail.com> wrote: > > >> > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich <JBeulich@suse.com> wrote: > > >> >> > > >> >> Without these constraints asm() can't make use of mask registers. > > >> > > > >> > asm should be deprecated. We have intrinsics for this purpose. > > >> > > >> While maybe not explicitly applicable here, the intrinsics aren't > > >> (afaict) providing full flexibility. In particular (just as example) > > >> I haven't found a way to use embedded broadcast with the > > >> intrinsics, but I can easily do so with asm(). > > >> > > >> Furthermore there are other reasons to use asm() - things like > > >> the Linux kernel are full of it for a reason. And once one has > > >> to use asm(), the resulting code typically is easier to follow if > > >> one doesn't further intermix it with uses of builtins. > > >> > > >> And finally, if asm() was indeed meant to be deprecated, how > > >> come it pretty recently got extended to allow for "inline"? > > > > > > I didn't mean that asm() in general should be deprecated, but for SSE > > > and other vector extensions, where intrinsics are available, > > > intrinsics should be used instead. There was exactly zero requests to > > > use new asm constraints, it looks that people are satisfied with > > > intrinsics approach (which is also future-proof, etc). > > > > So what about my embedded broadcast example then? "Zero > > requests" is clearly not exactly right. It simply didn't occur to me > > (until I noticed the @internal here) that I should raise such a > > request, rather than just using asm(). Subsequently I did then > > notice "Yh" going away, complicating things further ... > > There was some work by HJ a while ago that implemented automatic > generation of embedded broadcasts. Perhaps he can answer the question. I implemented broadcast for some commonly used instructions. Adding broadcast to all will take time, especially when there is no user demand. > Uros. > > > I'd also like to note that the choice of types on some of the builtins > > makes it rather cumbersome to use them. Especially for scalar > > operations I've found myself better resorting to asm(). See > > https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/x86_emulator/simd.c > > (most of the changes submitted (not so) recently have been > > coming from the work of putting together this and its sibling > > tests for the Xen Project instruction emulator). > > > > Jan > > > >
--- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -79,10 +79,10 @@ "Second from top of 80387 floating-point stack (@code{%st(1)}).") (define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS" -"@internal Any mask register that can be used as predicate, i.e. k1-k7.") +"Any mask register that can be used as predicate, i.e. k1-k7.") (define_register_constraint "k" "TARGET_AVX512F ? ALL_MASK_REGS : NO_REGS" -"@internal Any mask register.") +"Any mask register.") ;; Vector registers (also used for plain floating point nowadays). (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"