Message-ID: 20110930.035938.868908024383419283.davem@davemloft.net
On 09/30/2011 12:59 AM, David Miller wrote:
>
> I tried to add the 'siam' instruction too but that one is really
> difficult because it influences the behavior of every float operation
> and I couldn't find an easy way to express those dependencies.  I
> tried a few easy approaches but I couldn't reliably keep the compiler
> from moving 'siam' across float operations.
>
> The 'siam' (Set Interval Arithmetic Mode) instruction is a mechanism
> to override the float rounding mode on a cycle-to-cycle basis, i.e.
> without the cost of doing a write to the %fsr.

I don't think I'd ever expose this via a builtin.

This seems like a feature we've talked about for a long time, but have
never done anything about.  Specifically, in-compiler support for
#pragma STDC FENV_ACCESS and the various <fenv.h> routines.

We ought to be able to track the rounding mode (and other relevant
parameters) on a per-expression basis, tagging each floating-point
operation with the parameters in effect.  At some point, at or after
rtl generation time, we transform these saved parameters into
manipulations of the fpu state.  We have several options:

 (1) Alpha-like, where e.g. the rounding mode is directly encoded in
     the instruction.  No further optimization is necessary, unless we
     are manipulating non-rounding parameters.

 (2) IA64-like, where we have multiple fpu environments, and can
     encode which to use inside the instruction.  However, in this
     case we also need to set up these alternate environments and
     merge back the exception state when the user reads it.

 (3) Use optimize-mode-switching to minimize the number of changes
     to the global state.  This includes the use of SIAM vs %fsr,
     especially when a subroutine call could have changed the
     global rounding mode.

All of which is a lot of work.
> +(define_insn "bmask<P:mode>_vis"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> +        (plus:P (match_operand:P 1 "register_operand" "rJ")
> +                (match_operand:P 2 "register_operand" "rJ")))
> +   (clobber (reg:SI GSR_REG))]
> +  "TARGET_VIS2"
> +  "bmask\t%r1, %r2, %0"
> +  [(set_attr "type" "array")])

I think this is wrong.  I think you want to model this as

  [(set (match_operand:DI 0 "register_operand" "=r")
        (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
                 (match_operand:DI 2 "register_or_zero_operand" "rJ")))
   (set (zero_extract:DI (reg:DI GSR_REG) (const_int 32) (const_int 32))
        (plus:DI (match_dup 1) (match_dup 2)))]

 (1) %gsr is really set to something, not just modified in
     uninteresting ways; we're going to use this value later.

 (2) Only the top 32 bits of %gsr are changed; the low 32 bits are
     still valid.  You don't want insns that set the low 32 bits to
     be deleted as dead code, which is what would happen.

 (3) I realize this version makes things difficult for 32-bit mode.
     There, I think you may have to settle for an unspec.  And
     perhaps the possible benefit of properly representing the GSR
     change isn't that helpful.  In which case:

       (set (reg:DI GSR_REG)
            (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
                       UNSPEC_BMASK))

> +(define_insn "bshuffle<V64I:mode>_vis"
> +  [(set (match_operand:V64I 0 "register_operand" "=e")
> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
> +                      (match_operand:V64I 2 "register_operand" "e")]
> +                     UNSPEC_BSHUFFLE))
> +   (use (reg:SI GSR_REG))]

Better to push the use of the GSR_REG into the unspec, and not leave
it separate in the parallel.

r~
On Fri, 30 Sep 2011, Richard Henderson wrote:

> Specifically, in-compiler support for #pragma STDC FENV_ACCESS and the
> various <fenv.h> routines.  We ought to be able to track the rounding
> mode (and other relevant parameters) on a per-expression basis, tagging
> each floating-point operation with the parameters in effect.

For C99 and C1X it's just dynamic rounding direction (changed by
fesetround, possibly changed by calls to any non-pure function unless
you can prove that function doesn't call fesetround, but the default
mode can be presumed unless -frounding-math or the FENV_ACCESS pragma
is in effect).  (asms accessing the relevant registers also need to be
considered.)  N1582 (status report on the C bindings for IEEE
754-2008) mentions static rounding direction support but doesn't go
into details.  (Practically, static rounding directions are more
useful for various floating-point algorithms.)

Floating-point operations implicitly read the rounding mode.  They
implicitly write the exception flags (as, again, do most function
calls) - except that generally they only set rather than clearing
flags (but function calls may also call functions that clear them).

The present defaults are -fno-rounding-math -ftrapping-math.  I'm not
sure that with a proper implementation this would really allow much
more optimization than -frounding-math -ftrapping-math.  Simply
enabling exceptions should disable most constant folding where the
result isn't exactly representable, because the "inexact" exception is
required, for example; just knowing the rounding mode and so the value
of the result isn't enough to fold.  And if there aren't any function
calls intervening, all combinations of these options will allow common
subexpression elimination (since that doesn't change the set of
exceptions raised, and no support is required for counting the number
of times a particular exception was raised).  So the right defaults
once -ftrapping-math really does what it says aren't clear.
I've thought a bit about implementation approaches, but mainly at the level of how to decouple the front-end and back-end parts from the full complexity of tracking pragma state for each expression (for example, by setting variables on a whole-function basis and restricting inlining). I've also thought about how to implement testcases providing reasonably thorough coverage of the exceptions and rounding modes issues. But I haven't had time to work on implementation of any of these pieces.
From: Richard Henderson <rth@redhat.com>
Date: Fri, 30 Sep 2011 14:03:52 -0700

> (3) Use optimize-mode-switching to minimize the number of changes
>     to the global state.  This includes the use of SIAM vs %fsr,
>     especially when a subroutine call could have changed the
>     global rounding mode.

Indeed, and I incidentally took a look at the mode switching
optimization framework, and it appears that I could use it for
providing insn patterns for 'rint' and friends like i386 does.

> All of which is a lot of work.
>
>> +(define_insn "bmask<P:mode>_vis"
>> +  [(set (match_operand:P 0 "register_operand" "=r")
>> +        (plus:P (match_operand:P 1 "register_operand" "rJ")
>> +                (match_operand:P 2 "register_operand" "rJ")))
>> +   (clobber (reg:SI GSR_REG))]
>> +  "TARGET_VIS2"
>> +  "bmask\t%r1, %r2, %0"
>> +  [(set_attr "type" "array")])
>
> I think this is wrong.  I think you want to model this as
...
>> +(define_insn "bshuffle<V64I:mode>_vis"
>> +  [(set (match_operand:V64I 0 "register_operand" "=e")
>> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
>> +                      (match_operand:V64I 2 "register_operand" "e")]
>> +                     UNSPEC_BSHUFFLE))
>> +   (use (reg:SI GSR_REG))]
>
> Better to push the use of the GSR_REG into the unspec, and not leave
> it separate in the parallel.

Thanks Richard, I'll fix these up.  In general, the GSR tracking needs
a bit more work.
From: Richard Henderson <rth@redhat.com>
Date: Fri, 30 Sep 2011 14:03:52 -0700

> On 09/30/2011 12:59 AM, David Miller wrote:
>>
>> [ VIS 2.0 bmask patterns ]
>
> I think this is wrong.  I think you want to model this as
>
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
>                  (match_operand:DI 2 "register_or_zero_operand" "rJ")))
>    (set (zero_extract:DI (reg:DI GSR_REG)
>                          (const_int 32)
>                          (const_int 32))
>         (plus:DI (match_dup 1) (match_dup 2)))]

Yep, perfect for 64-bit.

> (3) I realize this version makes things difficult for 32-bit mode.
>     There, I think you may have to settle for an unspec.  And
>     perhaps the possible benefit of properly representing the GSR
>     change isn't that helpful.  In which case:
>
>       (set (reg:DI GSR_REG)
>            (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
>                       UNSPEC_BMASK))

Actually, can't we just use a (zero_extend:DI (plus:SI ...)) for the
32-bit case?  It seems to work fine.

>> +(define_insn "bshuffle<V64I:mode>_vis"
>> +  [(set (match_operand:V64I 0 "register_operand" "=e")
>> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
>> +                      (match_operand:V64I 2 "register_operand" "e")]
>> +                     UNSPEC_BSHUFFLE))
>> +   (use (reg:SI GSR_REG))]
>
> Better to push the use of the GSR_REG into the unspec, and not leave
> it separate in the parallel.

This is actually just a non-constant vec_merge, and even though the
internals documentation says that the 'items' operand has to be a
const_int, the compiler actually doesn't care.

The only two places vec_merge is even inspected semantically by the
compiler are in the RTX simplifier, where it already checks explicitly
for const_int, and in dwarf2out.c, where it just ignores the vec_merge
construct entirely since it can't be represented.
So if we just code this as:

  (set (match_operand:V64I 0 "register_operand" "=e")
       (vec_merge:V64I (match_operand:V64I 2 "register_operand" "e")
                       (match_operand:V64I 1 "register_operand" "e")
                       (zero_extract:DI (reg:DI GSR_REG)
                                        (const_int 32)
                                        (const_int 32))))

it would mostly work.

The only problem with this is that we provide the bshuffle builtin for
DI mode just as we do for the faligndata instruction.  simplify-rtx.c
isn't happy seeing a non-vector mode, and VECTOR_MODES () in a
foo-modes.def file won't generate modes like V1DI and V1SI.  I guess I
could explicitly generate those single entry vector modes like i386
does.  But, as-is, the above pattern does work for all the "actual"
vector modes.

More generally, rtl.def is non-specific about what operand 2 of a
vec_merge has to be; it just says a bitmask, it doesn't say that it
has to be a const_int.  I think we should explicitly allow
non-const_int objects here, and state so in the internals
documentation.
On 10/02/2011 10:28 PM, David Miller wrote:
>> (set (reg:DI GSR_REG)
>>      (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
>>                 UNSPEC_BMASK))
>
> Actually, can't we just use a (zero_extend:DI (plus:SI ...)) for the
> 32-bit case?  It seems to work fine.

Sure.

>>> +(define_insn "bshuffle<V64I:mode>_vis"
>>> +  [(set (match_operand:V64I 0 "register_operand" "=e")
>>> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
>>> +                      (match_operand:V64I 2 "register_operand" "e")]
>>> +                     UNSPEC_BSHUFFLE))
>>> +   (use (reg:SI GSR_REG))]
>>
>> Better to push the use of the GSR_REG into the unspec, and not leave
>> it separate in the parallel.
>
> This is actually just a non-constant vec_merge, and even though the
> internals documentation says that the 'items' operand has to be a
> const_int, the compiler actually doesn't care.

Um, no it isn't.  The VEC_MERGE pattern uses N bits to select N
elements from op0 and op1:

  op0    = A B C D
  op1    = W X Y Z
  bmask  = 0 1 0 1 = 3
  result = A X C D

Your insn doesn't use single bits for the select.  It uses nibbles to
select from the 16 input bytes.  It's akin to the VEC_SELECT pattern,
except that VEC_SELECT requires a constant input parallel.

---

You might have a look at the "Vector Shuffle" thread, where we've been
trying to provide builtin-level access to this feature.  We've not
added an rtx-level code for this because so far there isn't *that*
much in common between the various cpus.  They all seem to differ in
niggling details...

You'll have a somewhat harder time than i386 for this feature, given
that you've got to pack bytes into nibbles.  But it can certainly be
done.

r~
From: Richard Henderson <rth@redhat.com>
Date: Mon, 03 Oct 2011 09:49:37 -0700

> You might have a look at the "Vector Shuffle" thread, where we've been
> trying to provide builtin-level access to this feature.  We've not added
> an rtx-level code for this because so far there isn't *that* much in
> common between the various cpus.  They all seem to differ in niggling
> details...
>
> You'll have a somewhat harder time than i386 for this feature, given
> that you've got to pack bytes into nibbles.  But it can certainly be done.

Ok, I'll take a look.
On 10/03/2011 10:42 AM, David Miller wrote:
>> You might have a look at the "Vector Shuffle" thread, where we've been
>> trying to provide builtin-level access to this feature.  We've not added
>> an rtx-level code for this because so far there isn't *that* much in
>> common between the various cpus.  They all seem to differ in niggling
>> details...
>>
>> You'll have a somewhat harder time than i386 for this feature, given
>> that you've got to pack bytes into nibbles.  But it can certainly be done.
>
> Ok, I'll take a look.

Oh, you should know that, at present, our generic shuffle support
assumes that shuffles with a constant control (which are also
generated by the vectorizer) get expanded to builtins.  And as
builtins we wind up with lots of them -- one per type.

I'm going to start fixing that in the coming week.

The vectorizer will be changed to emit VEC_SHUFFLE_EXPR.  It will
still use the target hook to see if the constant shuffle is supported.

The lower-vector pass currently tests the target hook and swaps the
VEC_SHUFFLE_EXPRs that are validated into builtins.  That will be
changed to simply leave them unchanged if the other target hook
returns NULL.  As the targets are updated to use vshuffle, the
builtins get deleted and the hook returns NULL.  After all targets are
updated, we can remove this check and the target hook itself.  This
should preserve bisection on each of the affected targets.

The rtl expander won't have to change.

The target backends will need to accept an immediate for vshuffle op3,
if anything special ought to be done for constant shuffles.  In
addition, the builtins should be removed, as previously noted.

r~
On Mon, Oct 3, 2011 at 7:07 PM, Richard Henderson <rth@redhat.com> wrote:
> On 10/03/2011 10:42 AM, David Miller wrote:
>>> You might have a look at the "Vector Shuffle" thread, where we've been
>>> trying to provide builtin-level access to this feature.  We've not added
>>> an rtx-level code for this because so far there isn't *that* much in
>>> common between the various cpus.  They all seem to differ in niggling
>>> details...
>>>
>>> You'll have a somewhat harder time than i386 for this feature, given
>>> that you've got to pack bytes into nibbles.  But it can certainly be done.
>>
>> Ok, I'll take a look.
>
> Oh, you should know that, at present, our generic shuffle support assumes
> that shuffles with a constant control (which are also generated by the
> vectorizer) get expanded to builtins.  And as builtins we wind up with
> lots of them -- one per type.
>
> I'm going to start fixing that in the coming week.
>
> The vectorizer will be changed to emit VEC_SHUFFLE_EXPR.  It will still use
> the target hook to see if the constant shuffle is supported.
>
> The lower-vector pass currently tests the target hook and swaps the
> VEC_SHUFFLE_EXPRs that are validated into builtins.  That will be changed
> to simply leave them unchanged if the other target hook returns NULL.
> As the targets are updated to use vshuffle, the builtins get deleted
> and the hook returns NULL.  After all targets are updated, we can remove
> this check and the target hook itself.  This should preserve bisection
> on each of the affected targets.
>
> The rtl expander won't have to change.
>
> The target backends will need to accept an immediate for vshuffle op3,
> if anything special ought to be done for constant shuffles.  In addition,
> the builtins should be removed, as previously noted.
>
> r~

Several orthogonal vector-shuffling issues.

Currently if vec_perm_ok returns false, we do not try to use a new
vshuffle routine.  Would it make sense to implement that?  The only
potential problem I can see is a possible performance degradation.
This leads us to the second issue.

When we perform vshuffle, we need to know whether it makes sense to
use pshufb (in the case of x86) or to perform data movement via
standard non-simd registers.  Do we have this information in the
current cost model?

Also, in certain cases, when the mask is constant, I would assume the
memory movement is also faster.  For example, if the mask is
{4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.
Were there any attempts to perform such an analysis, and if not,
should we formalise the cases where such a substitution would make
sense?

Thanks,
Artem.
On 10/03/2011 11:40 AM, Artem Shinkarov wrote:
> Currently if vec_perm_ok returns false, we do not try to use a new
> vshuffle routine.  Would it make sense to implement that?  The only
> potential problem I can see is a possible performance degradation.
> This leads us to the second issue.

Implement that where?  In the vectorizer?  No, I don't think so.
The _ok routine, while also indicating what the backend expander
supports, could also be thought of as a cost cutoff predicate.
Unless the vectorization folk request some more exact cost metric
I don't see any reason to change this.

> When we perform vshuffle, we need to know whether it make sense to use
> pshufb (in case of x86) or to perform data movement via standard
> non-simd registers.  Do we have this information in the current
> cost-model?

Not really.  Again, if you're talking about the vectorizer, it
gets even more complicated than this because...

> Also, in certain cases, when the mask is constant, I would
> assume the memory movement is also faster.  For example if the mask is
> {4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.

... even before SSSE3 PSHUFB, we have all sorts of insns that can
perform a constant shuffle without having to resort to either
general-purpose registers or memory, e.g. PSHUFD.  For specific data
types, we can handle arbitrary constant shuffle with 1 or 2 insns,
even when arbitrary variable shuffles aren't.

It's certainly something that we could add to tree-vect-generic.c.
I have no plans to do anything of the sort, however.

r~
On Mon, Oct 3, 2011 at 8:02 PM, Richard Henderson <rth@redhat.com> wrote:
> On 10/03/2011 11:40 AM, Artem Shinkarov wrote:
>> Currently if vec_perm_ok returns false, we do not try to use a new
>> vshuffle routine.  Would it make sense to implement that?  The only
>> potential problem I can see is a possible performance degradation.
>> This leads us to the second issue.
>
> Implement that where?  In the vectorizer?  No, I don't think so.
> The _ok routine, while also indicating what the backend expander
> supports, could also be thought of as a cost cutoff predicate.
> Unless the vectorization folk request some more exact cost metric
> I don't see any reason to change this.

I was thinking more about the expander of the backend itself.  When we
throw sorry () in ix86_expand_vec_perm_builtin, we can fall back to
the vshuffle routine, unless it would lead to performance degradation.

>> When we perform vshuffle, we need to know whether it make sense to use
>> pshufb (in case of x86) or to perform data movement via standard
>> non-simd registers.  Do we have this information in the current
>> cost-model?
>
> Not really.  Again, if you're talking about the vectorizer, it
> gets even more complicated than this because...
>
>> Also, in certain cases, when the mask is constant, I would
>> assume the memory movement is also faster.  For example if the mask is
>> {4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.
>
> ... even before SSSE3 PSHUFB, we have all sorts of insns that can
> perform a constant shuffle without having to resort to either
> general-purpose registers or memory, e.g. PSHUFD.  For specific data
> types, we can handle arbitrary constant shuffle with 1 or 2 insns,
> even when arbitrary variable shuffles aren't.

But these cases are more or less covered.  I am thinking about the
cases when vec_perm_ok returns false, but the actual permutation could
be done faster with memory/register transfers, rather than with
PSHUFB & Co.
> It's certainly something that we could add to tree-vect-generic.c.
> I have no plans to do anything of the sort, however.

I didn't quite understand what you think can be added to
tree-vect-generic.c?  I thought that we were talking about more or
less backend issues.

In any case I am investigating these problems, and I will appreciate
any help or advice.

Thanks,
Artem.
I tried to add the 'siam' instruction too but that one is really
difficult because it influences the behavior of every float operation
and I couldn't find an easy way to express those dependencies.  I
tried a few easy approaches but I couldn't reliably keep the compiler
from moving 'siam' across float operations.

The 'siam' (Set Interval Arithmetic Mode) instruction is a mechanism
to override the float rounding mode on a cycle-to-cycle basis, i.e.
without the cost of doing a write to the %fsr.

But the rest of the VIS 2.0 stuff is here and was reasonably
straightforward to add.

Committed to trunk.

gcc/

	* config/sparc/sparc.opt (VIS2): New option.
	* doc/invoke.texi: Document it.
	* config/sparc/sparc.md (UNSPEC_EDGE8N, UNSPEC_EDGE8LN,
	UNSPEC_EDGE16N, UNSPEC_EDGE16LN, UNSPEC_EDGE32N,
	UNSPEC_EDGE32LN, UNSPEC_BSHUFFLE): New unspecs.
	(define_attr type): New insn type 'edgen'.
	(bmask<P:mode>_vis, bshuffle<V64I:mode>_vis, edge8n<P:mode>_vis,
	edge8ln<P:mode>_vis, edge16n<P:mode>_vis, edge16ln<P:mode>_vis,
	edge32n<P:mode>_vis, edge32ln<P:mode>_vis): New VIS 2.0 insn
	patterns.
	* niagara.md: Handle edgen.
	* niagara2.md: Likewise.
	* ultra1_2.md: Likewise.
	* ultra3.md: Likewise.
	* config/sparc/sparc-c.c (sparc_target_macros): Define __VIS__
	to 0x200 when TARGET_VIS2.
	* config/sparc/sparc.c (sparc_option_override): Set MASK_VIS2 by
	default when targeting capable cpus.  TARGET_VIS2 implies
	TARGET_VIS; clear both when TARGET_FPU is disabled.
	(sparc_vis_init_builtins): Emit new VIS 2.0 builtins.
	(sparc_expand_builtin): Fix predicate indexing when builtin
	returns void.
	(sparc_fold_builtin): Do not eliminate bmask when result is
	ignored.
	* config/sparc/visintrin.h (__vis_bmask, __vis_bshuffledi,
	__vis_bshufflev2si, __vis_bshufflev4hi, __vis_bshufflev8qi,
	__vis_edge8n, __vis_edge8ln, __vis_edge16n, __vis_edge16ln,
	__vis_edge32n, __vis_edge32ln): New VIS 2.0 interfaces.
	* doc/extend.texi: Document new VIS 2.0 builtins.

gcc/testsuite/

	* gcc.target/sparc/bmaskbshuf.c: New test.
	* gcc.target/sparc/edgen.c: New test.
---
 gcc/ChangeLog                               |   31 ++++++++++
 gcc/config/sparc/niagara.md                 |    2 +-
 gcc/config/sparc/niagara2.md                |    4 +-
 gcc/config/sparc/sparc-c.c                  |    7 ++-
 gcc/config/sparc/sparc.c                    |   77 +++++++++++++++++++++---
 gcc/config/sparc/sparc.md                   |   85 ++++++++++++++++++++++++++-
 gcc/config/sparc/sparc.opt                  |    6 ++-
 gcc/config/sparc/ultra1_2.md                |    2 +-
 gcc/config/sparc/ultra3.md                  |    2 +-
 gcc/config/sparc/visintrin.h                |   77 ++++++++++++++++++++++++
 gcc/doc/extend.texi                         |   18 ++++++
 gcc/doc/invoke.texi                         |   12 ++++-
 gcc/testsuite/ChangeLog                     |    5 ++
 gcc/testsuite/gcc.target/sparc/bmaskbshuf.c |   34 +++++++++++
 gcc/testsuite/gcc.target/sparc/edgen.c      |   39 ++++++++++++
 15 files changed, 382 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/sparc/bmaskbshuf.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/edgen.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ea5c6d0..96cd9d5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,34 @@
+2011-09-30  David S. Miller  <davem@davemloft.net>
+
+	* config/sparc/sparc.opt (VIS2): New option.
+	* doc/invoke.texi: Document it.
+	* config/sparc/sparc.md (UNSPEC_EDGE8N, UNSPEC_EDGE8LN,
+	UNSPEC_EDGE16N, UNSPEC_EDGE16LN, UNSPEC_EDGE32N,
+	UNSPEC_EDGE32LN, UNSPEC_BSHUFFLE): New unspecs.
+	(define_attr type): New insn type 'edgen'.
+	(bmask<P:mode>_vis, bshuffle<V64I:mode>_vis, edge8n<P:mode>_vis,
+	edge8ln<P:mode>_vis, edge16n<P:mode>_vis, edge16ln<P:mode>_vis,
+	edge32n<P:mode>_vis, edge32ln<P:mode>_vis): New VIS 2.0 insn
+	patterns.
+	* niagara.md: Handle edgen.
+	* niagara2.md: Likewise.
+	* ultra1_2.md: Likewise.
+	* ultra3.md: Likewise.
+	* config/sparc/sparc-c.c (sparc_target_macros): Define __VIS__
+	to 0x200 when TARGET_VIS2.
+	* config/sparc/sparc.c (sparc_option_override): Set MASK_VIS2 by
+	default when targeting capable cpus.  TARGET_VIS2 implies
+	TARGET_VIS; clear both when TARGET_FPU is disabled.
+	(sparc_vis_init_builtins): Emit new VIS 2.0 builtins.
+	(sparc_expand_builtin): Fix predicate indexing when builtin
+	returns void.
+	(sparc_fold_builtin): Do not eliminate bmask when result is
+	ignored.
+	* config/sparc/visintrin.h (__vis_bmask, __vis_bshuffledi,
+	__vis_bshufflev2si, __vis_bshufflev4hi, __vis_bshufflev8qi,
+	__vis_edge8n, __vis_edge8ln, __vis_edge16n, __vis_edge16ln,
+	__vis_edge32n, __vis_edge32ln): New VIS 2.0 interfaces.
+	* doc/extend.texi: Document new VIS 2.0 builtins.
+
 2011-09-29  Nick Clifton  <nickc@redhat.com>
 	    Bernd Schmidt  <bernds@codesourcery.com>

diff --git a/gcc/config/sparc/niagara.md b/gcc/config/sparc/niagara.md
index a75088b..c7a2245 100644
--- a/gcc/config/sparc/niagara.md
+++ b/gcc/config/sparc/niagara.md
@@ -114,5 +114,5 @@
  */
 (define_insn_reservation "niag_vis" 8
   (and (eq_attr "cpu" "niagara")
-    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,gsr,array"))
+    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,edgen,gsr,array"))
   "niag_pipe*8")

diff --git a/gcc/config/sparc/niagara2.md b/gcc/config/sparc/niagara2.md
index f261ac1..fa07bec 100644
--- a/gcc/config/sparc/niagara2.md
+++ b/gcc/config/sparc/niagara2.md
@@ -111,10 +111,10 @@
 (define_insn_reservation "niag2_vis" 6
   (and (eq_attr "cpu" "niagara2")
-    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,array,gsr"))
+    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,edgen,array,gsr"))
   "niag2_pipe*6")

 (define_insn_reservation "niag3_vis" 9
   (and (eq_attr "cpu" "niagara3")
-    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,array,gsr"))
+    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,edgen,array,gsr"))
   "niag2_pipe*9")

diff --git a/gcc/config/sparc/sparc-c.c b/gcc/config/sparc/sparc-c.c
index 6e30950..0f2bee1 100644
--- a/gcc/config/sparc/sparc-c.c
+++ b/gcc/config/sparc/sparc-c.c
@@ -45,7 +45,12 @@ sparc_target_macros (void)
       cpp_assert (parse_in, "machine=sparc");
     }

-  if (TARGET_VIS)
+  if (TARGET_VIS2)
+    {
+      cpp_define (parse_in, "__VIS__=0x200");
+      cpp_define (parse_in, "__VIS=0x200");
+    }
+  else if (TARGET_VIS)
     {
       cpp_define (parse_in, "__VIS__=0x100");
       cpp_define (parse_in, "__VIS=0x100");

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index c8c0677..9863174 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -769,16 +769,16 @@ sparc_option_override (void)
     /* UltraSPARC III */
     /* ??? Check if %y issue still holds true.  */
     { MASK_ISA,
-      MASK_V9|MASK_DEPRECATED_V8_INSNS},
+      MASK_V9|MASK_DEPRECATED_V8_INSNS|MASK_VIS2},
     /* UltraSPARC T1 */
     { MASK_ISA,
       MASK_V9|MASK_DEPRECATED_V8_INSNS},
     /* UltraSPARC T2 */
-    { MASK_ISA, MASK_V9},
+    { MASK_ISA, MASK_V9|MASK_VIS2},
     /* UltraSPARC T3 */
-    { MASK_ISA, MASK_V9 | MASK_FMAF},
+    { MASK_ISA, MASK_V9|MASK_VIS2|MASK_FMAF},
     /* UltraSPARC T4 */
-    { MASK_ISA, MASK_V9 | MASK_FMAF},
+    { MASK_ISA, MASK_V9|MASK_VIS2|MASK_FMAF},
   };
   const struct cpu_table *cpu;
   unsigned int i;
@@ -857,9 +857,13 @@ sparc_option_override (void)
   if (target_flags_explicit & MASK_FPU)
     target_flags = (target_flags & ~MASK_FPU) | fpu;

-  /* Don't allow -mvis or -mfmaf if FPU is disabled.  */
+  /* -mvis2 implies -mvis */
+  if (TARGET_VIS2)
+    target_flags |= MASK_VIS;
+
+  /* Don't allow -mvis, -mvis2, or -mfmaf if FPU is disabled.  */
   if (! TARGET_FPU)
-    target_flags &= ~(MASK_VIS | MASK_FMAF);
+    target_flags &= ~(MASK_VIS | MASK_VIS2 | MASK_FMAF);

   /* -mvis assumes UltraSPARC+, so we are sure v9 instructions
      are available.
@@ -9300,6 +9304,21 @@ sparc_vis_init_builtins (void)
			 di_ftype_ptr_ptr);
       def_builtin_const ("__builtin_vis_edge32l", CODE_FOR_edge32ldi_vis,
			 di_ftype_ptr_ptr);
+      if (TARGET_VIS2)
+	{
+	  def_builtin_const ("__builtin_vis_edge8n", CODE_FOR_edge8ndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge8ln", CODE_FOR_edge8lndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge16n", CODE_FOR_edge16ndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge16ln", CODE_FOR_edge16lndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge32n", CODE_FOR_edge32ndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge32ln", CODE_FOR_edge32lndi_vis,
+			     di_ftype_ptr_ptr);
+	}
     }
   else
     {
@@ -9315,6 +9334,21 @@ sparc_vis_init_builtins (void)
			 si_ftype_ptr_ptr);
       def_builtin_const ("__builtin_vis_edge32l", CODE_FOR_edge32lsi_vis,
			 si_ftype_ptr_ptr);
+      if (TARGET_VIS2)
+	{
+	  def_builtin_const ("__builtin_vis_edge8n", CODE_FOR_edge8nsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge8ln", CODE_FOR_edge8lnsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge16n", CODE_FOR_edge16nsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge16ln", CODE_FOR_edge16lnsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge32n", CODE_FOR_edge32nsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge32ln", CODE_FOR_edge32lnsi_vis,
+			     si_ftype_ptr_ptr);
+	}
     }

   /* Pixel compare.  */
@@ -9394,6 +9428,25 @@ sparc_vis_init_builtins (void)
       def_builtin_const ("__builtin_vis_array32", CODE_FOR_array32si_vis,
			 si_ftype_si_si);
     }
+
+  if (TARGET_VIS2)
+    {
+      /* Byte mask and shuffle */
+      if (TARGET_ARCH64)
+	def_builtin ("__builtin_vis_bmask", CODE_FOR_bmaskdi_vis,
+		     di_ftype_di_di);
+      else
+	def_builtin ("__builtin_vis_bmask", CODE_FOR_bmasksi_vis,
+		     si_ftype_si_si);
+      def_builtin ("__builtin_vis_bshufflev4hi", CODE_FOR_bshufflev4hi_vis,
+		   v4hi_ftype_v4hi_v4hi);
+      def_builtin ("__builtin_vis_bshufflev8qi", CODE_FOR_bshufflev8qi_vis,
+		   v8qi_ftype_v8qi_v8qi);
+      def_builtin ("__builtin_vis_bshufflev2si", CODE_FOR_bshufflev2si_vis,
+		   v2si_ftype_v2si_v2si);
+      def_builtin ("__builtin_vis_bshuffledi", CODE_FOR_bshuffledi_vis,
+		   di_ftype_di_di);
+    }
 }

 /* Handle TARGET_EXPAND_BUILTIN target hook.
@@ -9428,16 +9481,18 @@ sparc_expand_builtin (tree exp, rtx target,
   FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
     {
       const struct insn_operand_data *insn_op;
+      int idx;

       if (arg == error_mark_node)
	 return NULL_RTX;

       arg_count++;
-      insn_op = &insn_data[icode].operand[arg_count - !nonvoid];
+      idx = arg_count - !nonvoid;
+      insn_op = &insn_data[icode].operand[idx];
       op[arg_count] = expand_normal (arg);

-      if (! (*insn_data[icode].operand[arg_count].predicate) (op[arg_count],
-							      insn_op->mode))
+      if (! (*insn_data[icode].operand[idx].predicate) (op[arg_count],
+							insn_op->mode))
	 op[arg_count] = copy_to_mode_reg (insn_op->mode, op[arg_count]);
     }
@@ -9556,7 +9611,9 @@ sparc_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
   if (ignore
       && icode != CODE_FOR_alignaddrsi_vis
       && icode != CODE_FOR_alignaddrdi_vis
-      && icode != CODE_FOR_wrgsr_vis)
+      && icode != CODE_FOR_wrgsr_vis
+      && icode != CODE_FOR_bmasksi_vis
+      && icode != CODE_FOR_bmaskdi_vis)
     return build_zero_cst (rtype);

   switch (icode)

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 2def8d1..0446955 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -72,6 +72,14 @@
    (UNSPEC_SP_SET		60)
    (UNSPEC_SP_TEST		61)
+
+   (UNSPEC_EDGE8N		70)
+   (UNSPEC_EDGE8LN		71)
+   (UNSPEC_EDGE16N		72)
+   (UNSPEC_EDGE16LN		73)
+   (UNSPEC_EDGE32N		74)
+   (UNSPEC_EDGE32LN		75)
+   (UNSPEC_BSHUFFLE		76)
   ])

 (define_constants
@@ -240,7 +248,7 @@
    fpcmp,
    fpmul,fpdivs,fpdivd,
    fpsqrts,fpsqrtd,
-   fga,fgm_pack,fgm_mul,fgm_pdist,fgm_cmp,edge,gsr,array,
+   fga,fgm_pack,fgm_mul,fgm_pdist,fgm_cmp,edge,edgen,gsr,array,
    cmove,
    ialuX,
    multi,savew,flushw,iflush,trap"
@@ -8188,4 +8196,79 @@
   "array32\t%r1, %r2, %0"
   [(set_attr "type" "array")])

+(define_insn "bmask<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (plus:P (match_operand:P 1 "register_operand" "rJ")
+                (match_operand:P 2 "register_operand" "rJ")))
+   (clobber (reg:SI GSR_REG))]
+  "TARGET_VIS2"
+  "bmask\t%r1, %r2, %0"
+  [(set_attr "type" "array")])
+
+(define_insn "bshuffle<V64I:mode>_vis"
+  [(set (match_operand:V64I 0 "register_operand" "=e")
+        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
+                      (match_operand:V64I 2 "register_operand" "e")]
+                     UNSPEC_BSHUFFLE))
+   (use (reg:SI GSR_REG))]
+  "TARGET_VIS2"
+  "bshuffle\t%1, %2, %0"
+  [(set_attr "type" "fga")
+   (set_attr "fptype" "double")])
+
+;; VIS 2.0 adds edge variants which do not set the condition codes
+(define_insn "edge8n<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE8N))]
+  "TARGET_VIS2"
+  "edge8n\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge8ln<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE8LN))]
+  "TARGET_VIS2"
+  "edge8ln\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge16n<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE16N))]
+  "TARGET_VIS2"
+  "edge16n\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge16ln<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE16LN))]
+  "TARGET_VIS2"
+  "edge16ln\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge32n<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE32N))]
+  "TARGET_VIS2"
+  "edge32n\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge32ln<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE32LN))]
+  "TARGET_VIS2"
+  "edge32ln\t%r1, %r2, %0"
+  [(set_attr "type" "edge")])
+
 (include "sync.md")

diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
index 6be6a75..a7b60c8 100644
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@@ -59,7 +59,11 @@ Compile for V8+ ABI

 mvis
 Target Report Mask(VIS)
-Use UltraSPARC Visual Instruction Set extensions
+Use UltraSPARC Visual Instruction Set version 1.0 extensions
+
+mvis2
+Target Report Mask(VIS2)
+Use UltraSPARC Visual Instruction Set version 2.0 extensions

 mfmaf
 Target Report Mask(FMAF)

diff --git a/gcc/config/sparc/ultra1_2.md b/gcc/config/sparc/ultra1_2.md
index 4600205..9cdebab 100644
--- a/gcc/config/sparc/ultra1_2.md
+++ b/gcc/config/sparc/ultra1_2.md
@@ -94,7 +94,7 @@
 (define_insn_reservation "us1_simple_ieu1" 1
   (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "compare,edge,array"))
+    (eq_attr "type" "compare,edge,edgen,array"))
   "us1_ieu1 + us1_slot012")

 (define_insn_reservation "us1_ialuX" 1

diff --git a/gcc/config/sparc/ultra3.md b/gcc/config/sparc/ultra3.md
index c6a9f89..c891e35 100644
--- a/gcc/config/sparc/ultra3.md
+++ b/gcc/config/sparc/ultra3.md
@@ -56,7 +56,7 @@
 (define_insn_reservation "us3_array" 2
   (and (eq_attr "cpu" "ultrasparc3")
-    (eq_attr "type" "array"))
+    (eq_attr "type" "array,edgen"))
   "us3_ms + us3_slotany, nothing")

 ;; ??? Not entirely accurate.

diff --git a/gcc/config/sparc/visintrin.h b/gcc/config/sparc/visintrin.h
index 3bef099..1688301 100644
--- a/gcc/config/sparc/visintrin.h
+++ b/gcc/config/sparc/visintrin.h
@@ -354,4 +354,81 @@ __vis_array32 (long __A, long __B)
   return __builtin_vis_array32 (__A, __B);
 }

+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bmask (long __A, long __B)
+{
+  return __builtin_vis_bmask (__A, __B);
+}
+
+extern __inline __i64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bshuffledi (__i64 __A, __i64 __B)
+{
+  return __builtin_vis_bshuffledi (__A, __B);
+}
+
+extern __inline __v2si
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bshufflev2si (__v2si __A, __v2si __B)
+{
+  return __builtin_vis_bshufflev2si (__A, __B);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bshufflev4hi (__v4hi __A, __v4hi __B)
+{
+  return __builtin_vis_bshufflev4hi (__A, __B);
+}
+
+extern __inline __v8qi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bshufflev8qi (__v8qi __A, __v8qi __B)
+{
+  return __builtin_vis_bshufflev8qi (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge8n (void *__A, void *__B)
+{
+  return __builtin_vis_edge8n (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge8ln (void *__A, void *__B)
+{
+  return __builtin_vis_edge8ln (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge16n (void *__A, void *__B)
+{
+  return __builtin_vis_edge16n (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge16ln (void *__A, void *__B)
+{
+  return __builtin_vis_edge16ln (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge32n (void *__A, void *__B)
+{
+  return __builtin_vis_edge32n (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge32ln (void *__A, void *__B)
+{
+  return __builtin_vis_edge32ln (__A, __B);
+}
+
 #endif /* _VISINTRIN_H_INCLUDED */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e8a777d..7ca50da 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13016,6 +13016,24 @@ long __builtin_vis_array16 (long, long);
 long __builtin_vis_array32 (long, long);
 @end smallexample
 
+Additionally, when you use the @option{-mvis2} switch, the VIS version
+2.0 built-in functions become available:
+
+@smallexample
+long __builtin_vis_bmask (long, long);
+int64_t __builtin_vis_bshuffledi (int64_t, int64_t);
+v2si __builtin_vis_bshufflev2si (v2si, v2si);
+v4hi __builtin_vis_bshufflev4hi (v4hi, v4hi);
+v8qi __builtin_vis_bshufflev8qi (v8qi, v8qi);
+
+long __builtin_vis_edge8n (void *, void *);
+long __builtin_vis_edge8ln (void *, 
void *);
+long __builtin_vis_edge16n (void *, void *);
+long __builtin_vis_edge16ln (void *, void *);
+long __builtin_vis_edge32n (void *, void *);
+long __builtin_vis_edge32ln (void *, void *);
+@end smallexample
+
 @node SPU Built-in Functions
 @subsection SPU Built-in Functions
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e166964..0ce15ff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -880,7 +880,7 @@ See RS/6000 and PowerPC Options.
 -mstack-bias -mno-stack-bias @gol
 -munaligned-doubles -mno-unaligned-doubles @gol
 -mv8plus -mno-v8plus -mvis -mno-vis @gol
--mfmaf -mno-fmaf}
+-mvis2 -mno-vis2 -mfmaf -mno-fmaf}
 
 @emph{SPU Options}
 @gccoptlist{-mwarn-reloc -merror-reloc @gol
@@ -17430,6 +17430,16 @@ mode for all SPARC-V9 processors.
 With @option{-mvis}, GCC generates code that takes advantage of the
 UltraSPARC Visual Instruction Set extensions. The default is
 @option{-mno-vis}.
+@item -mvis2
+@itemx -mno-vis2
+@opindex mvis2
+@opindex mno-vis2
+With @option{-mvis2}, GCC generates code that takes advantage of
+version 2.0 of the UltraSPARC Visual Instruction Set extensions. The
+default is @option{-mvis2} when targeting a cpu that supports such
+instructions, such as UltraSPARC-III and later. Setting @option{-mvis2}
+also sets @option{-mvis}.
+
 @item -mfmaf
 @itemx -mno-fmaf
 @opindex mfmaf
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index fb41e55..e96612c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2011-09-30  David S. Miller  <davem@davemloft.net>
+
+	* gcc.target/sparc/bmaskbshuf.c: New test.
+	* gcc.target/sparc/edgen.c: New test.
+ 2011-09-29 Janus Weil <janus@gcc.gnu.org> PR fortran/50547 diff --git a/gcc/testsuite/gcc.target/sparc/bmaskbshuf.c b/gcc/testsuite/gcc.target/sparc/bmaskbshuf.c new file mode 100644 index 0000000..7108a01 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/bmaskbshuf.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O -mcpu=ultrasparc3 -mvis -mvis2" } */ +typedef long long int64_t; +typedef int vec32 __attribute__((vector_size(8))); +typedef short vec16 __attribute__((vector_size(8))); +typedef unsigned char vec8 __attribute__((vector_size(8))); + +long test_bmask (long x, long y) +{ + return __builtin_vis_bmask (x, y); +} + +vec16 test_bshufv4hi (vec16 x, vec16 y) +{ + return __builtin_vis_bshufflev4hi (x, y); +} + +vec32 test_bshufv2si (vec32 x, vec32 y) +{ + return __builtin_vis_bshufflev2si (x, y); +} + +vec8 test_bshufv8qi (vec8 x, vec8 y) +{ + return __builtin_vis_bshufflev8qi (x, y); +} + +int64_t test_bshufdi (int64_t x, int64_t y) +{ + return __builtin_vis_bshuffledi (x, y); +} + +/* { dg-final { scan-assembler "bmask\t%" } } */ +/* { dg-final { scan-assembler "bshuffle\t%" } } */ diff --git a/gcc/testsuite/gcc.target/sparc/edgen.c b/gcc/testsuite/gcc.target/sparc/edgen.c new file mode 100644 index 0000000..11973b5 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/edgen.c @@ -0,0 +1,39 @@ +/* { dg-do compile } */ +/* { dg-options "-O -mcpu=ultrasparc3 -mvis" } */ + +long test_edge8n (void *p1, void *p2) +{ + return __builtin_vis_edge8n (p1, p2); +} + +long test_edge8ln (void *p1, void *p2) +{ + return __builtin_vis_edge8ln (p1, p2); +} + +long test_edge16n (void *p1, void *p2) +{ + return __builtin_vis_edge16n (p1, p2); +} + +long test_edge16ln (void *p1, void *p2) +{ + return __builtin_vis_edge16ln (p1, p2); +} + +long test_edge32n (void *p1, void *p2) +{ + return __builtin_vis_edge32n (p1, p2); +} + +long test_edge32ln (void *p1, void *p2) +{ + return __builtin_vis_edge32ln (p1, p2); +} + +/* { dg-final { scan-assembler "edge8n\t%" } } */ 
+/* { dg-final { scan-assembler "edge8ln\t%" } } */ +/* { dg-final { scan-assembler "edge16n\t%" } } */ +/* { dg-final { scan-assembler "edge16ln\t%" } } */ +/* { dg-final { scan-assembler "edge32n\t%" } } */ +/* { dg-final { scan-assembler "edge32ln\t%" } } */