
Add sparc VIS 2.0 builtins, intrinsics, and option to control them.

Message ID 20110930.035938.868908024383419283.davem@davemloft.net
State New

Commit Message

David Miller Sept. 30, 2011, 7:59 a.m. UTC
I tried to add the 'siam' instruction too but that one is really
difficult because it influences the behavior of every float operation
and I couldn't find an easy way to express those dependencies.  I
tried a few easy approaches but I couldn't reliably keep the compiler
from moving 'siam' across float operations.

The 'siam' (Set Interval Arithmetic Mode) instruction is a mechanism
to override the float rounding mode on a cycle-to-cycle basis, i.e.
without the cost of doing a write to the %fsr.

But the rest of the VIS 2.0 stuff is here and was reasonably
straightforward to add.

Committed to trunk.

Comments

Richard Henderson Sept. 30, 2011, 9:03 p.m. UTC | #1
On 09/30/2011 12:59 AM, David Miller wrote:
> 
> I tried to add the 'siam' instruction too but that one is really
> difficult because it influences the behavior of every float operation
> and I couldn't find an easy way to express those dependencies.  I
> tried a few easy approaches but I couldn't reliably keep the compiler
> from moving 'siam' across float operations.
> 
> The 'siam' (Set Interval Arithmetic Mode) instruction is a mechanism
> to override the float rounding mode on a cycle-to-cycle basis, i.e.
> without the cost of doing a write to the %fsr.

I don't think I'd ever expose this via a builtin.  This seems like a feature
we've talked about for a long time, but have never done anything about.

Specifically, in-compiler support for #pragma STDC FENV_ACCESS and the
various <fenv.h> routines.  We ought to be able to track the rounding
mode (and other relevant parameters) on a per-expression basis, tagging
each floating-point operation with the parameters in effect.

At some point, at or after rtl generation time, we transform these saved
parameters into manipulations of the fpu state.  We have several options:

  (1) Alpha-like where e.g. the rounding mode is directly encoded in
      the instruction.  No further optimization necessary, unless we
      are manipulating non-rounding parameters.

  (2) IA64-like where we have multiple fpu environments, and can
      encode which to use inside the instruction.  However, in this
      case we also need to set up these alternate environments and
      merge back the exception state when the user reads it.

  (3) Use optimize-mode-switching to minimize the number of changes
      to the global state.  This includes the use of SIAM vs %fsr,
      especially when a subroutine call could have changed the
      global rounding mode.

All of which is a lot of work.
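
For concreteness, here is the kind of user code such support has to
get right, using nothing but standard C99 <fenv.h> (a sketch; GCC
currently ignores the pragma, which is exactly the problem):

  #include <fenv.h>
  #pragma STDC FENV_ACCESS ON

  double
  add_rounded_down (double a, double b)
  {
    int old = fegetround ();
    fesetround (FE_DOWNWARD);  /* dynamic rounding-mode change */
    double r = a + b;          /* must not move across the fesetround calls */
    fesetround (old);          /* restore the previous mode */
    return r;
  }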

> +(define_insn "bmask<P:mode>_vis"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> +        (plus:P (match_operand:P 1 "register_operand" "rJ")
> +                (match_operand:P 2 "register_operand" "rJ")))
> +   (clobber (reg:SI GSR_REG))]
> +  "TARGET_VIS2"
> +  "bmask\t%r1, %r2, %0"
> +  [(set_attr "type" "array")])

I think this is wrong.  I think you want to model this as

  [(set (match_operand:DI 0 "register_operand" "=r")
        (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
                 (match_operand:DI 2 "register_or_zero_operand" "rJ")))
   (set (zero_extract:DI
	  (reg:DI GSR_REG)
	  (const_int 32)
	  (const_int 32))
        (plus:DI (match_dup 1) (match_dup 2)))]

(1) %gsr is really set to something, not just modified in
uninteresting ways; we're going to use this value later.

(2) Only the top 32 bits of %gsr are changed; the low 32 bits are
still valid.  You don't want insns that set the low 32 bits to be
deleted as dead code, which is what would happen with the clobber.

(3) I realize this version makes things difficult for 32-bit mode.
There, I think you may have to settle for an unspec.  And perhaps
the possible benefit of properly representing the GSR change isn't
that helpful.  In which case:

    (set (reg:DI GSR_REG)
	 (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
		    UNSPEC_BMASK))

> +(define_insn "bshuffle<V64I:mode>_vis"
> +  [(set (match_operand:V64I 0 "register_operand" "=e")
> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
> +	              (match_operand:V64I 2 "register_operand" "e")]
> +                     UNSPEC_BSHUFFLE))
> +   (use (reg:SI GSR_REG))]

Better to push the use of the GSR_REG into the unspec, and not leave
it separate in the parallel.



r~
Joseph Myers Sept. 30, 2011, 10:09 p.m. UTC | #2
On Fri, 30 Sep 2011, Richard Henderson wrote:

> Specifically, in-compiler support for #pragma STDC FENV_ACCESS and the
> various <fenv.h> routines.  We ought to be able to track the rounding
> mode (and other relevant parameters) on a per-expression basis, tagging
> each floating-point operation with the parameters in effect.

For C99 and C1X it's just dynamic rounding direction (changed by 
fesetround, possibly changed by calls to any non-pure function unless you 
can prove that function doesn't call fesetround, but the default mode can 
be presumed unless -frounding-math or the FENV_ACCESS pragma is in 
effect).  (asms accessing the relevant registers also need to be 
considered.)

N1582 (status report on the C bindings for IEEE 754-2008) mentions static 
rounding direction support but doesn't go into details.  (Practically, 
static rounding directions are more useful for various floating-point 
algorithms.)

Floating-point operations implicitly read the rounding mode.  They 
implicitly write the exception flags (as, again, do most function calls) - 
except that generally they only set rather than clear flags (but 
function calls may also call functions that clear them).
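
A small self-contained illustration of that implicit flag traffic,
again with only standard <fenv.h> calls:

  #include <fenv.h>
  #pragma STDC FENV_ACCESS ON

  int
  add_was_inexact (double a, double b, double *sum)
  {
    feclearexcept (FE_INEXACT);        /* explicit clear; flags otherwise
                                          only accumulate */
    *sum = a + b;                      /* implicitly raises "inexact" if
                                          the sum had to be rounded */
    return fetestexcept (FE_INEXACT);  /* nonzero iff it was raised */
  }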

The present defaults are -fno-rounding-math -ftrapping-math.  I'm not sure 
that with a proper implementation this would really allow much more 
optimization than -frounding-math -ftrapping-math.  Simply enabling 
exceptions should disable most constant folding where the result isn't 
exactly representable, because the "inexact" exception is required, for 
example; just knowing the rounding mode and so the value of the result 
isn't enough to fold.  And if there aren't any function calls intervening, 
all combinations of these options will allow common subexpression 
elimination (since that doesn't change the set of exceptions raised, and 
no support is required for counting the number of times a particular 
exception was raised).  So the right defaults once -ftrapping-math really 
does what it says aren't clear.

I've thought a bit about implementation approaches, but mainly at the 
level of how to decouple the front-end and back-end parts from the full 
complexity of tracking pragma state for each expression (for example, by 
setting variables on a whole-function basis and restricting inlining).  
I've also thought about how to implement testcases providing reasonably 
thorough coverage of the exceptions and rounding modes issues.  But I 
haven't had time to work on implementation of any of these pieces.
David Miller Sept. 30, 2011, 10:27 p.m. UTC | #3
From: Richard Henderson <rth@redhat.com>
Date: Fri, 30 Sep 2011 14:03:52 -0700

>   (3) Use optimize-mode-switching to minimize the number of changes
>       to the global state.  This includes the use of SIAM vs %fsr,
>       especially when a subroutine call could have changed the
>       global rounding mode.

Indeed, and I incidentally took a look at the mode switching
optimization framework and it appears that I could use it for
providing insn patterns for 'rint' and friends like i386 does.

> All of which is a lot of work.
> 
>> +(define_insn "bmask<P:mode>_vis"
>> +  [(set (match_operand:P 0 "register_operand" "=r")
>> +        (plus:P (match_operand:P 1 "register_operand" "rJ")
>> +                (match_operand:P 2 "register_operand" "rJ")))
>> +   (clobber (reg:SI GSR_REG))]
>> +  "TARGET_VIS2"
>> +  "bmask\t%r1, %r2, %0"
>> +  [(set_attr "type" "array")])
> 
> I think this is wrong.  I think you want to model this as
 ...
>> +(define_insn "bshuffle<V64I:mode>_vis"
>> +  [(set (match_operand:V64I 0 "register_operand" "=e")
>> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
>> +	              (match_operand:V64I 2 "register_operand" "e")]
>> +                     UNSPEC_BSHUFFLE))
>> +   (use (reg:SI GSR_REG))]
> 
> Better to push the use of the GSR_REG into the unspec, and not leave
> it separate in the parallel.

Thanks Richard, I'll fix these up.  In general, the GSR tracking needs
a bit more work.
David Miller Oct. 3, 2011, 5:28 a.m. UTC | #4
From: Richard Henderson <rth@redhat.com>
Date: Fri, 30 Sep 2011 14:03:52 -0700

> On 09/30/2011 12:59 AM, David Miller wrote:
>> 
>>[ VIS 2.0 bmask patterns ]
>
> I think this is wrong.  I think you want to model this as
> 
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
>                  (match_operand:DI 2 "register_or_zero_operand" "rJ")))
>    (set (zero_extract:DI
> 	  (reg:DI GSR_REG)
> 	  (const_int 32)
> 	  (const_int 32))
>         (plus:DI (match_dup 1) (match_dup 2)))]

Yep, perfect for 64-bit.

> (3) I realize this version makes things difficult for 32-bit mode.
> There, I think you may have to settle for an unspec.  And perhaps
> the possible benefit of properly representing the GSR change isn't
> that helpful.  In which case:
> 
>     (set (reg:DI GSR_REG)
> 	 (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
> 		    UNSPEC_BMASK))

Actually, can't we just use a (zero_extend:DI (plus:SI ...)) for the
32-bit case?  It seems to work fine.

>> +(define_insn "bshuffle<V64I:mode>_vis"
>> +  [(set (match_operand:V64I 0 "register_operand" "=e")
>> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
>> +	              (match_operand:V64I 2 "register_operand" "e")]
>> +                     UNSPEC_BSHUFFLE))
>> +   (use (reg:SI GSR_REG))]
> 
> Better to push the use of the GSR_REG into the unspec, and not leave
> it separate in the parallel.

This is actually just a non-constant vec_merge, and even though the internals
documentation says that the 'items' operand has to be a const_int, the compiler
actually doesn't care.

The only two places vec_merge is even inspected semantically by the
compiler are in the RTX simplifier, where it already checks explicitly
for const_int, and in dwarf2out.c where it just ignores the vec_merge
construct entirely since it can't be represented.

So if we just code this as:

	(set (match_operand:V64I 0 "register_operand" "=e")
             (vec_merge:V64I (match_operand:V64I 2 "register_operand" "e")
                             (match_operand:V64I 1 "register_operand" "e")
                             (zero_extract:DI (reg:DI GSR_REG)
                                              (const_int 32)
                                              (const_int 32))))

it would mostly work.  The only problem with this is that we provide
the bshuffle builtin for DI mode just as we do for the faligndata
instruction.  simplify-rtx.c isn't happy seeing a non-vector mode
and VECTOR_MODES () in a foo-modes.def file won't generate modes
like V1DI and V1SI.

I guess I could explicitly generate those single entry vector modes
like i386 does.

But, as-is, the above pattern does work for all the "actual" vector
modes.

More generally, rtl.def is non-specific about what operand 2 of a
vec_merge has to be; it just says a bitmask, not that it has to be
a const_int.

I think we should explicitly allow non-const_int objects here, and state
so in the internals documentation.
Richard Henderson Oct. 3, 2011, 4:49 p.m. UTC | #5
On 10/02/2011 10:28 PM, David Miller wrote:
>>     (set (reg:DI GSR_REG)
>> 	 (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
>> 		    UNSPEC_BMASK))
> 
> Actually, can't we just use a (zero_extend:DI (plus:SI ...)) for the
> 32-bit case?  It seems to work fine.

Sure.

>>> +(define_insn "bshuffle<V64I:mode>_vis"
>>> +  [(set (match_operand:V64I 0 "register_operand" "=e")
>>> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
>>> +	              (match_operand:V64I 2 "register_operand" "e")]
>>> +                     UNSPEC_BSHUFFLE))
>>> +   (use (reg:SI GSR_REG))]
>>
>> Better to push the use of the GSR_REG into the unspec, and not leave
>> it separate in the parallel.
> 
> This is actually just a non-constant vec_merge, and even though the internals
> documentation says that the 'items' operand has to be a const_int, the compiler
> actually doesn't care.

Um, no it isn't.

The VEC_MERGE pattern uses N bits to select N elements from op0 and op1:

	op0    = A B C D
	op1    = W X Y Z
	bmask  = 0 1 0 1
	result = A X C Z

Your insn doesn't use single bits for the select.  It uses nibbles to
select from the 16 input bytes.  It's akin to the VEC_SELECT pattern,
except that VEC_SELECT requires a constant input parallel.
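
In C terms, the byte selection bshuffle performs is roughly the
following (a behavioral sketch, per my reading of the VIS 2.0
documentation, with byte 0 and nibble 0 the most significant):

  #include <stdint.h>

  /* Model of VIS2 bshuffle: each of the 8 result bytes is picked by a
     4-bit index, taken from the 32-bit GSR.mask field, out of the 16
     bytes of rs1:rs2.  */
  static uint64_t
  bshuffle_model (uint64_t rs1, uint64_t rs2, uint32_t gsr_mask)
  {
    unsigned char src[16], dst[8];
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++)
      {
        src[i] = (rs1 >> (56 - 8 * i)) & 0xff;      /* bytes of rs1 */
        src[i + 8] = (rs2 >> (56 - 8 * i)) & 0xff;  /* bytes of rs2 */
      }
    for (i = 0; i < 8; i++)
      dst[i] = src[(gsr_mask >> (28 - 4 * i)) & 0xf];
    for (i = 0; i < 8; i++)
      r = (r << 8) | dst[i];
    return r;
  }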

---

You might have a look at the "Vector Shuffle" thread, where we've been
trying to provide builtin-level access to this feature.  We've not added
an rtx-level code for this because so far there isn't *that* much in
common between the various cpus.  They all seem to differ in niggling
details...

You'll have a somewhat harder time than i386 for this feature, given
that you've got to pack bytes into nibbles.  But it can certainly be done.
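
The nibble packing itself is cheap for a constant selector; something
like this hypothetical helper (the name and the 0..15 index convention
are mine, not from the patch) is all an expander would have to compute
before loading %gsr:

  #include <stdint.h>

  /* Pack eight byte indices, each 0..15, into the 32-bit GSR.mask
     field bshuffle consumes; the index for result byte 0 lands in the
     most significant nibble.  */
  static uint32_t
  pack_bshuffle_mask (const unsigned char sel[8])
  {
    uint32_t mask = 0;
    int i;

    for (i = 0; i < 8; i++)
      mask = (mask << 4) | (sel[i] & 0xf);
    return mask;
  }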


r~
David Miller Oct. 3, 2011, 5:42 p.m. UTC | #6
From: Richard Henderson <rth@redhat.com>
Date: Mon, 03 Oct 2011 09:49:37 -0700

> You might have a look at the "Vector Shuffle" thread, where we've been
> trying to provide builtin-level access to this feature.  We've not added
> an rtx-level code for this because so far there isn't *that* much in
> common between the various cpus.  They all seem to differ in niggling
> details...
> 
> You'll have a somewhat harder time than i386 for this feature, given
> that you've got to pack bytes into nibbles.  But it can certainly be done.

Ok, I'll take a look.
Richard Henderson Oct. 3, 2011, 6:07 p.m. UTC | #7
On 10/03/2011 10:42 AM, David Miller wrote:
>> You might have a look at the "Vector Shuffle" thread, where we've been
>> trying to provide builtin-level access to this feature.  We've not added
>> an rtx-level code for this because so far there isn't *that* much in
>> common between the various cpus.  They all seem to differ in niggling
>> details...
>>
>> You'll have a somewhat harder time than i386 for this feature, given
>> that you've got to pack bytes into nibbles.  But it can certainly be done.
> 
> Ok, I'll take a look.

Oh, you should know that, at present, our generic shuffle support assumes
that shuffles with a constant control (which are also generated by the
vectorizer) get expanded to builtins.  And as builtins we wind up with
lots of them -- one per type.

I'm going to start fixing that in the coming week.

The vectorizer will be changed to emit VEC_SHUFFLE_EXPR.  It will still use
the target hook to see if the constant shuffle is supported.

The lower-vector pass currently tests the target hook and swaps the
VEC_SHUFFLE_EXPRs that validate into builtins.  That will be changed
to simply leave them unchanged if the other target hook returns NULL.
As the targets are updated to use vshuffle, the builtins get deleted
and the hook changed to return NULL.  After all targets are updated,
we can remove this check and the target hook itself.  This should
preserve bisection on each of the affected targets.

The rtl expander won't have to change.

The target backends will need to accept an immediate for vshuffle op3,
if anything special ought to be done for constant shuffles.  In addition,
the builtins should be removed, as previously noted.


r~
Artem Shinkarov Oct. 3, 2011, 6:40 p.m. UTC | #8
On Mon, Oct 3, 2011 at 7:07 PM, Richard Henderson <rth@redhat.com> wrote:
> On 10/03/2011 10:42 AM, David Miller wrote:
>>> You might have a look at the "Vector Shuffle" thread, where we've been
>>> trying to provide builtin-level access to this feature.  We've not added
>>> an rtx-level code for this because so far there isn't *that* much in
>>> common between the various cpus.  They all seem to differ in niggling
>>> details...
>>>
>>> You'll have a somewhat harder time than i386 for this feature, given
>>> that you've got to pack bytes into nibbles.  But it can certainly be done.
>>
>> Ok, I'll take a look.
>
> Oh, you should know that, at present, our generic shuffle support assumes
> that shuffles with a constant control (which are also generated by the
> vectorizer) get expanded to builtins.  And as builtins we wind up with
> lots of them -- one per type.
>
> I'm going to start fixing that in the coming week.
>
> The vectorizer will be changed to emit VEC_SHUFFLE_EXPR.  It will still use
> the target hook to see if the constant shuffle is supported.
>
> The lower-vector pass currently tests the target hook and swaps the
> VEC_SHUFFLE_EXPRs that validate into builtins.  That will be changed
> to simply leave them unchanged if the other target hook returns NULL.
> As the targets are updated to use vshuffle, the builtins get deleted
> and the hook changed to return NULL.  After all targets are updated,
> we can remove this check and the target hook itself.  This should
> preserve bisection on each of the affected targets.
>
> The rtl expander won't have to change.
>
> The target backends will need to accept an immediate for vshuffle op3,
> if anything special ought to be done for constant shuffles.  In addition,
> the builtins should be removed, as previously noted.
>
>
> r~
>

Several orthogonal vector-shuffling issues.

Currently if vec_perm_ok returns false, we do not try to use a new
vshuffle routine. Would it make sense to implement that? The only
potential problem I can see is a possible performance degradation.
This leads us to the second issue.

When we perform vshuffle, we need to know whether it makes sense to use
pshufb (in case of x86) or to perform data movement via standard
non-simd registers. Do we have this information in the current
cost-model? Also, in certain cases, when the mask is constant, I would
assume the memory movement is also faster. For example if the mask is
{4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.
Were there any attempts to perform such an analysis, and if not,
should we formalise the cases where a substitution of this sort would
make sense?
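
To make the {4,5,6,7,0,1,2,3} case concrete: assuming a 16-byte vector
of eight shorts, that mask is just a swap of the vector's two 64-bit
halves, which plain scalar code expresses directly (a sketch using
GCC's generic vector extension):

  #include <stdint.h>
  #include <string.h>

  typedef short v8hi __attribute__ ((vector_size (16)));

  /* The shuffle mask {4,5,6,7,0,1,2,3} on eight shorts only swaps the
     two 64-bit halves: two integer moves, no SIMD unit involved.  */
  static v8hi
  swap_halves (v8hi x)
  {
    uint64_t half[2], tmp;
    v8hi r;

    memcpy (half, &x, sizeof half);
    tmp = half[0];
    half[0] = half[1];
    half[1] = tmp;
    memcpy (&r, half, sizeof r);
    return r;
  }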


Thanks,
Artem.
Richard Henderson Oct. 3, 2011, 7:02 p.m. UTC | #9
On 10/03/2011 11:40 AM, Artem Shinkarov wrote:
> Currently if vec_perm_ok returns false, we do not try to use a new
> vshuffle routine. Would it make sense to implement that? The only
> potential problem I can see is a possible performance degradation.
> This leads us to the second issue.

Implement that where?  In the vectorizer?  No, I don't think so.
The _ok routine, while also indicating what the backend expander
supports, could also be thought of as a cost cutoff predicate.
Unless the vectorization folk request some more exact cost metric
I don't see any reason to change this.

>> When we perform vshuffle, we need to know whether it makes sense to use
> pshufb (in case of x86) or to perform data movement via standard
> non-simd registers. Do we have this information in the current
> cost-model?

Not really.  Again, if you're talking about the vectorizer, it
gets even more complicated than this because...

> Also, in certain cases, when the mask is constant, I would
> assume the memory movement is also faster. For example if the mask is
> {4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.

... even before SSSE3 PSHUFB, we have all sorts of insns that can
perform a constant shuffle without having to resort to either
general-purpose registers or memory.  E.g. PSHUFD.  For specific
data types, we can handle arbitrary constant shuffle with 1 or 2
insns, even when arbitrary variable shuffles aren't.
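
For instance, with the SSE2 intrinsics (a sketch, not what the
vectorizer emits):

  #include <emmintrin.h>

  /* The constant shuffle {1,0,3,2} of 32-bit lanes is a single PSHUFD;
     no PSHUFB, general registers, or memory needed.  */
  static __m128i
  swap_adjacent_lanes (__m128i x)
  {
    return _mm_shuffle_epi32 (x, _MM_SHUFFLE (2, 3, 0, 1));
  }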

It's certainly something that we could add to tree-vect-generic.c.
I have no plans to do anything of the sort, however.


r~
Artem Shinkarov Oct. 3, 2011, 7:21 p.m. UTC | #10
On Mon, Oct 3, 2011 at 8:02 PM, Richard Henderson <rth@redhat.com> wrote:
> On 10/03/2011 11:40 AM, Artem Shinkarov wrote:
>> Currently if vec_perm_ok returns false, we do not try to use a new
>> vshuffle routine. Would it make sense to implement that? The only
>> potential problem I can see is a possible performance degradation.
>> This leads us to the second issue.
>
> Implement that where?  In the vectorizer?  No, I don't think so.
> The _ok routine, while also indicating what the backend expander
> supports, could also be thought of as a cost cutoff predicate.
> Unless the vectorization folk request some more exact cost metric
> I don't see any reason to change this.

I was thinking more about the backend expander itself.  When we
throw sorry () in ix86_expand_vec_perm_builtin, we can fall back
to the vshuffle routine, unless it would lead to a performance
degradation.

>> When we perform vshuffle, we need to know whether it makes sense to use
>> pshufb (in case of x86) or to perform data movement via standard
>> non-simd registers. Do we have this information in the current
>> cost-model?
>
> Not really.  Again, if you're talking about the vectorizer, it
> gets even more complicated than this because...
>
>> Also, in certain cases, when the mask is constant, I would
>> assume the memory movement is also faster. For example if the mask is
>> {4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.
>
> ... even before SSSE3 PSHUFB, we have all sorts of insns that can
> perform a constant shuffle without having to resort to either
> general-purpose registers or memory.  E.g. PSHUFD.  For specific
> data types, we can handle arbitrary constant shuffle with 1 or 2
> insns, even when arbitrary variable shuffles aren't.

But these cases are more or less covered.  I am thinking about the
cases when vec_perm_ok returns false, but the actual permutation
could be done faster with memory/register transfers rather than with
PSHUFB & Co.

> It's certainly something that we could add to tree-vect-generic.c.
> I have no plans to do anything of the sort, however.

I didn't quite understand what you think can be added to
tree-vect-generic.  I thought we were talking about more or less
backend issues.

In any case I am investigating these problems, and I will appreciate
any help or advice.


Thanks,
Artem.

Patch

I tried to add the 'siam' instruction too but that one is really
difficult because it influences the behavior of every float operation
and I couldn't find an easy way to express those dependencies.  I
tried a few easy approaches but I couldn't reliably keep the compiler
from moving 'siam' across float operations.

The 'siam' (Set Interval Arithmetic Mode) instruction is a mechanism
to override the float rounding mode on a cycle-to-cycle basis, i.e.
without the cost of doing a write to the %fsr.

But the rest of the VIS 2.0 stuff is here and was reasonably
straightforward to add.

Committed to trunk.

gcc/

	* config/sparc/sparc.opt (VIS2): New option.
	* doc/invoke.texi: Document it.
	* config/sparc/sparc.md (UNSPEC_EDGE8N, UNSPEC_EDGE8LN,
	UNSPEC_EDGE16N, UNSPEC_EDGE16LN, UNSPEC_EDGE32N,
	UNSPEC_EDGE32LN, UNSPEC_BSHUFFLE): New unspecs.
	(define_attr type): New insn type 'edgen'.
	(bmask<P:mode>_vis, bshuffle<V64I:mode>_vis, edge8n<P:mode>_vis,
	edge8ln<P:mode>_vis, edge16n<P:mode>_vis, edge16ln<P:mode>_vis,
	edge32n<P:mode>_vis, edge32ln<P:mode>_vis): New insn VIS 2.0
	patterns.
	* niagara.md: Handle edgen.
	* niagara2.md: Likewise.
	* ultra1_2.md: Likewise.
	* ultra3.md: Likewise.
	* config/sparc/sparc-c.c (sparc_target_macros): Define __VIS__
	to 0x200 when TARGET_VIS2.
	* config/sparc/sparc.c (sparc_option_override): Set MASK_VIS2 by
	default when targeting capable cpus.  TARGET_VIS2 implies
	TARGET_VIS, and clear it when TARGET_FPU is disabled.
	(sparc_vis_init_builtins): Emit new VIS 2.0 builtins.
	(sparc_expand_builtin): Fix predicate indexing when builtin returns
	void.
	(sparc_fold_builtin): Do not eliminate bmask when result is ignored.
	* config/sparc/visintrin.h (__vis_bmask, __vis_bshuffledi,
	__vis_bshufflev2si, __vis_bshufflev4hi, __vis_bshufflev8qi,
	__vis_edge8n, __vis_edge8ln, __vis_edge16n, __vis_edge16ln,
	__vis_edge32n, __vis_edge32ln): New VIS 2.0 interfaces.
	* doc/extend.texi: Document new VIS 2.0 builtins.

gcc/testsuite/

	* gcc.target/sparc/bmaskbshuf.c: New test.
	* gcc.target/sparc/edgen.c: New test.
---
 gcc/ChangeLog                               |   31 ++++++++++
 gcc/config/sparc/niagara.md                 |    2 +-
 gcc/config/sparc/niagara2.md                |    4 +-
 gcc/config/sparc/sparc-c.c                  |    7 ++-
 gcc/config/sparc/sparc.c                    |   77 +++++++++++++++++++++---
 gcc/config/sparc/sparc.md                   |   85 ++++++++++++++++++++++++++-
 gcc/config/sparc/sparc.opt                  |    6 ++-
 gcc/config/sparc/ultra1_2.md                |    2 +-
 gcc/config/sparc/ultra3.md                  |    2 +-
 gcc/config/sparc/visintrin.h                |   77 ++++++++++++++++++++++++
 gcc/doc/extend.texi                         |   18 ++++++
 gcc/doc/invoke.texi                         |   12 ++++-
 gcc/testsuite/ChangeLog                     |    5 ++
 gcc/testsuite/gcc.target/sparc/bmaskbshuf.c |   34 +++++++++++
 gcc/testsuite/gcc.target/sparc/edgen.c      |   39 ++++++++++++
 15 files changed, 382 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/sparc/bmaskbshuf.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/edgen.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ea5c6d0..96cd9d5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,34 @@ 
+2011-09-30  David S. Miller  <davem@davemloft.net>
+
+	* config/sparc/sparc.opt (VIS2): New option.
+	* doc/invoke.texi: Document it.
+	* config/sparc/sparc.md (UNSPEC_EDGE8N, UNSPEC_EDGE8LN,
+	UNSPEC_EDGE16N, UNSPEC_EDGE16LN, UNSPEC_EDGE32N,
+	UNSPEC_EDGE32LN, UNSPEC_BSHUFFLE): New unspecs.
+	(define_attr type): New insn type 'edgen'.
+	(bmask<P:mode>_vis, bshuffle<V64I:mode>_vis, edge8n<P:mode>_vis,
+	edge8ln<P:mode>_vis, edge16n<P:mode>_vis, edge16ln<P:mode>_vis,
+	edge32n<P:mode>_vis, edge32ln<P:mode>_vis): New insn VIS 2.0
+	patterns.
+	* niagara.md: Handle edgen.
+	* niagara2.md: Likewise.
+	* ultra1_2.md: Likewise.
+	* ultra3.md: Likewise.
+	* config/sparc/sparc-c.c (sparc_target_macros): Define __VIS__
+	to 0x200 when TARGET_VIS2.
+	* config/sparc/sparc.c (sparc_option_override): Set MASK_VIS2 by
+	default when targeting capable cpus.  TARGET_VIS2 implies
+	TARGET_VIS, and clear it when TARGET_FPU is disabled.
+	(sparc_vis_init_builtins): Emit new VIS 2.0 builtins.
+	(sparc_expand_builtin): Fix predicate indexing when builtin returns
+	void.
+	(sparc_fold_builtin): Do not eliminate bmask when result is ignored.
+	* config/sparc/visintrin.h (__vis_bmask, __vis_bshuffledi,
+	__vis_bshufflev2si, __vis_bshufflev4hi, __vis_bshufflev8qi,
+	__vis_edge8n, __vis_edge8ln, __vis_edge16n, __vis_edge16ln,
+	__vis_edge32n, __vis_edge32ln): New VIS 2.0 interfaces.
+	* doc/extend.texi: Document new VIS 2.0 builtins.
+
 2011-09-29  Nick Clifton  <nickc@redhat.com>
 	    Bernd Schmidt  <bernds@codesourcery.com>
 
diff --git a/gcc/config/sparc/niagara.md b/gcc/config/sparc/niagara.md
index a75088b..c7a2245 100644
--- a/gcc/config/sparc/niagara.md
+++ b/gcc/config/sparc/niagara.md
@@ -114,5 +114,5 @@ 
  */
 (define_insn_reservation "niag_vis" 8
   (and (eq_attr "cpu" "niagara")
-    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,gsr,array"))
+    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,edgen,gsr,array"))
   "niag_pipe*8")
diff --git a/gcc/config/sparc/niagara2.md b/gcc/config/sparc/niagara2.md
index f261ac1..fa07bec 100644
--- a/gcc/config/sparc/niagara2.md
+++ b/gcc/config/sparc/niagara2.md
@@ -111,10 +111,10 @@ 
 
 (define_insn_reservation "niag2_vis" 6
   (and (eq_attr "cpu" "niagara2")
-    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,array,gsr"))
+    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,edgen,array,gsr"))
   "niag2_pipe*6")
 
 (define_insn_reservation "niag3_vis" 9
   (and (eq_attr "cpu" "niagara3")
-    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,array,gsr"))
+    (eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge,edgen,array,gsr"))
   "niag2_pipe*9")
diff --git a/gcc/config/sparc/sparc-c.c b/gcc/config/sparc/sparc-c.c
index 6e30950..0f2bee1 100644
--- a/gcc/config/sparc/sparc-c.c
+++ b/gcc/config/sparc/sparc-c.c
@@ -45,7 +45,12 @@  sparc_target_macros (void)
       cpp_assert (parse_in, "machine=sparc");
     }
 
-  if (TARGET_VIS)
+  if (TARGET_VIS2)
+    {
+      cpp_define (parse_in, "__VIS__=0x200");
+      cpp_define (parse_in, "__VIS=0x200");
+    }
+  else if (TARGET_VIS)
     {
       cpp_define (parse_in, "__VIS__=0x100");
       cpp_define (parse_in, "__VIS=0x100");
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index c8c0677..9863174 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -769,16 +769,16 @@  sparc_option_override (void)
     /* UltraSPARC III */
     /* ??? Check if %y issue still holds true.  */
     { MASK_ISA,
-      MASK_V9|MASK_DEPRECATED_V8_INSNS},
+      MASK_V9|MASK_DEPRECATED_V8_INSNS|MASK_VIS2},
     /* UltraSPARC T1 */
     { MASK_ISA,
       MASK_V9|MASK_DEPRECATED_V8_INSNS},
     /* UltraSPARC T2 */
-    { MASK_ISA, MASK_V9},
+    { MASK_ISA, MASK_V9|MASK_VIS2},
     /* UltraSPARC T3 */
-    { MASK_ISA, MASK_V9 | MASK_FMAF},
+    { MASK_ISA, MASK_V9|MASK_VIS2|MASK_FMAF},
     /* UltraSPARC T4 */
-    { MASK_ISA, MASK_V9 | MASK_FMAF},
+    { MASK_ISA, MASK_V9|MASK_VIS2|MASK_FMAF},
   };
   const struct cpu_table *cpu;
   unsigned int i;
@@ -857,9 +857,13 @@  sparc_option_override (void)
   if (target_flags_explicit & MASK_FPU)
     target_flags = (target_flags & ~MASK_FPU) | fpu;
 
-  /* Don't allow -mvis or -mfmaf if FPU is disabled.  */
+  /* -mvis2 implies -mvis */
+  if (TARGET_VIS2)
+    target_flags |= MASK_VIS;
+
+  /* Don't allow -mvis, -mvis2, or -mfmaf if FPU is disabled.  */
   if (! TARGET_FPU)
-    target_flags &= ~(MASK_VIS | MASK_FMAF);
+    target_flags &= ~(MASK_VIS | MASK_VIS2 | MASK_FMAF);
 
   /* -mvis assumes UltraSPARC+, so we are sure v9 instructions
      are available.
@@ -9300,6 +9304,21 @@  sparc_vis_init_builtins (void)
 			 di_ftype_ptr_ptr);
       def_builtin_const ("__builtin_vis_edge32l", CODE_FOR_edge32ldi_vis,
 			 di_ftype_ptr_ptr);
+      if (TARGET_VIS2)
+	{
+	  def_builtin_const ("__builtin_vis_edge8n", CODE_FOR_edge8ndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge8ln", CODE_FOR_edge8lndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge16n", CODE_FOR_edge16ndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge16ln", CODE_FOR_edge16lndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge32n", CODE_FOR_edge32ndi_vis,
+			     di_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge32ln", CODE_FOR_edge32lndi_vis,
+			     di_ftype_ptr_ptr);
+	}
     }
   else
     {
@@ -9315,6 +9334,21 @@  sparc_vis_init_builtins (void)
 			 si_ftype_ptr_ptr);
       def_builtin_const ("__builtin_vis_edge32l", CODE_FOR_edge32lsi_vis,
 			 si_ftype_ptr_ptr);
+      if (TARGET_VIS2)
+	{
+	  def_builtin_const ("__builtin_vis_edge8n", CODE_FOR_edge8nsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge8ln", CODE_FOR_edge8lnsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge16n", CODE_FOR_edge16nsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge16ln", CODE_FOR_edge16lnsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge32n", CODE_FOR_edge32nsi_vis,
+			     si_ftype_ptr_ptr);
+	  def_builtin_const ("__builtin_vis_edge32ln", CODE_FOR_edge32lnsi_vis,
+			     si_ftype_ptr_ptr);
+	}
     }
 
   /* Pixel compare.  */
@@ -9394,6 +9428,25 @@  sparc_vis_init_builtins (void)
       def_builtin_const ("__builtin_vis_array32", CODE_FOR_array32si_vis,
 			 si_ftype_si_si);
   }
+
+  if (TARGET_VIS2)
+    {
+      /* Byte mask and shuffle */
+      if (TARGET_ARCH64)
+	def_builtin ("__builtin_vis_bmask", CODE_FOR_bmaskdi_vis,
+		     di_ftype_di_di);
+      else
+	def_builtin ("__builtin_vis_bmask", CODE_FOR_bmasksi_vis,
+		     si_ftype_si_si);
+      def_builtin ("__builtin_vis_bshufflev4hi", CODE_FOR_bshufflev4hi_vis,
+		   v4hi_ftype_v4hi_v4hi);
+      def_builtin ("__builtin_vis_bshufflev8qi", CODE_FOR_bshufflev8qi_vis,
+		   v8qi_ftype_v8qi_v8qi);
+      def_builtin ("__builtin_vis_bshufflev2si", CODE_FOR_bshufflev2si_vis,
+		   v2si_ftype_v2si_v2si);
+      def_builtin ("__builtin_vis_bshuffledi", CODE_FOR_bshuffledi_vis,
+		   di_ftype_di_di);
+    }
 }
 
 /* Handle TARGET_EXPAND_BUILTIN target hook.
@@ -9428,16 +9481,18 @@  sparc_expand_builtin (tree exp, rtx target,
   FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
     {
       const struct insn_operand_data *insn_op;
+      int idx;
 
       if (arg == error_mark_node)
 	return NULL_RTX;
 
       arg_count++;
-      insn_op = &insn_data[icode].operand[arg_count - !nonvoid];
+      idx = arg_count - !nonvoid;
+      insn_op = &insn_data[icode].operand[idx];
       op[arg_count] = expand_normal (arg);
 
-      if (! (*insn_data[icode].operand[arg_count].predicate) (op[arg_count],
-							      insn_op->mode))
+      if (! (*insn_data[icode].operand[idx].predicate) (op[arg_count],
+							insn_op->mode))
 	op[arg_count] = copy_to_mode_reg (insn_op->mode, op[arg_count]);
     }
 
@@ -9556,7 +9611,9 @@  sparc_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
   if (ignore
       && icode != CODE_FOR_alignaddrsi_vis
       && icode != CODE_FOR_alignaddrdi_vis
-      && icode != CODE_FOR_wrgsr_vis)
+      && icode != CODE_FOR_wrgsr_vis
+      && icode != CODE_FOR_bmasksi_vis
+      && icode != CODE_FOR_bmaskdi_vis)
     return build_zero_cst (rtype);
 
   switch (icode)
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 2def8d1..0446955 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -72,6 +72,14 @@ 
 
    (UNSPEC_SP_SET		60)
    (UNSPEC_SP_TEST		61)
+
+   (UNSPEC_EDGE8N		70)
+   (UNSPEC_EDGE8LN		71)
+   (UNSPEC_EDGE16N		72)
+   (UNSPEC_EDGE16LN		73)
+   (UNSPEC_EDGE32N		74)
+   (UNSPEC_EDGE32LN		75)
+   (UNSPEC_BSHUFFLE		76)
   ])
 
 (define_constants
@@ -240,7 +248,7 @@ 
    fpcmp,
    fpmul,fpdivs,fpdivd,
    fpsqrts,fpsqrtd,
-   fga,fgm_pack,fgm_mul,fgm_pdist,fgm_cmp,edge,gsr,array,
+   fga,fgm_pack,fgm_mul,fgm_pdist,fgm_cmp,edge,edgen,gsr,array,
    cmove,
    ialuX,
    multi,savew,flushw,iflush,trap"
@@ -8188,4 +8196,79 @@ 
   "array32\t%r1, %r2, %0"
   [(set_attr "type" "array")])
 
+(define_insn "bmask<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (plus:P (match_operand:P 1 "register_operand" "rJ")
+                (match_operand:P 2 "register_operand" "rJ")))
+   (clobber (reg:SI GSR_REG))]
+  "TARGET_VIS2"
+  "bmask\t%r1, %r2, %0"
+  [(set_attr "type" "array")])
+
+(define_insn "bshuffle<V64I:mode>_vis"
+  [(set (match_operand:V64I 0 "register_operand" "=e")
+        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
+	              (match_operand:V64I 2 "register_operand" "e")]
+                     UNSPEC_BSHUFFLE))
+   (use (reg:SI GSR_REG))]
+  "TARGET_VIS2"
+  "bshuffle\t%1, %2, %0"
+  [(set_attr "type" "fga")
+   (set_attr "fptype" "double")])
+
+;; VIS 2.0 adds edge variants which do not set the condition codes
+(define_insn "edge8n<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+	           (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE8N))]
+  "TARGET_VIS2"
+  "edge8n\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge8ln<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+	           (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE8LN))]
+  "TARGET_VIS2"
+  "edge8ln\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge16n<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE16N))]
+  "TARGET_VIS2"
+  "edge16n\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge16ln<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE16LN))]
+  "TARGET_VIS2"
+  "edge16ln\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge32n<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE32N))]
+  "TARGET_VIS2"
+  "edge32n\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
+(define_insn "edge32ln<P:mode>_vis"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "register_operand" "rJ")
+                   (match_operand:P 2 "register_operand" "rJ")]
+                  UNSPEC_EDGE32LN))]
+  "TARGET_VIS2"
+  "edge32ln\t%r1, %r2, %0"
+  [(set_attr "type" "edgen")])
+
 (include "sync.md")
diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
index 6be6a75..a7b60c8 100644
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@@ -59,7 +59,11 @@  Compile for V8+ ABI
 
 mvis
 Target Report Mask(VIS)
-Use UltraSPARC Visual Instruction Set extensions
+Use UltraSPARC Visual Instruction Set version 1.0 extensions
+
+mvis2
+Target Report Mask(VIS2)
+Use UltraSPARC Visual Instruction Set version 2.0 extensions
 
 mfmaf
 Target Report Mask(FMAF)
diff --git a/gcc/config/sparc/ultra1_2.md b/gcc/config/sparc/ultra1_2.md
index 4600205..9cdebab 100644
--- a/gcc/config/sparc/ultra1_2.md
+++ b/gcc/config/sparc/ultra1_2.md
@@ -94,7 +94,7 @@ 
 
 (define_insn_reservation "us1_simple_ieu1" 1
   (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "compare,edge,array"))
+    (eq_attr "type" "compare,edge,edgen,array"))
   "us1_ieu1 + us1_slot012")
 
 (define_insn_reservation "us1_ialuX" 1
diff --git a/gcc/config/sparc/ultra3.md b/gcc/config/sparc/ultra3.md
index c6a9f89..c891e35 100644
--- a/gcc/config/sparc/ultra3.md
+++ b/gcc/config/sparc/ultra3.md
@@ -56,7 +56,7 @@ 
 
 (define_insn_reservation "us3_array" 2
   (and (eq_attr "cpu" "ultrasparc3")
-    (eq_attr "type" "array"))
+    (eq_attr "type" "array,edgen"))
   "us3_ms + us3_slotany, nothing")
 
 ;; ??? Not entirely accurate.
diff --git a/gcc/config/sparc/visintrin.h b/gcc/config/sparc/visintrin.h
index 3bef099..1688301 100644
--- a/gcc/config/sparc/visintrin.h
+++ b/gcc/config/sparc/visintrin.h
@@ -354,4 +354,81 @@  __vis_array32 (long __A, long __B)
   return __builtin_vis_array32 (__A, __B);
 }
 
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bmask (long __A, long __B)
+{
+  return __builtin_vis_bmask (__A, __B);
+}
+
+extern __inline __i64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bshuffledi (__i64 __A, __i64 __B)
+{
+  return __builtin_vis_bshuffledi (__A, __B);
+}
+
+extern __inline __v2si
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bshufflev2si (__v2si __A, __v2si __B)
+{
+  return __builtin_vis_bshufflev2si (__A, __B);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bshufflev4hi (__v4hi __A, __v4hi __B)
+{
+  return __builtin_vis_bshufflev4hi (__A, __B);
+}
+
+extern __inline __v8qi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_bshufflev8qi (__v8qi __A, __v8qi __B)
+{
+  return __builtin_vis_bshufflev8qi (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge8n (void *__A, void *__B)
+{
+  return __builtin_vis_edge8n (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge8ln (void *__A, void *__B)
+{
+  return __builtin_vis_edge8ln (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge16n (void *__A, void *__B)
+{
+  return __builtin_vis_edge16n (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge16ln (void *__A, void *__B)
+{
+  return __builtin_vis_edge16ln (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge32n (void *__A, void *__B)
+{
+  return __builtin_vis_edge32n (__A, __B);
+}
+
+extern __inline long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_edge32ln (void *__A, void *__B)
+{
+  return __builtin_vis_edge32ln (__A, __B);
+}
+
 #endif  /* _VISINTRIN_H_INCLUDED */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e8a777d..7ca50da 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13016,6 +13016,24 @@  long __builtin_vis_array16 (long, long);
 long __builtin_vis_array32 (long, long);
 @end smallexample
 
+Additionally, when you use the @option{-mvis2} switch, the VIS version
+2.0 built-in functions become available:
+
+@smallexample
+long __builtin_vis_bmask (long, long);
+int64_t __builtin_vis_bshuffledi (int64_t, int64_t);
+v2si __builtin_vis_bshufflev2si (v2si, v2si);
+v4hi __builtin_vis_bshufflev4hi (v4hi, v4hi);
+v8qi __builtin_vis_bshufflev8qi (v8qi, v8qi);
+
+long __builtin_vis_edge8n (void *, void *);
+long __builtin_vis_edge8ln (void *, void *);
+long __builtin_vis_edge16n (void *, void *);
+long __builtin_vis_edge16ln (void *, void *);
+long __builtin_vis_edge32n (void *, void *);
+long __builtin_vis_edge32ln (void *, void *);
+@end smallexample
+
 @node SPU Built-in Functions
 @subsection SPU Built-in Functions
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e166964..0ce15ff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -880,7 +880,7 @@  See RS/6000 and PowerPC Options.
 -mstack-bias  -mno-stack-bias @gol
 -munaligned-doubles  -mno-unaligned-doubles @gol
 -mv8plus  -mno-v8plus  -mvis  -mno-vis @gol
--mfmaf -mno-fmaf}
+-mvis2 -mno-vis2 -mfmaf -mno-fmaf}
 
 @emph{SPU Options}
 @gccoptlist{-mwarn-reloc -merror-reloc @gol
@@ -17430,6 +17430,16 @@  mode for all SPARC-V9 processors.
 With @option{-mvis}, GCC generates code that takes advantage of the UltraSPARC
 Visual Instruction Set extensions.  The default is @option{-mno-vis}.
 
+@item -mvis2
+@itemx -mno-vis2
+@opindex mvis2
+@opindex mno-vis2
+With @option{-mvis2}, GCC generates code that takes advantage of
+version 2.0 of the UltraSPARC Visual Instruction Set extensions.  The
+default is @option{-mvis2} when targeting a cpu that supports such
+instructions, such as UltraSPARC-III and later.  Setting @option{-mvis2}
+also sets @option{-mvis}.
+
 @item -mfmaf
 @itemx -mno-fmaf
 @opindex mfmaf
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index fb41e55..e96612c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@ 
+2011-09-30  David S. Miller  <davem@davemloft.net>
+
+	* gcc.target/sparc/bmaskbshuf.c: New test.
+	* gcc.target/sparc/edgen.c: New test.
+
 2011-09-29  Janus Weil  <janus@gcc.gnu.org>
 
 	PR fortran/50547
diff --git a/gcc/testsuite/gcc.target/sparc/bmaskbshuf.c b/gcc/testsuite/gcc.target/sparc/bmaskbshuf.c
new file mode 100644
index 0000000..7108a01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/bmaskbshuf.c
@@ -0,0 +1,34 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -mcpu=ultrasparc3 -mvis -mvis2" } */
+typedef long long int64_t;
+typedef int vec32 __attribute__((vector_size(8)));
+typedef short vec16 __attribute__((vector_size(8)));
+typedef unsigned char vec8 __attribute__((vector_size(8)));
+
+long test_bmask (long x, long y)
+{
+  return __builtin_vis_bmask (x, y);
+}
+
+vec16 test_bshufv4hi (vec16 x, vec16 y)
+{
+  return __builtin_vis_bshufflev4hi (x, y);
+}
+
+vec32 test_bshufv2si (vec32 x, vec32 y)
+{
+  return __builtin_vis_bshufflev2si (x, y);
+}
+
+vec8 test_bshufv8qi (vec8 x, vec8 y)
+{
+  return __builtin_vis_bshufflev8qi (x, y);
+}
+
+int64_t test_bshufdi (int64_t x, int64_t y)
+{
+  return __builtin_vis_bshuffledi (x, y);
+}
+
+/* { dg-final { scan-assembler "bmask\t%" } } */
+/* { dg-final { scan-assembler "bshuffle\t%" } } */
diff --git a/gcc/testsuite/gcc.target/sparc/edgen.c b/gcc/testsuite/gcc.target/sparc/edgen.c
new file mode 100644
index 0000000..11973b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/edgen.c
@@ -0,0 +1,39 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -mcpu=ultrasparc3 -mvis" } */
+
+long test_edge8n (void *p1, void *p2)
+{
+  return __builtin_vis_edge8n (p1, p2);
+}
+
+long test_edge8ln (void *p1, void *p2)
+{
+  return __builtin_vis_edge8ln (p1, p2);
+}
+
+long test_edge16n (void *p1, void *p2)
+{
+  return __builtin_vis_edge16n (p1, p2);
+}
+
+long test_edge16ln (void *p1, void *p2)
+{
+  return __builtin_vis_edge16ln (p1, p2);
+}
+
+long test_edge32n (void *p1, void *p2)
+{
+  return __builtin_vis_edge32n (p1, p2);
+}
+
+long test_edge32ln (void *p1, void *p2)
+{
+  return __builtin_vis_edge32ln (p1, p2);
+}
+
+/* { dg-final { scan-assembler "edge8n\t%" } } */
+/* { dg-final { scan-assembler "edge8ln\t%" } } */
+/* { dg-final { scan-assembler "edge16n\t%" } } */
+/* { dg-final { scan-assembler "edge16ln\t%" } } */
+/* { dg-final { scan-assembler "edge32n\t%" } } */
+/* { dg-final { scan-assembler "edge32ln\t%" } } */