diff mbox

[v3] Add support for sparc compare-and-branch

Message ID 20121022.233923.1683656545305450956.davem@davemloft.net
State New

Commit Message

David Miller Oct. 23, 2012, 3:39 a.m. UTC
Differences from v2:

1) If another control transfer comes right after a cbcond we take
   an enormous performance penalty, some 20 cycles or more.  The
   documentation specifically warns about this, so emit a nop when
   we encounter this scenario.

2) Add a heuristic to avoid using cbcond if we know at RTL emit
   time that we're going to compare against a constant that does
   not fit in the tiny 5-bit signed immediate field.

3) Use cbcond for unconditional jumps too.
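
For illustration only, the heuristic in item 2 can be sketched in plain C.
The names fits_simm5 and prefer_cbcond are hypothetical stand-ins and are
not part of the patch, which applies SPARC_SIMM5_P to RTL operands.  The
sketch assumes a 5-bit signed immediate covers the range [-16, 15].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the patch's SPARC_SIMM5_P: true when VALUE
   fits in a 5-bit signed immediate field, i.e. lies in [-16, 15].  */
static bool
fits_simm5 (long value)
{
  return value >= -16 && value <= 15;
}

/* Hypothetical helper mirroring the heuristic: prefer cbcond when the
   second operand is not a constant (it will live in a register), or
   when it is a constant small enough for the immediate field.  */
static bool
prefer_cbcond (bool op2_is_constant, long op2_value)
{
  return !op2_is_constant || fits_simm5 (op2_value);
}
```

Under these assumptions, a comparison against 7 would use cbcond, while a
comparison against 100 would fall back to one of the other v9 branch
sequences.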

Regstrapped on sparc-unknown-linux-gnu w/--with-cpu=niagara4.

Eric and Rainer, I think that functionally this patch is fully ready
to go into the tree except for the Solaris aspects which I do not have
the means to work on.  Have either of you made any progress in this
area?

Thanks!

gcc/

2012-10-12  David S. Miller  <davem@davemloft.net>

	* configure.ac: Add check for assembler SPARC4 instruction
	support.
	* configure: Rebuild.
	* config.in: Add HAVE_AS_SPARC4 section.
	* config/sparc/sparc.opt (mcbcond): New option.
	* doc/invoke.texi: Document it.
	* config/sparc/constraints.md: New constraint 'A' for 5-bit signed
	immediates.
	* doc/md.texi: Document it.
	* config/sparc/predicates.md (arith5_operand): New predicate.
	* config/sparc/sparc.c (dump_target_flag_bits): Handle MASK_CBCOND.
	(sparc_option_override): Likewise.
	(emit_cbcond_insn): New function.
	(emit_conditional_branch_insn): Call it.
	(emit_cbcond_nop): New function.
	(output_ubranch): Use cbcond, remove label arg.
	(output_cbcond): New function.
	* config/sparc/sparc-protos.h (output_ubranch): Update.
	(output_cbcond): Declare it.
	(emit_cbcond_nop): Likewise.
	* config/sparc/sparc.md (type attribute): New types 'cbcond'
	and uncond_cbcond.
	(emit_cbcond_nop): New attribute.
	(length attribute): Handle cbcond and uncond_cbcond.
	(in_call_delay attribute): Reject cbcond and uncond_cbcond.
	(in_branch_delay attribute): Likewise.
	(in_uncond_branch_delay attribute): Likewise.
	(in_annul_branch_delay attribute): Likewise.
	(*cbcond_sp32, *cbcond_sp64): New insn patterns.
	(jump): Rewrite into an expander.
	(*jump_ubranch, *jump_cbcond): New patterns.
	* config/sparc/niagara4.md: Match 'cbcond' and 'uncond_cbcond' in
	'n4_cti'.
	* config/sparc/sparc.h (AS_NIAGARA4_FLAG): New macro, use it
	when target default is niagara4.
	(SPARC_SIMM5_P): Define.
	* config/sparc/sol2.h (AS_SPARC64_FLAG): Adjust.
	(AS_SPARC32_FLAG): Define.
	(ASM_CPU32_DEFAULT_SPEC, ASM_CPU64_DEFAULT_SPEC): Use
	AS_NIAGARA4_FLAG as needed.

Comments

David Miller Oct. 25, 2012, 6:23 p.m. UTC | #1
From: David Miller <davem@davemloft.net>
Date: Mon, 22 Oct 2012 23:39:23 -0400 (EDT)

> Eric and Rainer, I think that functionally this patch is fully ready
> to go into the tree except for the Solaris aspects which I do not have
> the means to work on.  Have either of you made any progress in this
> area?

Just wondering if either of you have had a chance to look into this?

Thanks!
Eric Botcazou Oct. 26, 2012, 8:57 a.m. UTC | #2
> Eric and Rainer, I think that functionally this patch is fully ready
> to go into the tree except for the Solaris aspects which I do not have
> the means to work on.  Have either of you made any progress in this
> area?

Not yet, but I'll have a look at the beginning of next week.


Some remarks:

> @@ -1088,7 +1093,12 @@ sparc_option_override (void)
>    if (TARGET_VIS3)
>      target_flags |= MASK_VIS2 | MASK_VIS;
> 
> -  /* Don't allow -mvis, -mvis2, -mvis3, or -mfmaf if FPU is disabled.  */
> +  /* -mcbcond implies -mvis3, -mvis2 and -mvis */
> +  if (TARGET_CBCOND)
> +    target_flags |= MASK_VIS3 | MASK_VIS2 | MASK_VIS;

> +@item -mcbcond
> +@itemx -mno-cbcond
> +@opindex mcbcond
> +@opindex mno-cbcond
> +With @option{-mcbcond}, GCC generates code that takes advantage of
> +compare-and-branch instructions, as defined in the Sparc Architecture 2011.
> +The default is @option{-mcbcond} when targeting a cpu that supports such
> +instructions, such as niagara-4 and later.  Setting @option{-mcbcond} also
> +sets @option{-mvis3}, @option{-mvis2}, and @option{-mvis}.

Why?  If -mcpu=niagara4 implies -mcbcond and -mvis3, I don't see the point.


> +  /* If we can tell early on that the comparison is against a constant
> +     that won't fit in the 5-bit signed immediate field of a cbcond,
> +     use one of the other v9 conditional branch sequences.  */
> +  if (TARGET_CBCOND
> +      && GET_CODE (operands[1]) == REG
> +      && (GET_MODE (operands[1]) == SImode
> +	  || (TARGET_ARCH64 && GET_MODE (operands[1]) == DImode))
> +      && (GET_CODE (operands[2]) != CONST_INT
> +	  || SPARC_SIMM5_P (INTVAL (operands[2]))))
> +    {
> +      emit_cbcond_insn (GET_CODE (operands[0]), operands[1], operands[2], operands[3]);
> +      return;
> +    }

Long line.


> +#ifndef HAVE_AS_SPARC4
> +#define AS_NIAGARA4_FLAG " -xarch=v9b"
> +#else
> +#define AS_NIAGARA4_FLAG " -xarch=sparc4"
> +#endif

Won't this override the AS_NIAGARA3_FLAG logic for -mcpu=niagara4?  You'll 
have v9d for -mcpu=niagara3 but v9b for -mcpu=niagara4 if the assembler 
doesn't support sparc4.


> +;; True if we are making use of compare-and-branch instructions.
> +;; True if we should emit a nop after a cbcond instruction
> +(define_attr "emit_cbcond_nop" "false,true"
> +  (symbol_ref "(emit_cbcond_nop (insn)
> +                ? EMIT_CBCOND_NOP_TRUE : EMIT_CBCOND_NOP_FALSE)"))
> +
>  (define_attr "branch_type" "none,icc,fcc,reg"
>    (const_string "none"))

There seems to be one superfluous comment line.


Otherwise looks good.  Thanks for having introduced the -mcbcond switch, I 
think that's perfectly appropriate for a feature like this brand new set of 
branch instructions.
Rainer Orth Oct. 26, 2012, 9:14 a.m. UTC | #3
David Miller <davem@davemloft.net> writes:

>> Eric and Rainer, I think that functionally this patch is fully ready
>> to go into the tree except for the Solaris aspects which I do not have
>> the means to work on.  Have either of you made any progress in this
>> area?
>
> Just wondering if either of you have had a chance to look into this?

I tried a bootstrap on Solaris 11.1, but ran into lots of comparison
failures I've not yet investigated.

	Rainer
David Miller Oct. 26, 2012, 9:14 a.m. UTC | #4
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Fri, 26 Oct 2012 10:57 +0200

>> @@ -1088,7 +1093,12 @@ sparc_option_override (void)
>>    if (TARGET_VIS3)
>>      target_flags |= MASK_VIS2 | MASK_VIS;
>> 
>> -  /* Don't allow -mvis, -mvis2, -mvis3, or -mfmaf if FPU is disabled.  */
>> +  /* -mcbcond implies -mvis3, -mvis2 and -mvis */
>> +  if (TARGET_CBCOND)
>> +    target_flags |= MASK_VIS3 | MASK_VIS2 | MASK_VIS;
> 
>> +@item -mcbcond
>> +@itemx -mno-cbcond
>> +@opindex mcbcond
>> +@opindex mno-cbcond
>> +With @option{-mcbcond}, GCC generates code that takes advantage of
>> +compare-and-branch instructions, as defined in the Sparc Architecture 2011.
>> +The default is @option{-mcbcond} when targeting a cpu that supports such
>> +instructions, such as niagara-4 and later.  Setting @option{-mcbcond} also
>> +sets @option{-mvis3}, @option{-mvis2}, and @option{-mvis}.
> 
> Why?  If -mcpu=niagara4 implies -mcbcond and -mvis3, I don't see the point.

Ok.  I'll make cbcond independent of the other switches.

>> +  if (TARGET_CBCOND
>> +      && GET_CODE (operands[1]) == REG
>> +      && (GET_MODE (operands[1]) == SImode
>> +	  || (TARGET_ARCH64 && GET_MODE (operands[1]) == DImode))
>> +      && (GET_CODE (operands[2]) != CONST_INT
>> +	  || SPARC_SIMM5_P (INTVAL (operands[2]))))
>> +    {
>> +      emit_cbcond_insn (GET_CODE (operands[0]), operands[1], operands[2], operands[3]);
>> +      return;
>> +    }
> 
> Long line.

Thanks, will fix.

>> +#ifndef HAVE_AS_SPARC4
>> +#define AS_NIAGARA4_FLAG " -xarch=v9b"
>> +#else
>> +#define AS_NIAGARA4_FLAG " -xarch=sparc4"
>> +#endif
> 
> Won't this override the AS_NIAGARA3_FLAG logic for -mcpu=niagara4?  You'll 
> have v9d for -mcpu=niagara3 but v9b for -mcpu=niagara4 if the assembler 
> doesn't support sparc4.

This is part of what we need to sort out.  What I'd really like is for
both the Solaris and generic definitions to share the switch since they
can be the same.

I'll make sure this uses the NIAGARA3 switch as the backup in the final
version I commit.
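
A minimal sketch of that fallback, not the code that was committed.  It
assumes AS_NIAGARA3_FLAG can be reused as a complete flag; in the real
headers it may only be a suffix appended to -xarch=v9 or -xarch=v8plus,
so the placeholder value below is an assumption, as is the helper
niagara4_as_flag.

```c
#include <assert.h>
#include <string.h>

/* Assumption: placeholder value standing in for whatever the niagara3
   logic selects.  The real macro is defined elsewhere and may differ.  */
#ifndef AS_NIAGARA3_FLAG
#define AS_NIAGARA3_FLAG " -xarch=v9d"
#endif

/* Use the sparc4 architecture flag when the assembler supports it,
   otherwise fall back to the niagara3 choice instead of v9b.  */
#ifdef HAVE_AS_SPARC4
#define AS_NIAGARA4_FLAG " -xarch=sparc4"
#else
#define AS_NIAGARA4_FLAG AS_NIAGARA3_FLAG
#endif

/* Hypothetical accessor so the selected flag can be inspected.  */
static const char *
niagara4_as_flag (void)
{
  return AS_NIAGARA4_FLAG;
}
```

With this arrangement -mcpu=niagara4 never gets a weaker assembler flag
than -mcpu=niagara3 when sparc4 support is missing.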

>> +;; True if we are making use of compare-and-branch instructions.
>> +;; True if we should emit a nop after a cbcond instruction
>> +(define_attr "emit_cbcond_nop" "false,true"
>> +  (symbol_ref "(emit_cbcond_nop (insn)
>> +                ? EMIT_CBCOND_NOP_TRUE : EMIT_CBCOND_NOP_FALSE)"))
>> +
>>  (define_attr "branch_type" "none,icc,fcc,reg"
>>    (const_string "none"))
> 
> There seems to be one superfluous comment line.

Thanks for pointing that out, I'll fix it up.

> Otherwise looks good.  Thanks for having introduced the -mcbcond switch, I 
> think that's perfectly appropriate for a feature like this brand new set of 
> branch instructions.

Thanks for reviewing.
Eric Botcazou Nov. 11, 2012, 10:28 p.m. UTC | #5
> Eric and Rainer, I think that functionally this patch is fully ready
> to go into the tree except for the Solaris aspects which I do not have
> the means to work on.  Have either of you made any progress in this
> area?

Rainer, could you post an excerpt of the man page of a recent 'as' supporting 
the SPARC-T4?  I'm mainly interested in the values of the -xarch= option.

Thanks in advance.
David Miller Nov. 11, 2012, 11:16 p.m. UTC | #6
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Sun, 11 Nov 2012 23:28:38 +0100

>> Eric and Rainer, I think that functionally this patch is fully ready
>> to go into the tree except for the Solaris aspects which I do not have
>> the means to work on.  Have either of you made any progress in this
>> area?
> 
> Rainer, could you post an excerpt of the man page of a recent 'as' supporting 
> the SPARC-T4?  I'm mainly interested in the values of the -xarch= option.
> 
> Thanks in advance.

I strongly doubt that they will be different from the options
supported both in cc and fbe in the Solaris Studio 12.3 release:

     -xarch=sparc Enables the assembler	 to  accept  instructions
		  defined  in  the  SPARC-V9  architecture.   The
		  resulting object code	is in ELF32  format  when
		  compiled  with -m32, ELF64 format with -m64. It
		  will not execute on a	Oracle Solaris V8  system
		  (a  machine with a V8	processor).  It	will exe-
		  cute on a Oracle Solaris V8+ system.

     -xarch=sparcvis
		  Enables the assembler	 to  accept  instructions
		  defined  in  the SPARC-V9 architecture plus the
		  instructions	in  the	 Visual	 Instruction  Set
		  (VIS)	 version  1.0.	The resulting object code
		  is in	V8+ ELF32 format when compiled with -m32,
		  ELF64	 format	with -m64. It will not execute on
		  a Oracle Solaris system with a V8 processor. It
		  will	execute	on a Oracle Solaris system with	a
		  V8+ processor.

     -xarch=sparcvis2
		  Enables the assembler	 to  accept  instructions
		  defined  in the SPARC-V9 architecture, plus the
		  instructions	in  the	 Visual	 Instruction  Set
		  (VIS)	 version  2.0, with UltraSPARC-III exten-
		  sions.  The resulting	object	code  is  in  V8+
		  ELF32	 format	 when  compiled	 with -m32, ELF64
		  format with -m64.

     -xarch=sparcvis3
		  Accept instructions defined for the  SPARC  VIS
		  version   3  of  the	SPARC-V9  ISA  which  are
		  instructions from the	SPARC-V9 instruction set,
		  plus	the  UltraSPARC	extensions, including the
		  Visual Instruction Set (VIS) version	1.0,  the
		  UltraSPARC-III extensions, including the Visual
		  Instruction Set (VIS)	version	 2.0,  the  fused
		  multiply-add	 instructions,	 and  the  Visual
		  Instruction Set (VIS)	version	3.0

     -xarch=sparcfmaf
		  Accept instructions defined for  the	sparcfmaf
		  version   of	 the   SPARC-V9	  ISA,	plus  the
		  UltraSPARC  extensions,  including  the  Visual
		  Instruction	Set   (VIS)   version	1.0,  the
		  UltraSPARC-III extensions, including the Visual
		  Instruction  Set  (VIS)  version  2.0,  and the
		  SPARC64  VI	extensions   for   floating-point
		  multiply-add.

     -xarch=sparcima
		  Accept instructions defined  for  the	 sparcima
		  version  of the SPARC-V9 ISA which are instruc-
		  tions	from the SPARC-V9 instruction  set,  plus
		  the UltraSPARC extensions, including the Visual
		  Instruction  Set   (VIS)   version   1.0,   the
		  UltraSPARC-III extensions, including the Visual
		  Instruction Set (VIS)	version	2.0, the  SPARC64
		  VI  extensions for floating-point multiply-add,
		  and the  SPARC64  VII	 extensions  for  integer
		  multiply-add.

     -xarch=sparc4
		  Accept instructions defined for the sparc4 ver-
		  sion of the SPARC-V9 ISA which are instructions
		  from the SPARC-V9  instruction  set,	plus  the
		  extensions,	which	includes   VIS	1.0,  the
		  UltraSPARC-III extensions, which  includes  VIS
		  2.0,	 the  fused  floating-point  multiply-add
		  instructions,	VIS 3.0, and SPARC4 instructions.
Eric Botcazou Nov. 12, 2012, 8:35 a.m. UTC | #7
> I strongly doubt that they will be different from the options
> supported both in cc and fbe in the Solaris Studio 12.3 release:

They need to provide some form of backward compatibility though, they cannot 
break the interface of 'as' like that.  Apparently 'fbe' has had its own set 
of -xarch values for a while and they haven't been compatible with 'as'.
Rainer Orth Nov. 12, 2012, 2:39 p.m. UTC | #8
Eric Botcazou <ebotcazou@adacore.com> writes:

>> I strongly doubt that they will be different from the options
>> supported both in cc and fbe in the Solaris Studio 12.3 release:
>
> They need to provide some form of backward compatibility though, they cannot 
> break the interface of 'as' like that.  Apparently 'fbe' has had its own set 
> of -xarch values for a while and they haven't been compatible with 'as'.

No, quite the contrary.  as is just a (sometimes partial) backport of
Studio fbe, though it's hard to tell exactly which Studio version of fbe
forms the basis of as.  Especially for the Solaris 10 as patches, only
particular bugfixes/enhancements have been backported.

Backward compatibility is maintained, of course.  as(1) lists

     -xarch=v9

         Equivalent to: -m64 -xarch=sparc

and many more.

	Rainer
Eric Botcazou Nov. 12, 2012, 3:26 p.m. UTC | #9
> No, quite the contrary.  as is just a (sometimes partial) backport of
> Studio fbe, though it's hard to tell exactly which Studio version of fbe
> forms the basis of as.  Especially for the Solaris 10 as patches, only
> particular bugfixes/enhancements have been backported.
> 
> Backward compatibility is maintained, of course.  as(1) lists
> 
>      -xarch=v9
> 
>          Equivalent to: -m64 -xarch=sparc
> 
> and many more.

Does it list -xarch=v8pluse/-xarch=v9e as equivalent to -m32/64 -xarch=sparc4?
If so, I don't think that we need to change our scheme, using 'e' instead of 
'd' for SPARC4 instructions should work just fine with both GNU and Sun as.
Rainer Orth Nov. 12, 2012, 3:46 p.m. UTC | #10
Eric Botcazou <ebotcazou@adacore.com> writes:

>> No, quite the contrary.  as is just a (sometimes partial) backport of
>> Studio fbe, though it's hard to tell exactly which Studio version of fbe
>> forms the basis of as.  Especially for the Solaris 10 as patches, only
>> particular bugfixes/enhancements have been backported.
>> 
>> Backward compatibility is maintained, of course.  as(1) lists
>> 
>>      -xarch=v9
>> 
>>          Equivalent to: -m64 -xarch=sparc
>> 
>> and many more.
>
> Does it list -xarch=v8pluse/-xarch=v9e as equivalent to -m32/64 -xarch=sparc4?
> If so, I don't think that we need to change our scheme, using 'e' instead of 
> 'd' for SPARC4 instructions should work just fine with both GNU and Sun as.

as(1) mentions no -xarch value beyond v9b, while strings on the as binary
reveals v9, v9[a-dv], but no v9e.  Seems to be a gas invention.

	Rainer
Richard Henderson Nov. 12, 2012, 5:56 p.m. UTC | #11
On 10/22/2012 08:39 PM, David Miller wrote:
> +  /* Compare and Branch is limited to +-2KB.  If it is too far away,
> +     change
> +
> +     cxbne X, Y, .LC30
> +
> +     to
> +
> +     cxbe X, Y, .+12
> +     ba,pt xcc, .LC30
> +      nop  */

Based on your no-control-after cbcond comment at the top
of the patch, surely this should contain another nop as well.

> +  *p++ = '\t';
> +  *p++ = '%';
> +  *p++ = '1';
> +  *p++ = ',';
> +  *p++ = ' ';
> +  *p++ = '%';
> +  *p++ = '2';
> +  *p++ = ',';
> +  *p++ = ' ';

And surely all this code isn't so performance sensitive that
it needs to be written in such an unreadable way.

  p = stpcpy (p, "\t%1, %2, ");

is at least a little better.

Though really there's just 3 variable portions of the pattern, so
I wonder if a lesser number of snprintf calls might be good enough.

  if (far)
    {
      if (veryfar)
        snprintf (buf, sizeof(buf), "c%cb%s\t%%1, %%2, .+16\n\t"
		  "b\t%%l3\n\t nop", size_char, cond_str);
      else
        snprintf (buf, sizeof(buf), "c%cb%s\t%%1, %%2, .+16\n\t"
		  "ba,pt\t%%xcc,%%l3\n\t nop", size_char, cond_str);
    }
  else
    snprintf (buf, sizeof(buf), "c%cb%s\t%%1, %%2, %%l3", size_char, cond_str);
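
A self-contained sketch of the snprintf approach, for illustration only.
The function name format_cbcond and its parameters are hypothetical.  It
assumes the caller passes the already inverted condition for the far
form, that the far form carries a nop after the cbcond (hence .+16), and
that the result is later processed as an output template in which "%%"
denotes a literal percent sign, which is why "%%%%xcc" appears in the
format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a compare-and-branch output template into BUF of length LEN.
   SIZE_CHAR is 'w' or 'x'; COND_STR is the condition suffix, e.g. "ne".
   When FAR is nonzero, COND_STR must already be the inverted condition:
   the cbcond then skips over an unconditional branch to the real target.
   Returns the number of characters written, excluding the terminator.  */
static int
format_cbcond (char *buf, size_t len, char size_char,
               const char *cond_str, int far)
{
  int n;

  if (far)
    /* cbcond, nop, ba, nop: four instructions, so skip to .+16.  */
    n = snprintf (buf, len,
                  "c%cb%s\t%%1, %%2, .+16\n\tnop\n\t"
                  "ba,pt\t%%%%xcc, %%l3\n\t nop",
                  size_char, cond_str);
  else
    n = snprintf (buf, len, "c%cb%s\t%%1, %%2, %%l3",
                  size_char, cond_str);

  /* The template must fit; a truncated template would be a bug.  */
  assert (n >= 0 && (size_t) n < len);
  return n;
}
```

Keeping each variant in a single format string makes the emitted
sequence readable at a glance, at the cost of doubling the percent
signs that must survive into the template.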


r~
David Miller Nov. 12, 2012, 7:35 p.m. UTC | #12
From: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Date: Mon, 12 Nov 2012 16:46:37 +0100

> Eric Botcazou <ebotcazou@adacore.com> writes:
> 
>>> No, quite the contrary.  as is just a (sometimes partial) backport of
>>> Studio fbe, though it's hard to tell exactly which Studio version of fbe
>>> forms the basis of as.  Especially for the Solaris 10 as patches, only
>>> particular bugfixes/enhancements have been backported.
>>> 
>>> Backward compatibility is maintained, of course.  as(1) lists
>>> 
>>>      -xarch=v9
>>> 
>>>          Equivalent to: -m64 -xarch=sparc
>>> 
>>> and many more.
>>
>> Does it list -xarch=v8pluse/-xarch=v9e as equivalent to -m32/64 -xarch=sparc4?
>> If so, I don't think that we need to change our scheme, using 'e' instead of 
>> 'd' for SPARC4 instructions should work just fine with both GNU and Sun as.
> 
> as(1) mentions no -xarch value beyond v9b, while strings on the as binary
> reveals v9, v9[a-dv], but no v9e.  Seems to be a gas invention.

It is indeed, a gas invention.

We really need to start using the newer names, as Sun is not going to
provide single letter indicators for sparc4 or future xarch values.

In fact, that's exactly what needed to be worked on from the beginning
for the solaris side of this cbcond patch.  We're talking in circles.
:-)
David Miller Nov. 12, 2012, 7:37 p.m. UTC | #13
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Mon, 12 Nov 2012 09:35:48 +0100

>> I strongly doubt that they will be different from the options
>> supported both in cc and fbe in the Solaris Studio 12.3 release:
> 
> They need to provide some form of backward compatibility though, they cannot 
> break the interface of 'as' like that.  Apparently 'fbe' has had its own set 
> of -xarch values for a while and they haven't been compatible with 'as'.

You give them far too much credit :-)

The 'as' updates constantly add inconsistencies in options and
behavior, and even worse (in my opinion) they effectively stopped
updating the 'as' manual page in this area.
David Miller Nov. 12, 2012, 7:38 p.m. UTC | #14
From: Richard Henderson <rth@redhat.com>
Date: Mon, 12 Nov 2012 09:56:21 -0800

> On 10/22/2012 08:39 PM, David Miller wrote:
>> +  /* Compare and Branch is limited to +-2KB.  If it is too far away,
>> +     change
>> +
>> +     cxbne X, Y, .LC30
>> +
>> +     to
>> +
>> +     cxbe X, Y, .+12
>> +     ba,pt xcc, .LC30
>> +      nop  */
> 
> Based on your no-control-after cbcond comment at the top
> of the patch, surely this should contain another nop as well.

Indeed, I'll fix this up.

> And surely all this code isn't so performance sensitive that
> it needs to be written in such an unreadable way.

Sure, I'll change the code to use one of the clearer mechanisms
you suggested.

Thanks for the review.
David Miller Nov. 13, 2012, 2:46 a.m. UTC | #15
From: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Date: Fri, 26 Oct 2012 11:14:33 +0200

> I tried a bootstrap on Solaris 11.1, but ran into lots of comparison
> failures I've not yet investigated.

I started working on this patch again, in order to incorporate
Richard Henderson's feedback, and I am now getting a comparison
failure.  Is this what you're seeing?

Comparing stages 2 and 3
warning: gcc/cc1-checksum.o differs
warning: gcc/cc1plus-checksum.o differs
warning: gcc/cc1obj-checksum.o differs
Bootstrap comparison failure!
libdecnumber/decNumber.o differs
make[2]: *** [compare] Error 1
make[1]: *** [stage3-bubble] Error 2
make: *** [all] Error 2

In any case, I'm looking into it.
Rainer Orth Nov. 15, 2012, 2:29 p.m. UTC | #16
David Miller <davem@davemloft.net> writes:

> I started working on this patch again, in order to incorporate
> Richard Henderson's feedback, and I am now getting a comparison
> failure.  Is this what you're seeing?
>
> Comparing stages 2 and 3
> warning: gcc/cc1-checksum.o differs
> warning: gcc/cc1plus-checksum.o differs
> warning: gcc/cc1obj-checksum.o differs
> Bootstrap comparison failure!
> libdecnumber/decNumber.o differs
> make[2]: *** [compare] Error 1
> make[1]: *** [stage3-bubble] Error 2
> make: *** [all] Error 2
>
> In any case, I'm looking into it.

No, if I did get the comparison failures, it seems every single file was
different.  It only happened for Solaris 11 bootstraps (on different
machines running either 11.0 or 11.1, with either as or gas, with a
vanilla tree or your patch), and was totally intermittent.  So far, it
seemed to affect Solaris 11 only, though, Solaris 10 didn't show it.

I still haven't managed to investigate more closely.

	Rainer

Patch

diff --git a/gcc/config.in b/gcc/config.in
index b13805d..791d14a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -266,6 +266,12 @@ 
 #endif
 
 
+/* Define if your assembler supports SPARC4 instructions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_SPARC4
+#endif
+
+
 /* Define if your assembler supports fprnd. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_FPRND
diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 472490f..8862ea1 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -18,7 +18,7 @@ 
 ;; <http://www.gnu.org/licenses/>.
 
 ;;; Unused letters:
-;;;    AB                       
+;;;     B
 ;;;    a        jkl    q  tuv xyz
 
 
@@ -62,6 +62,11 @@ 
 
 ;; Integer constant constraints
 
+(define_constraint "A"
+ "Signed 5-bit integer constant"
+ (and (match_code "const_int")
+      (match_test "SPARC_SIMM5_P (ival)")))
+
 (define_constraint "H"
  "Valid operand of double arithmetic operation"
  (and (match_code "const_double")
diff --git a/gcc/config/sparc/niagara4.md b/gcc/config/sparc/niagara4.md
index 272c8ff..61ca801 100644
--- a/gcc/config/sparc/niagara4.md
+++ b/gcc/config/sparc/niagara4.md
@@ -56,7 +56,7 @@ 
 
 (define_insn_reservation "n4_cti" 2
   (and (eq_attr "cpu" "niagara4")
-    (eq_attr "type" "branch,call,sibcall,call_no_delay_slot,uncond_branch,return"))
+    (eq_attr "type" "cbcond,uncond_cbcond,branch,call,sibcall,call_no_delay_slot,uncond_branch,return"))
   "n4_slot1, nothing")
 
 (define_insn_reservation "n4_fp" 11
diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index 326524b..b64e109 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -391,6 +391,14 @@ 
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "uns_small_int_operand")))
 
+;; Return true if OP is a register, or is a CONST_INT that can fit in a
+;; signed 5-bit immediate field.  This is an acceptable second operand for
+;; the cbcond instructions.
+(define_predicate "arith5_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_code "const_int")
+            (match_test "SPARC_SIMM5_P (INTVAL (op))"))))
+
 
 ;; Predicates for miscellaneous instructions.
 
diff --git a/gcc/config/sparc/sol2.h b/gcc/config/sparc/sol2.h
index ba2ec35..68cc592 100644
--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@@ -58,8 +58,10 @@  along with GCC; see the file COPYING3.  If not see
    other assemblers will accept.  */
 
 #ifndef USE_GAS
-#define AS_SPARC64_FLAG	"-xarch=v9"
+#define AS_SPARC32_FLAG	"-m32 -xarch=v9"
+#define AS_SPARC64_FLAG	"-m64 -xarch=v9"
 #else
+#define AS_SPARC32_FLAG	"-TSO -32 -Av9"
 #define AS_SPARC64_FLAG	"-TSO -64 -Av9"
 #endif
 
@@ -136,9 +138,9 @@  along with GCC; see the file COPYING3.  If not see
 #undef CPP_CPU64_DEFAULT_SPEC
 #define CPP_CPU64_DEFAULT_SPEC ""
 #undef ASM_CPU32_DEFAULT_SPEC
-#define ASM_CPU32_DEFAULT_SPEC "-xarch=v8plusb"
+#define ASM_CPU32_DEFAULT_SPEC AS_SPARC32_FLAG AS_NIAGARA4_FLAG
 #undef ASM_CPU64_DEFAULT_SPEC
-#define ASM_CPU64_DEFAULT_SPEC AS_SPARC64_FLAG "b"
+#define ASM_CPU64_DEFAULT_SPEC AS_SPARC64_FLAG AS_NIAGARA4_FLAG
 #undef ASM_CPU_DEFAULT_SPEC
 #define ASM_CPU_DEFAULT_SPEC ASM_CPU32_DEFAULT_SPEC
 #endif
@@ -241,7 +243,7 @@  extern const char *host_detect_local_cpu (int argc, const char **argv);
 %{mcpu=niagara:" DEF_ARCH32_SPEC("-xarch=v8plusb") DEF_ARCH64_SPEC(AS_SPARC64_FLAG "b") "} \
 %{mcpu=niagara2:" DEF_ARCH32_SPEC("-xarch=v8plusb") DEF_ARCH64_SPEC(AS_SPARC64_FLAG "b") "} \
 %{mcpu=niagara3:" DEF_ARCH32_SPEC("-xarch=v8plus" AS_NIAGARA3_FLAG) DEF_ARCH64_SPEC(AS_SPARC64_FLAG AS_NIAGARA3_FLAG) "} \
-%{mcpu=niagara4:" DEF_ARCH32_SPEC("-xarch=v8plus" AS_NIAGARA3_FLAG) DEF_ARCH64_SPEC(AS_SPARC64_FLAG AS_NIAGARA3_FLAG) "} \
+%{mcpu=niagara4:" DEF_ARCH32_SPEC(AS_SPARC32_FLAG AS_NIAGARA4_FLAG) DEF_ARCH64_SPEC(AS_SPARC64_FLAG AS_NIAGARA4_FLAG) "} \
 %{!mcpu=niagara4:%{!mcpu=niagara3:%{!mcpu=niagara2:%{!mcpu=niagara:%{!mcpu=ultrasparc3:%{!mcpu=ultrasparc:%{!mcpu=v9:%{mcpu*:" DEF_ARCH32_SPEC("-xarch=v8") DEF_ARCH64_SPEC(AS_SPARC64_FLAG) "}}}}}}}} \
 %{!mcpu*:%(asm_cpu_default)} \
 "
diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h
index 97f6233..d5b2b1f 100644
--- a/gcc/config/sparc/sparc-protos.h
+++ b/gcc/config/sparc/sparc-protos.h
@@ -71,7 +71,7 @@  extern void sparc_emit_set_symbolic_const64 (rtx, rtx, rtx);
 extern int sparc_splitdi_legitimate (rtx, rtx);
 extern int sparc_split_regreg_legitimate (rtx, rtx);
 extern int sparc_absnegfloat_split_legitimate (rtx, rtx);
-extern const char *output_ubranch (rtx, int, rtx);
+extern const char *output_ubranch (rtx, rtx);
 extern const char *output_cbranch (rtx, rtx, int, int, int, rtx);
 extern const char *output_return (rtx);
 extern const char *output_sibcall (rtx, rtx);
@@ -79,10 +79,12 @@  extern const char *output_v8plus_shift (rtx, rtx *, const char *);
 extern const char *output_v8plus_mult (rtx, rtx *, const char *);
 extern const char *output_v9branch (rtx, rtx, int, int, int, int, rtx);
 extern const char *output_probe_stack_range (rtx, rtx);
+extern const char *output_cbcond (rtx, rtx, rtx);
 extern bool emit_scc_insn (rtx []);
 extern void emit_conditional_branch_insn (rtx []);
 extern int mems_ok_for_ldd_peep (rtx, rtx, rtx);
 extern int empty_delay_slot (rtx);
+extern int emit_cbcond_nop (rtx);
 extern int eligible_for_return_delay (rtx);
 extern int eligible_for_sibcall_delay (rtx);
 extern int tls_call_delay (rtx);
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 8849c03..202f064 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -840,6 +840,8 @@  dump_target_flag_bits (const int flags)
     fprintf (stderr, "VIS2 ");
   if (flags & MASK_VIS3)
     fprintf (stderr, "VIS3 ");
+  if (flags & MASK_CBCOND)
+    fprintf (stderr, "CBCOND ");
   if (flags & MASK_DEPRECATED_V8_INSNS)
     fprintf (stderr, "DEPRECATED_V8_INSNS ");
   if (flags & MASK_SPARCLET)
@@ -946,7 +948,7 @@  sparc_option_override (void)
       MASK_V9|MASK_POPC|MASK_VIS2|MASK_VIS3|MASK_FMAF },
     /* UltraSPARC T4 */
     { "niagara4",	MASK_ISA,
-      MASK_V9|MASK_POPC|MASK_VIS2|MASK_VIS3|MASK_FMAF },
+      MASK_V9|MASK_POPC|MASK_VIS2|MASK_VIS3|MASK_FMAF|MASK_CBCOND },
   };
   const struct cpu_table *cpu;
   unsigned int i;
@@ -1073,6 +1075,9 @@  sparc_option_override (void)
 #ifndef HAVE_AS_FMAF_HPC_VIS3
 		   & ~(MASK_FMAF | MASK_VIS3)
 #endif
+#ifndef HAVE_AS_SPARC4
+		   & ~MASK_CBCOND
+#endif
 		   );
 
   /* If -mfpu or -mno-fpu was explicitly used, don't override with
@@ -1088,7 +1093,12 @@  sparc_option_override (void)
   if (TARGET_VIS3)
     target_flags |= MASK_VIS2 | MASK_VIS;
 
-  /* Don't allow -mvis, -mvis2, -mvis3, or -mfmaf if FPU is disabled.  */
+  /* -mcbcond implies -mvis3, -mvis2 and -mvis */
+  if (TARGET_CBCOND)
+    target_flags |= MASK_VIS3 | MASK_VIS2 | MASK_VIS;
+
+  /* Don't allow -mvis, -mvis2, -mvis3, or -mfmaf if FPU is
+     disabled.  */
   if (! TARGET_FPU)
     target_flags &= ~(MASK_VIS | MASK_VIS2 | MASK_VIS3 | MASK_FMAF);
 
@@ -2660,6 +2670,24 @@  emit_v9_brxx_insn (enum rtx_code code, rtx op0, rtx label)
 				    pc_rtx)));
 }
 
+/* Emit a conditional jump insn for the UA2011 architecture using
+   comparison code CODE and jump target LABEL.  This function exists
+   to take advantage of the UA2011 Compare and Branch insns.  */
+
+static void
+emit_cbcond_insn (enum rtx_code code, rtx op0, rtx op1, rtx label)
+{
+  rtx if_then_else;
+
+  if_then_else = gen_rtx_IF_THEN_ELSE (VOIDmode,
+				       gen_rtx_fmt_ee(code, GET_MODE(op0),
+						      op0, op1),
+				       gen_rtx_LABEL_REF (VOIDmode, label),
+				       pc_rtx);
+
+  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, if_then_else));
+}
+
 void
 emit_conditional_branch_insn (rtx operands[])
 {
@@ -2674,6 +2702,20 @@  emit_conditional_branch_insn (rtx operands[])
       operands[2] = XEXP (operands[0], 1);
     }
 
+  /* If we can tell early on that the comparison is against a constant
+     that won't fit in the 5-bit signed immediate field of a cbcond,
+     use one of the other v9 conditional branch sequences.  */
+  if (TARGET_CBCOND
+      && GET_CODE (operands[1]) == REG
+      && (GET_MODE (operands[1]) == SImode
+	  || (TARGET_ARCH64 && GET_MODE (operands[1]) == DImode))
+      && (GET_CODE (operands[2]) != CONST_INT
+	  || SPARC_SIMM5_P (INTVAL (operands[2]))))
+    {
+      emit_cbcond_insn (GET_CODE (operands[0]), operands[1], operands[2], operands[3]);
+      return;
+    }
+
   if (TARGET_ARCH64 && operands[2] == const0_rtx
       && GET_CODE (operands[1]) == REG
       && GET_MODE (operands[1]) == DImode)
@@ -3014,6 +3056,44 @@  empty_delay_slot (rtx insn)
   return 1;
 }
 
+/* Return nonzero if we should emit a nop after a cbcond instruction.
+   The cbcond instruction does not have a delay slot, however there is
+   a severe performance penalty if a control transfer appears right
+   after a cbcond.  Therefore we emit a nop when we detect this
+   situation.  */
+
+int
+emit_cbcond_nop (rtx insn)
+{
+  rtx next = next_real_insn (insn);
+
+  if (!next)
+    return 1;
+
+  if (GET_CODE (next) == INSN
+      && GET_CODE (PATTERN (next)) == SEQUENCE)
+    next = XVECEXP (PATTERN (next), 0, 0);
+  else if (GET_CODE (next) == CALL_INSN
+	   && GET_CODE (PATTERN (next)) == PARALLEL)
+    {
+      rtx delay = XVECEXP (PATTERN (next), 0, 1);
+
+      if (GET_CODE (delay) == RETURN)
+	{
+	  /* It's a sibling call.  Do not emit the nop if we're going
+	     to emit something other than the jump itself as the first
+	     instruction of the sibcall sequence.  */
+	  if (sparc_leaf_function_p || TARGET_FLAT)
+	    return 0;
+	}
+    }
+
+  if (NONJUMP_INSN_P (next))
+    return 0;
+
+  return 1;
+}
+
 /* Return nonzero if TRIAL can go into the call delay slot.  */
 
 int
@@ -7102,19 +7182,49 @@  sparc_preferred_simd_mode (enum machine_mode mode)
    DEST is the destination insn (i.e. the label), INSN is the source.  */
 
 const char *
-output_ubranch (rtx dest, int label, rtx insn)
+output_ubranch (rtx dest, rtx insn)
 {
   static char string[64];
   bool v9_form = false;
+  int delta;
   char *p;
 
-  if (TARGET_V9 && INSN_ADDRESSES_SET_P ())
+  /* Even if we are trying to use cbcond for this, evaluate
+     whether we can use V9 branches as our backup plan.  */
+
+  delta = 5000000;
+  if (INSN_ADDRESSES_SET_P ())
+    delta = (INSN_ADDRESSES (INSN_UID (dest))
+	     - INSN_ADDRESSES (INSN_UID (insn)));
+
+  /* Leave some instructions for "slop".  */
+  if (TARGET_V9 && delta >= -260000 && delta < 260000)
+    v9_form = true;
+
+  if (TARGET_CBCOND)
     {
-      int delta = (INSN_ADDRESSES (INSN_UID (dest))
-		   - INSN_ADDRESSES (INSN_UID (insn)));
-      /* Leave some instructions for "slop".  */
-      if (delta >= -260000 && delta < 260000)
-	v9_form = true;
+      bool emit_nop = emit_cbcond_nop (insn);
+      bool far = false;
+      const char *rval;
+
+      if (delta < -500 || delta > 500)
+	far = true;
+
+      if (far)
+	{
+	  if (v9_form)
+	    rval = "ba,a,pt\t%%xcc, %l0";
+	  else
+	    rval = "b,a\t%l0";
+	}
+      else
+	{
+	  if (emit_nop)
+	    rval = "cwbe\t%%g0, %%g0, %l0\n\tnop";
+	  else
+	    rval = "cwbe\t%%g0, %%g0, %l0";
+	}
+      return rval;
     }
 
   if (v9_form)
@@ -7125,7 +7235,7 @@  output_ubranch (rtx dest, int label, rtx insn)
   p = strchr (string, '\0');
   *p++ = '%';
   *p++ = 'l';
-  *p++ = '0' + label;
+  *p++ = '0';
   *p++ = '%';
   *p++ = '(';
   *p = '\0';
@@ -7604,6 +7714,183 @@  sparc_emit_fixunsdi (rtx *operands, enum machine_mode mode)
   emit_label (donelab);
 }
 
+/* Return the string to output a compare and branch instruction to DEST.
+   DEST is the destination insn (i.e. the label), INSN is the source,
+   and OP is the conditional expression.  */
+
+const char *
+output_cbcond (rtx op, rtx dest, rtx insn)
+{
+  enum machine_mode mode = GET_MODE (XEXP (op, 0));
+  enum rtx_code code = GET_CODE (op);
+  static char string[64];
+  int far, emit_nop, len;
+  char *p;
+
+  /* Compare and Branch is limited to +-2KB.  If it is too far away,
+     change
+
+     cxbne X, Y, .LC30
+
+     to
+
+     cxbe X, Y, .+12
+     ba,pt xcc, .LC30
+      nop  */
+
+  len = get_attr_length (insn);
+
+  far = len == 3;
+  emit_nop = len == 2;
+
+  if (far)
+    code = reverse_condition (code);
+
+  p = string;
+
+  *p++ = 'c';
+  *p++ = mode == SImode ? 'w' : 'x';
+  *p++ = 'b';
+
+  switch (code)
+    {
+    case NE:
+      *p++ = 'n';
+      *p++ = 'e';
+      break;
+
+    case EQ:
+      *p++ = 'e';
+      break;
+
+    case GE:
+      if (mode == CC_NOOVmode || mode == CCX_NOOVmode)
+	{
+	  *p++ = 'p';
+	  *p++ = 'o';
+	  *p++ = 's';
+	}
+      else
+	{
+	  *p++ = 'g';
+	  *p++ = 'e';
+	}
+      break;
+
+    case GT:
+      *p++ = 'g';
+      break;
+
+    case LE:
+      *p++ = 'l';
+      *p++ = 'e';
+      break;
+
+    case LT:
+      if (mode == CC_NOOVmode || mode == CCX_NOOVmode)
+	{
+	  *p++ = 'n';
+	  *p++ = 'e';
+	  *p++ = 'g';
+	}
+      else
+	*p++ = 'l';
+      break;
+
+    case GEU:
+      *p++ = 'c';
+      *p++ = 'c';
+      break;
+
+    case GTU:
+      *p++ = 'g';
+      *p++ = 'u';
+      break;
+
+    case LEU:
+      *p++ = 'l';
+      *p++ = 'e';
+      *p++ = 'u';
+      break;
+
+    case LTU:
+      *p++ = 'c';
+      *p++ = 's';
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  *p++ = '\t';
+  *p++ = '%';
+  *p++ = '1';
+  *p++ = ',';
+  *p++ = ' ';
+  *p++ = '%';
+  *p++ = '2';
+  *p++ = ',';
+  *p++ = ' ';
+
+  if (far)
+    {
+      int veryfar = 1, delta;
+
+      if (INSN_ADDRESSES_SET_P ())
+	{
+	  delta = (INSN_ADDRESSES (INSN_UID (dest))
+		   - INSN_ADDRESSES (INSN_UID (insn)));
+	  /* Leave some instructions for "slop".  */
+	  if (delta >= -260000 && delta < 260000)
+	    veryfar = 0;
+	}
+      *p++ = '.';
+      *p++ = '+';
+      *p++ = '1';
+      *p++ = '2';
+      *p++ = '\n';
+      *p++ = '\t';
+      if (veryfar)
+	{
+	  *p++ = 'b';
+	  *p++ = '\t';
+	}
+      else
+	{
+	  *p++ = 'b';
+	  *p++ = 'a';
+	  *p++ = ',';
+	  *p++ = 'p';
+	  *p++ = 't';
+	  *p++ = '\t';
+	  *p++ = '%';
+	  *p++ = '%';
+	  *p++ = 'x';
+	  *p++ = 'c';
+	  *p++ = 'c';
+	  *p++ = ',';
+	  *p++ = ' ';
+	}
+    }
+
+  *p++ = '%';
+  *p++ = 'l';
+  *p++ = '3';
+
+  if (far || emit_nop)
+    {
+      *p++ = '\n';
+      *p++ = '\t';
+      *p++ = 'n';
+      *p++ = 'o';
+      *p++ = 'p';
+    }
+
+  *p = '\0';
+
+  return string;
+}
+
 /* Return the string to output a conditional branch to LABEL, testing
    register REG.  LABEL is the operand number of the label; REG is the
    operand number of the reg.  OP is the conditional expression.  The mode
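Reviewer's sketch (not part of the patch): the form that output_ubranch picks when TARGET_CBCOND is set can be modelled as a small stand-alone function. The name pick_ubranch_form, its parameters, and the short tags it returns are hypothetical and exist only for illustration; the thresholds (5000000, 260000, 500) are the ones used in the hunk above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-alone model of the selection made in output_ubranch
   when TARGET_CBCOND is enabled.  It returns a short tag naming the
   chosen form rather than the real output template.  */
static const char *
pick_ubranch_form (bool target_v9, bool addresses_known, int delta,
                   bool nop_needed)
{
  bool v9_form;

  /* Same pessimistic default as the patch when insn addresses are
     not yet available.  */
  if (!addresses_known)
    delta = 5000000;

  /* The V9 branch is the backup plan, with some room for "slop".  */
  v9_form = target_v9 && delta >= -260000 && delta < 260000;

  /* Too far for cbcond: fall back to an annulled branch.  */
  if (delta < -500 || delta > 500)
    return v9_form ? "ba,a,pt %xcc" : "b,a";

  /* Close enough: cwbe %g0, %g0 always branches.  */
  return nop_needed ? "cwbe+nop" : "cwbe";
}
```

Under these assumptions, a nearby target yields the cwbe form, a target beyond 500 bytes yields the V9 annulled branch, and unknown addresses fall through to plain "b,a" because the default delta also fails the V9 range test.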
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 8f86100..374919f 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -195,7 +195,7 @@  extern enum cmodel sparc_cmodel;
 #endif
 #if TARGET_CPU_DEFAULT == TARGET_CPU_niagara4
 #define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
-#define ASM_CPU64_DEFAULT_SPEC "-Av9" AS_NIAGARA3_FLAG
+#define ASM_CPU64_DEFAULT_SPEC AS_NIAGARA4_FLAG
 #endif
 
 #else
@@ -337,7 +337,7 @@  extern enum cmodel sparc_cmodel;
 %{mcpu=niagara:%{!mv8plus:-Av9b}} \
 %{mcpu=niagara2:%{!mv8plus:-Av9b}} \
 %{mcpu=niagara3:%{!mv8plus:-Av9" AS_NIAGARA3_FLAG "}} \
-%{mcpu=niagara4:%{!mv8plus:-Av9" AS_NIAGARA3_FLAG "}} \
+%{mcpu=niagara4:%{!mv8plus:" AS_NIAGARA4_FLAG "}} \
 %{!mcpu*:%(asm_cpu_default)} \
 "
 
@@ -1006,7 +1006,8 @@  extern char leaf_reg_remap[];
 /* Local macro to handle the two v9 classes of FP regs.  */
 #define FP_REG_CLASS_P(CLASS) ((CLASS) == FP_REGS || (CLASS) == EXTRA_FP_REGS)
 
-/* Predicates for 10-bit, 11-bit and 13-bit signed constants.  */
+/* Predicates for 5-bit, 10-bit, 11-bit and 13-bit signed constants.  */
+#define SPARC_SIMM5_P(X)  ((unsigned HOST_WIDE_INT) (X) + 0x10 < 0x20)
 #define SPARC_SIMM10_P(X) ((unsigned HOST_WIDE_INT) (X) + 0x200 < 0x400)
 #define SPARC_SIMM11_P(X) ((unsigned HOST_WIDE_INT) (X) + 0x400 < 0x800)
 #define SPARC_SIMM13_P(X) ((unsigned HOST_WIDE_INT) (X) + 0x1000 < 0x2000)
@@ -1746,6 +1747,12 @@  extern int sparc_indent_opcode;
 #define AS_NIAGARA3_FLAG "d"
 #endif
 
+#ifndef HAVE_AS_SPARC4
+#define AS_NIAGARA4_FLAG " -xarch=v9b"
+#else
+#define AS_NIAGARA4_FLAG " -xarch=sparc4"
+#endif
+
 /* We use gcc _mcount for profiling.  */
 #define NO_PROFILE_COUNTERS 0
 
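Reviewer's sketch (not part of the patch): the SPARC_SIMM5_P test above relies on unsigned wraparound to check both bounds with one comparison. The stand-alone function below reproduces the idea; the name simm5_p is hypothetical, and unsigned long long stands in for unsigned HOST_WIDE_INT, which is an assumption of this sketch.

```c
#include <stdbool.h>

/* Illustrative copy of the range test behind SPARC_SIMM5_P.
   Adding 0x10 maps the signed range [-16, 15] onto [0, 31]; any value
   outside that range either exceeds 31 or wraps around to a very large
   unsigned number, so a single unsigned comparison covers both bounds.  */
static bool
simm5_p (long long x)
{
  return (unsigned long long) x + 0x10 < 0x20;
}
```

With this definition the accepted values are exactly -16 through 15, which matches a 5-bit signed immediate field.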
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index f604f46..bdc8a8d 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -257,6 +257,7 @@ 
   "ialu,compare,shift,
    load,sload,store,
    uncond_branch,branch,call,sibcall,call_no_delay_slot,return,
+   cbcond,uncond_cbcond,
    imul,idiv,
    fpload,fpstore,
    fp,fpmove,
@@ -275,6 +276,12 @@ 
   (symbol_ref "(empty_delay_slot (insn)
 		? EMPTY_DELAY_SLOT_TRUE : EMPTY_DELAY_SLOT_FALSE)"))
 
+;; True if we should emit a nop after a cbcond instruction, which is the
+;; case when a control transfer instruction immediately follows it.
+(define_attr "emit_cbcond_nop" "false,true"
+  (symbol_ref "(emit_cbcond_nop (insn)
+                ? EMIT_CBCOND_NOP_TRUE : EMIT_CBCOND_NOP_FALSE)"))
+
 (define_attr "branch_type" "none,icc,fcc,reg"
   (const_string "none"))
 
@@ -377,6 +384,30 @@ 
 	       (if_then_else (eq_attr "empty_delay_slot" "true")
 		 (const_int 4)
 		 (const_int 3))))
+         (eq_attr "type" "cbcond")
+	   (if_then_else (lt (pc) (match_dup 3))
+	     (if_then_else (lt (minus (match_dup 3) (pc)) (const_int 500))
+               (if_then_else (eq_attr "emit_cbcond_nop" "true")
+                 (const_int 2)
+                 (const_int 1))
+               (const_int 3))
+	     (if_then_else (lt (minus (pc) (match_dup 3)) (const_int 500))
+               (if_then_else (eq_attr "emit_cbcond_nop" "true")
+                 (const_int 2)
+                 (const_int 1))
+               (const_int 3)))
+         (eq_attr "type" "uncond_cbcond")
+	   (if_then_else (lt (pc) (match_dup 0))
+	     (if_then_else (lt (minus (match_dup 0) (pc)) (const_int 500))
+               (if_then_else (eq_attr "emit_cbcond_nop" "true")
+                 (const_int 2)
+                 (const_int 1))
+               (const_int 1))
+	     (if_then_else (lt (minus (pc) (match_dup 0)) (const_int 500))
+               (if_then_else (eq_attr "emit_cbcond_nop" "true")
+                 (const_int 2)
+                 (const_int 1))
+               (const_int 1)))
 	 ] (const_int 1)))
 
 ;; FP precision.
@@ -397,7 +428,7 @@ 
 		? TLS_CALL_DELAY_TRUE : TLS_CALL_DELAY_FALSE)"))
 
 (define_attr "in_call_delay" "false,true"
-  (cond [(eq_attr "type" "uncond_branch,branch,call,sibcall,call_no_delay_slot,multi")
+  (cond [(eq_attr "type" "uncond_branch,branch,cbcond,uncond_cbcond,call,sibcall,call_no_delay_slot,multi")
 		(const_string "false")
 	 (eq_attr "type" "load,fpload,store,fpstore")
 		(if_then_else (eq_attr "length" "1")
@@ -431,19 +462,19 @@ 
 ;; because it prevents us from moving back the final store of inner loops.
 
 (define_attr "in_branch_delay" "false,true"
-  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,sibcall,call_no_delay_slot,multi")
+  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,cbcond,uncond_cbcond,call,sibcall,call_no_delay_slot,multi")
 		     (eq_attr "length" "1"))
 		(const_string "true")
 		(const_string "false")))
 
 (define_attr "in_uncond_branch_delay" "false,true"
-  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,sibcall,call_no_delay_slot,multi")
+  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,cbcond,uncond_cbcond,call,sibcall,call_no_delay_slot,multi")
 		     (eq_attr "length" "1"))
 		(const_string "true")
 		(const_string "false")))
 
 (define_attr "in_annul_branch_delay" "false,true"
-  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,sibcall,call_no_delay_slot,multi")
+  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,cbcond,uncond_cbcond,call,sibcall,call_no_delay_slot,multi")
 		     (eq_attr "length" "1"))
 		(const_string "true")
 		(const_string "false")))
@@ -1313,6 +1344,32 @@ 
 ;; SPARC V9-specific jump insns.  None of these are guaranteed to be
 ;; in the architecture.
 
+(define_insn "*cbcond_sp32"
+  [(set (pc)
+        (if_then_else (match_operator 0 "noov_compare_operator"
+                       [(match_operand:SI 1 "register_operand" "r")
+                        (match_operand:SI 2 "arith5_operand" "rA")])
+                      (label_ref (match_operand 3 "" ""))
+                      (pc)))]
+  "TARGET_CBCOND"
+{
+  return output_cbcond (operands[0], operands[3], insn);
+}
+  [(set_attr "type" "cbcond")])
+
+(define_insn "*cbcond_sp64"
+  [(set (pc)
+        (if_then_else (match_operator 0 "noov_compare_operator"
+                       [(match_operand:DI 1 "register_operand" "r")
+                        (match_operand:DI 2 "arith5_operand" "rA")])
+                      (label_ref (match_operand 3 "" ""))
+                      (pc)))]
+  "TARGET_ARCH64 && TARGET_CBCOND"
+{
+  return output_cbcond (operands[0], operands[3], insn);
+}
+  [(set_attr "type" "cbcond")])
+
 ;; There are no 32 bit brreg insns.
 
 ;; XXX
@@ -6076,12 +6133,22 @@ 
 
 ;; Unconditional and other jump instructions.
 
-(define_insn "jump"
+(define_expand "jump"
   [(set (pc) (label_ref (match_operand 0 "" "")))]
-  ""
-  "* return output_ubranch (operands[0], 0, insn);"
+  "")
+
+(define_insn "*jump_ubranch"
+  [(set (pc) (label_ref (match_operand 0 "" "")))]
+  "!TARGET_CBCOND"
+  "* return output_ubranch (operands[0], insn);"
   [(set_attr "type" "uncond_branch")])
 
+(define_insn "*jump_cbcond"
+  [(set (pc) (label_ref (match_operand 0 "" "")))]
+  "TARGET_CBCOND"
+  "* return output_ubranch (operands[0], insn);"
+  [(set_attr "type" "uncond_cbcond")])
+
 (define_expand "tablejump"
   [(parallel [(set (pc) (match_operand 0 "register_operand" "r"))
 	      (use (label_ref (match_operand 1 "" "")))])]
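Reviewer's sketch (not part of the patch): the "length" attribute cases added for cbcond and uncond_cbcond can be read as the two functions below. The function names and parameters are hypothetical; lengths are in instruction words, and the 500-byte threshold is the one used in sparc.md above.

```c
#include <stdbool.h>

/* Absolute distance between the branch and its target, in bytes.  */
static long
branch_distance (long pc, long target)
{
  return pc < target ? target - pc : pc - target;
}

/* Model of the "cbcond" case: a near branch is one instruction, or two
   when a nop must follow it; a far branch expands to the three
   instruction sequence (inverted cbcond, branch, nop).  */
static int
cbcond_length (long pc, long target, bool nop_needed)
{
  if (branch_distance (pc, target) < 500)
    return nop_needed ? 2 : 1;
  return 3;
}

/* Model of the "uncond_cbcond" case: a far jump falls back to a single
   annulled branch, so its length is one.  */
static int
uncond_cbcond_length (long pc, long target, bool nop_needed)
{
  if (branch_distance (pc, target) < 500)
    return nop_needed ? 2 : 1;
  return 1;
}
```

One detail that may deserve a second look: the attribute treats a distance of exactly 500 as far, whereas output_ubranch only treats a delta beyond 500 as far, so the two appear to disagree at that boundary.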
diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
index 58ba6b7..241cb07 100644
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@@ -73,6 +73,10 @@  mvis3
 Target Report Mask(VIS3)
 Use UltraSPARC Visual Instruction Set version 3.0 extensions
 
+mcbcond
+Target Report Mask(CBCOND)
+Use UltraSPARC Compare-and-Branch extensions
+
 mfmaf
 Target Report Mask(FMAF)
 Use UltraSPARC Fused Multiply-Add extensions
diff --git a/gcc/configure b/gcc/configure
index a223c60..3fc0088 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -24090,6 +24090,48 @@  if test $gcc_cv_as_sparc_fmaf = yes; then
 $as_echo "#define HAVE_AS_FMAF_HPC_VIS3 1" >>confdefs.h
 
 fi
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for SPARC4 instructions" >&5
+$as_echo_n "checking assembler for SPARC4 instructions... " >&6; }
+if test "${gcc_cv_as_sparc_sparc4+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_sparc_sparc4=no
+  if test x$gcc_cv_as != x; then
+    $as_echo '.text
+       .register %g2, #scratch
+       .register %g3, #scratch
+       .align 4
+       cxbe %g2, %g3, 1f
+1:     cwbneg %g2, %g3, 1f
+1:     sha1
+       md5
+       aes_kexpand0 %f4, %f6, %f8
+       des_round %f38, %f40, %f42, %f44
+       camellia_f %f54, %f56, %f58, %f60
+       kasumi_fi_xor %f46, %f48, %f50, %f52' > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -xarch=sparc4 -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+    then
+	gcc_cv_as_sparc_sparc4=yes
+    else
+      echo "configure: failed program was" >&5
+      cat conftest.s >&5
+    fi
+    rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_sparc_sparc4" >&5
+$as_echo "$gcc_cv_as_sparc_sparc4" >&6; }
+if test $gcc_cv_as_sparc_sparc4 = yes; then
+
+$as_echo "#define HAVE_AS_SPARC4 1" >>confdefs.h
+
+fi
     ;;
 
   i[34567]86-*-* | x86_64-*-*)
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 17e1d86..95007a2 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3501,6 +3501,24 @@  foo:
        fnaddd %f10, %f12, %f14],,
       [AC_DEFINE(HAVE_AS_FMAF_HPC_VIS3, 1,
                 [Define if your assembler supports FMAF, HPC, and VIS 3.0 instructions.])])
+
+    gcc_GAS_CHECK_FEATURE([SPARC4 instructions],
+      gcc_cv_as_sparc_sparc4,,
+      [-xarch=sparc4],
+      [.text
+       .register %g2, #scratch
+       .register %g3, #scratch
+       .align 4
+       cxbe %g2, %g3, 1f
+1:     cwbneg %g2, %g3, 1f
+1:     sha1
+       md5
+       aes_kexpand0 %f4, %f6, %f8
+       des_round %f38, %f40, %f42, %f44
+       camellia_f %f54, %f56, %f58, %f60
+       kasumi_fi_xor %f46, %f48, %f50, %f52],,
+      [AC_DEFINE(HAVE_AS_SPARC4, 1,
+                [Define if your assembler supports SPARC4 instructions.])])
     ;;
 
 changequote(,)dnl
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f8c9230..fc2addc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -918,6 +918,7 @@  See RS/6000 and PowerPC Options.
 -munaligned-doubles  -mno-unaligned-doubles @gol
 -mv8plus  -mno-v8plus  -mvis  -mno-vis @gol
 -mvis2  -mno-vis2  -mvis3  -mno-vis3 @gol
+-mcbcond -mno-cbcond @gol
 -mfmaf  -mno-fmaf  -mpopc  -mno-popc @gol
 -mfix-at697f}
 
@@ -18878,6 +18879,16 @@  default is @option{-mvis3} when targeting a cpu that supports such
 instructions, such as niagara-3 and later.  Setting @option{-mvis3}
 also sets @option{-mvis2} and @option{-mvis}.
 
+@item -mcbcond
+@itemx -mno-cbcond
+@opindex mcbcond
+@opindex mno-cbcond
+With @option{-mcbcond}, GCC generates code that takes advantage of
+compare-and-branch instructions, as defined in the SPARC Architecture 2011.
+The default is @option{-mcbcond} when targeting a cpu that supports such
+instructions, such as niagara-4 and later.  Setting @option{-mcbcond} also
+sets @option{-mvis3}, @option{-mvis2}, and @option{-mvis}.
+
 @item -mpopc
 @itemx -mno-popc
 @opindex mpopc
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 32866d5..250cb1c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3157,6 +3157,9 @@  when the Visual Instruction Set is available.
 @item h
 64-bit global or out register for the SPARC-V8+ architecture.
 
+@item A
+Signed 5-bit constant
+
 @item D
 A vector constant