diff mbox

Add __builtin_clrsb, similar to clz/ctz

Message ID 4DFFA1AE.7070405@codesourcery.com
State New
Headers show

Commit Message

Bernd Schmidt June 20, 2011, 7:38 p.m. UTC
On 06/16/2011 06:25 PM, Richard Henderson wrote:
> On 06/16/2011 05:44 AM, Bernd Schmidt wrote:
>> +@deftypefn {Built-in Function} int __builtin_clrsb (unsigned int x)
>> +Returns the number of leading redundant sign bits in @var{x}, starting
>> +at the most significant bit position.
>> +@end deftypefn
> 
> Do we want a signed argument, since we're talking about signs?

Err, yes. It's signed everywhere else (builtins.def etc.).

> It would seem that unlike clz, this function is not undefined for zero.
> What about INT_MIN?  Do all cpus handle those edge cases the same way?

-1 and zero should both produce the same value, 31 (for a 32 bit
integer). I don't see why INT_MIN should be special - the return value
is zero. This is true for C6X and Blackfin; ARM documentation suggests
it's also true for their VCLS instruction. I've not found proper
picochip documentation but some other documents that suggest it's also
implemented this way.

> Do you get smaller code in general from
> 
>   if (x < 0)
>     x = ~x;
>   if (x == 0)
>     return W_TYPE_SIZE - 1;
>   count_leading_zeros(ret, x);
>   return ret - 1;

Probably.

>> -(define_insn "signbitssi2"
>> +(define_insn "clrsbsi2"
>>    [(set (match_operand:HI 0 "register_operand" "=d")
>>  	(if_then_else:HI
>>  	 (lt (match_operand:SI 1 "register_operand" "d") (const_int 0))
> 
> No use of the new rtx code?

D'oh. Blackfin has a (clrsb:HI (operand:SI)) instruction, so adding this
showed a problem with some of the existing simplify_const_unop cases:
for ffs/clz/ctz/clrsb/parity/popcount, we should look at the mode of the
operand, rather than the mode of the operation. This limits what we can
do in that function, since op_mode is sometimes VOIDmode - we really
should add builtin folders for these at some point.

New patch below. Retested on i686 and bfin.


Bernd
libgcc/
	* Makefile.in (lib2funcs): Add _clrsbsi2 and _clrsbdi2.
	* libgcc-std.ver.in (GCC_4.7.0): New section.

	gcc/
	* doc/extend.texi (__builtin_clrsb, __builtin_clrsbl,
	__builtin_clrsbll): Document.
	* doc/rtl.texi (clrsb): New entry.
	* optabs.c (widen_leading): Renamed from widen_clz.  New argument
	UNOPTAB.  All callers changed.  Use UNOPTAB instead of clz_optab.
	(expand_unop): Handle clrsb_optab.
	(init_optabs): Initialize it.
	* optabs.h (enum optab_index): New entry OTI_clrsb.
	(clrsb_optab): Define.
	* genopinit.c (optabs): Add an entry for it.
	* builtins.c (expand_builtin): Handle clrsb builtin functions.
	* builtins.def (BUILT_IN_CLRSB, BUILT_IN_CLRSBIMAX, BUILT_IN_CLRSBL,
	BUILT_IN_CLRSBLL): New.
	* rtl.def (CLRSB): New code.
	* dwarf2out.c (mem_loc_descriptor): Handle it.
	* simplify-rtx.c (simplify_const_unary_operation): Likewise.
	Use op_mode rather than mode when optimizing ffs, clz, ctz, parity
	and popcount.
	* libgcc2.c (__clrsbSI2, __clrsbDI2): New functions.
	* libgcc2.h (__clrsbSI2, __clrsbDI2): Define and declare.
	(__ctzDI2): Move declaration.
	* config/bfin/bfin.md (clrsbsi2): New expander.
	(signbitssi2): Use the CLRSB rtx.
	(clrsbhi2): Renamed from signbitshi2.  Use the CLRSB rtx.
	* config/bfin/bfin.c (bdesc_1arg): Changed accordingly.

	gcc/testsuite/
	* gcc.c-torture/excute/builtin-bitops-1.c (MAKE_FUNS): Make
	my_clrsb test functions.
	(main): Test clrsb.
	* gcc.dg/builtin-protos-1.c (test_s, test_u, test_sl, test_ul,
	test_sll, test_ull): Add clrsb tests.
	* gcc.dg/torture/builtin-attr-1.c: Add tests for clrsb, clrsbl,
	clrsbll.

Comments

Richard Henderson June 20, 2011, 8:11 p.m. UTC | #1
On 06/20/2011 12:38 PM, Bernd Schmidt wrote:
> D'oh. Blackfin has a (clrsb:HI (operand:SI)) instruction, so adding this
> showed a problem with some of the existing simplify_const_unop cases:
> for ffs/clz/ctz/clrsb/parity/popcount, we should look at the mode of the
> operand, rather than the mode of the operation. This limits what we can
> do in that function, since op_mode is sometimes VOIDmode - we really
> should add builtin folders for these at some point.
> 
> New patch below. Retested on i686 and bfin.
> 

Ok.

I'm not going to bike-shed the function name.  If, at some point
before 4.7 is released we agree on another name, we can change it.


r~
H.J. Lu June 23, 2011, 5:40 a.m. UTC | #2
On Mon, Jun 20, 2011 at 12:38 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 06/16/2011 06:25 PM, Richard Henderson wrote:
>> On 06/16/2011 05:44 AM, Bernd Schmidt wrote:
>>> +@deftypefn {Built-in Function} int __builtin_clrsb (unsigned int x)
>>> +Returns the number of leading redundant sign bits in @var{x}, starting
>>> +at the most significant bit position.
>>> +@end deftypefn
>>
>> Do we want a signed argument, since we're talking about signs?
>
> Err, yes. It's signed everywhere else (builtins.def etc.).
>
>> It would seem that unlike clz, this function is not undefined for zero.
>> What about INT_MIN?  Do all cpus handle those edge cases the same way?
>
> -1 and zero should both produce the same value, 31 (for a 32 bit
> integer). I don't see why INT_MIN should be special - the return value
> is zero. This is true for C6X and Blackfin; ARM documentation suggests
> it's also true for their VCLS instruction. I've not found proper
> picochip documentation but some other documents that suggest it's also
> implemented this way.
>
>> Do you get smaller code in general from
>>
>>   if (x < 0)
>>     x = ~x;
>>   if (x == 0)
>>     return W_TYPE_SIZE - 1;
>>   count_leading_zeros(ret, x);
>>   return ret - 1;
>
> Probably.
>
>>> -(define_insn "signbitssi2"
>>> +(define_insn "clrsbsi2"
>>>    [(set (match_operand:HI 0 "register_operand" "=d")
>>>      (if_then_else:HI
>>>       (lt (match_operand:SI 1 "register_operand" "d") (const_int 0))
>>
>> No use of the new rtx code?
>
> D'oh. Blackfin has a (clrsb:HI (operand:SI)) instruction, so adding this
> showed a problem with some of the existing simplify_const_unop cases:
> for ffs/clz/ctz/clrsb/parity/popcount, we should look at the mode of the
> operand, rather than the mode of the operation. This limits what we can
> do in that function, since op_mode is sometimes VOIDmode - we really
> should add builtin folders for these at some point.
>
> New patch below. Retested on i686 and bfin.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49512
Hans-Peter Nilsson July 12, 2011, 12:23 a.m. UTC | #3
On Mon, 20 Jun 2011, Bernd Schmidt wrote:
> New patch below. Retested on i686 and bfin.

Yay, bikeshedding opportunity! :P
Can we call them "leading *repeated* sign bits"? (in docs and
comments)  Calling them "redundant" makes you think the
representation is  not two's complement but new and improved...
like (bitwise) (1<<31) == -1 or something.

brgds, H-P
PS. just a minor change, I can do the legwork.
Jakub Jelinek Aug. 23, 2011, 9:05 a.m. UTC | #4
On Mon, Jun 20, 2011 at 09:38:22PM +0200, Bernd Schmidt wrote:
> D'oh. Blackfin has a (clrsb:HI (operand:SI)) instruction, so adding this
> showed a problem with some of the existing simplify_const_unop cases:
> for ffs/clz/ctz/clrsb/parity/popcount, we should look at the mode of the
> operand, rather than the mode of the operation. This limits what we can
> do in that function, since op_mode is sometimes VOIDmode - we really
> should add builtin folders for these at some point.

> 	* simplify-rtx.c (simplify_const_unary_operation): Likewise.
> 	Use op_mode rather than mode when optimizing ffs, clz, ctz, parity
> 	and popcount.

This change is IMHO wrong, see e.g.
PR50161 where we have (subreg:SI (popcount:DI (const_int -1))).  This
is supposed to yield 64, but with your changes
it yields 128 - the op_mode here is VOIDmode, so the first if that used
to handle it is no longer used, but as width is <= 2 *
HOST_BITS_PER_WIDE_INT, it is treated as TImode constant.

IMHO best would be just to mandate that for these unary ops like
FFS, CLZ, CLRSB, CTZ, POPCOUNT, PARITY, BSWAP the operand has the same mode
(or VOIDmode) as the unary rtx and that the operation is being carried in
the unop's mode, it shouldn't be hard to adjust the few
*.md patterns (mainly in avr.md, bfin.md).
I think it is bad enough that ZERO_EXTEND must not have CONST_INT argument,
making CONST_INT undefined also for all these unary ops is unnecessary.
I think for NEG/NOT we already have such a guarantee (and thus your change
pessimizes it anyway).
avr.md/bfin.md etc. can use (subreg:HI (popcount:SI (match_operand:SI ...)))
(or (zero_extend:HI (popcount:QI (match_operand:QI ...))) and similar.

Or the
  /* We can do some operations on integer CONST_DOUBLEs.  Also allow
     for a DImode operation on a CONST_INT.  */
  else if (GET_MODE (op) == VOIDmode
           && width <= HOST_BITS_PER_WIDE_INT * 2
           && (GET_CODE (op) == CONST_DOUBLE
               || CONST_INT_P (op)))
case would need to change too to test that op_width ==
HOST_BITS_PER_WIDE_INT * 2 (but then, it would again pessimize at least
NEG/NOT/ABS that are defined sanely).  But we'd also need to change many
other places, e.g. cse_process_notes_1, that currently special case
ZERO_EXTEND/SUBREG (and sometimes SIGN_EXTEND) and pessimize them because
those rtxes aren't allowed to have VOIDmode arguments.  cse_process_notes_1
perhaps could be changed for VOIDmode new_rtx to try to
simplify_replace_rtx it...

	Jakub
Bernd Schmidt Aug. 23, 2011, 9:35 a.m. UTC | #5
On 08/23/11 11:05, Jakub Jelinek wrote:
> On Mon, Jun 20, 2011 at 09:38:22PM +0200, Bernd Schmidt wrote:
>> D'oh. Blackfin has a (clrsb:HI (operand:SI)) instruction, so adding this
>> showed a problem with some of the existing simplify_const_unop cases:
>> for ffs/clz/ctz/clrsb/parity/popcount, we should look at the mode of the
>> operand, rather than the mode of the operation. This limits what we can
>> do in that function, since op_mode is sometimes VOIDmode - we really
>> should add builtin folders for these at some point.
> 
>> 	* simplify-rtx.c (simplify_const_unary_operation): Likewise.
>> 	Use op_mode rather than mode when optimizing ffs, clz, ctz, parity
>> 	and popcount.
> 
> This change is IMHO wrong,

Conceptually, I think it is exactly right. It may however be
inconvenient in some cases.

> see e.g.
> PR50161 where we have (subreg:SI (popcount:DI (const_int -1))).  This
> is supposed to yield 64, but with your changes
> it yields 128 - the op_mode here is VOIDmode,

This is what shouldn't happen.

> cse_process_notes_1
> perhaps could be changed for VOIDmode new_rtx to try to
> simplify_replace_rtx it...

Is this where the problem came from? Sounds like it's worth a try.

Wasn't Richard S. working on a patch to give constants modes?


Bernd
Richard Biener Aug. 23, 2011, 9:49 a.m. UTC | #6
On Tue, Aug 23, 2011 at 11:35 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 08/23/11 11:05, Jakub Jelinek wrote:
>> On Mon, Jun 20, 2011 at 09:38:22PM +0200, Bernd Schmidt wrote:
>>> D'oh. Blackfin has a (clrsb:HI (operand:SI)) instruction, so adding this
>>> showed a problem with some of the existing simplify_const_unop cases:
>>> for ffs/clz/ctz/clrsb/parity/popcount, we should look at the mode of the
>>> operand, rather than the mode of the operation. This limits what we can
>>> do in that function, since op_mode is sometimes VOIDmode - we really
>>> should add builtin folders for these at some point.
>>
>>>      * simplify-rtx.c (simplify_const_unary_operation): Likewise.
>>>      Use op_mode rather than mode when optimizing ffs, clz, ctz, parity
>>>      and popcount.
>>
>> This change is IMHO wrong,
>
> Conceptually, I think it is exactly right. It may however be
> inconvenient in some cases.
>
>> see e.g.
>> PR50161 where we have (subreg:SI (popcount:DI (const_int -1))).  This
>> is supposed to yield 64, but with your changes
>> it yields 128 - the op_mode here is VOIDmode,
>
> This is what shouldn't happen.

If it shouldn't happen, does some verifier catch it?

>> cse_process_notes_1
>> perhaps could be changed for VOIDmode new_rtx to try to
>> simplify_replace_rtx it...
>
> Is this where the problem came from? Sounds like it's worth a try.
>
> Wasn't Richard S. working on a patch to give constants modes?
>
>
> Bernd
>
>
Jakub Jelinek Aug. 23, 2011, 9:52 a.m. UTC | #7
On Tue, Aug 23, 2011 at 11:35:07AM +0200, Bernd Schmidt wrote:
> > cse_process_notes_1
> > perhaps could be changed for VOIDmode new_rtx to try to
> > simplify_replace_rtx it...
> 
> Is this where the problem came from? Sounds like it's worth a try.

In this case, yes.  But there are many other places all around the
compiler that need to disallow unary op with VOIDmode operand.
In cse.c alone e.g. fold_rtx (twice), in combine.c e.g. in do_SUBST,
subst, etc.  Do we want to special case all those 7 unary ops there too?
Is it really worth it to save one subreg or truncate in the md patterns
for rarely used rtxes?

> Wasn't Richard S. working on a patch to give constants modes?

I don't think this is achievable for 4.7...

	Jakub
Bernd Schmidt Aug. 23, 2011, 9:57 a.m. UTC | #8
On 08/23/11 11:52, Jakub Jelinek wrote:
> On Tue, Aug 23, 2011 at 11:35:07AM +0200, Bernd Schmidt wrote:
>>> cse_process_notes_1
>>> perhaps could be changed for VOIDmode new_rtx to try to
>>> simplify_replace_rtx it...
>>
>> Is this where the problem came from? Sounds like it's worth a try.
> 
> In this case, yes.  But there are many other places all around the
> compiler that need to disallow unary op with VOIDmode operand.
> In cse.c alone e.g. fold_rtx (twice), in combine.c e.g. in do_SUBST,
> subst, etc.  Do we want to special case all those 7 unary ops there too?
> Is it really worth it to save one subreg or truncate in the md patterns
> for rarely used rtxes?

Maybe not. I'll approve a patch to change it back, even if I think it's
not a good representation.


Bernd
Richard Sandiford Aug. 23, 2011, 10:07 a.m. UTC | #9
Bernd Schmidt <bernds@codesourcery.com> writes:
> Wasn't Richard S. working on a patch to give constants modes?

A whole series more like.  Don't hold your breath!

Richard
diff mbox

Patch

Index: libgcc/Makefile.in
===================================================================
--- libgcc/Makefile.in	(revision 174339)
+++ libgcc/Makefile.in	(working copy)
@@ -320,7 +320,7 @@  lib2funcs = _muldi3 _negdi2 _lshrdi3 _as
 	    _ctzsi2 _ctzdi2 _popcount_tab _popcountsi2 _popcountdi2	   \
 	    _paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2	   \
 	    _mulsc3 _muldc3 _mulxc3 _multc3 _divsc3 _divdc3 _divxc3	   \
-	    _divtc3 _bswapsi2 _bswapdi2
+	    _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2 _clrsbdi2
 
 # The floating-point conversion routines that involve a single-word integer.
 # XX stands for the integer mode.
Index: libgcc/libgcc-std.ver.in
===================================================================
--- libgcc/libgcc-std.ver.in	(revision 174339)
+++ libgcc/libgcc-std.ver.in	(working copy)
@@ -1920,3 +1920,10 @@  GCC_4.6.0 {
   __morestack_initial_sp
   __splitstack_find
 }
+
+%inherit GCC_4.7.0 GCC_4.6.0
+GCC_4.7.0 {
+  __PFX__clrsbsi2
+  __PFX__clrsbdi2
+  __PFX__clrsbti2
+}
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 174339)
+++ gcc/doc/extend.texi	(working copy)
@@ -7828,6 +7828,12 @@  Returns the number of trailing 0-bits in
 significant bit position.  If @var{x} is 0, the result is undefined.
 @end deftypefn
 
+@deftypefn {Built-in Function} int __builtin_clrsb (int x)
+Returns the number of leading redundant sign bits in @var{x}, i.e. the
+number of bits following the most significant bit which are identical
+to it.  There are no special cases for 0 or other values. 
+@end deftypefn
+
 @deftypefn {Built-in Function} int __builtin_popcount (unsigned int x)
 Returns the number of 1-bits in @var{x}.
 @end deftypefn
@@ -7852,6 +7858,11 @@  Similar to @code{__builtin_ctz}, except
 @code{unsigned long}.
 @end deftypefn
 
+@deftypefn {Built-in Function} int __builtin_clrsbl (long)
+Similar to @code{__builtin_clrsb}, except the argument type is
+@code{long}.
+@end deftypefn
+
 @deftypefn {Built-in Function} int __builtin_popcountl (unsigned long)
 Similar to @code{__builtin_popcount}, except the argument type is
 @code{unsigned long}.
@@ -7877,6 +7888,11 @@  Similar to @code{__builtin_ctz}, except
 @code{unsigned long long}.
 @end deftypefn
 
+@deftypefn {Built-in Function} int __builtin_clrsbll (long long)
+Similar to @code{__builtin_clrsb}, except the argument type is
+@code{long long}.
+@end deftypefn
+
 @deftypefn {Built-in Function} int __builtin_popcountll (unsigned long long)
 Similar to @code{__builtin_popcount}, except the argument type is
 @code{unsigned long long}.
Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	(revision 174339)
+++ gcc/doc/rtl.texi	(working copy)
@@ -2400,6 +2400,14 @@  zero if @var{x} is zero.)  The mode of @
 depending on the target machine, various mode combinations may be
 valid.
 
+@findex clrsb
+@item (clrsb:@var{m} @var{x})
+Represents the number of redundant leading sign bits in @var{x},
+represented as an integer of mode @var{m}, starting at the most
+significant bit position.  This is one less than the number of leading
+sign bits (either 0 or 1), with no special cases.  The mode of @var{x}
+will usually be an integer mode and may differ from @var{m}.
+
 @findex clz
 @item (clz:@var{m} @var{x})
 Represents the number of leading 0-bits in @var{x}, represented as an
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 174339)
+++ gcc/optabs.c	(working copy)
@@ -2317,9 +2317,12 @@  expand_simple_unop (enum machine_mode mo
 /* Try calculating
 	(clz:narrow x)
    as
-	(clz:wide (zero_extend:wide x)) - ((width wide) - (width narrow)).  */
+	(clz:wide (zero_extend:wide x)) - ((width wide) - (width narrow)).
+
+   A similar operation can be used for clrsb.  UNOPTAB says which operation
+   we are trying to expand.  */
 static rtx
-widen_clz (enum machine_mode mode, rtx op0, rtx target)
+widen_leading (enum machine_mode mode, rtx op0, rtx target, optab unoptab)
 {
   enum mode_class mclass = GET_MODE_CLASS (mode);
   if (CLASS_HAS_WIDER_MODES_P (mclass))
@@ -2329,7 +2332,7 @@  widen_clz (enum machine_mode mode, rtx o
 	   wider_mode != VOIDmode;
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
-	  if (optab_handler (clz_optab, wider_mode) != CODE_FOR_nothing)
+	  if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
 	    {
 	      rtx xop0, temp, last;
 
@@ -2338,7 +2341,7 @@  widen_clz (enum machine_mode mode, rtx o
 	      if (target == 0)
 		target = gen_reg_rtx (mode);
 	      xop0 = widen_operand (op0, wider_mode, mode, true, false);
-	      temp = expand_unop (wider_mode, clz_optab, xop0, NULL_RTX, true);
+	      temp = expand_unop (wider_mode, unoptab, xop0, NULL_RTX, true);
 	      if (temp != 0)
 		temp = expand_binop (wider_mode, sub_optab, temp,
 				     GEN_INT (GET_MODE_BITSIZE (wider_mode)
@@ -2832,7 +2835,7 @@  expand_unop (enum machine_mode mode, opt
   /* Widening (or narrowing) clz needs special treatment.  */
   if (unoptab == clz_optab)
     {
-      temp = widen_clz (mode, op0, target);
+      temp = widen_leading (mode, op0, target, unoptab);
       if (temp)
 	return temp;
 
@@ -2844,7 +2847,15 @@  expand_unop (enum machine_mode mode, opt
 	    return temp;
 	}
 
-	goto try_libcall;
+      goto try_libcall;
+    }
+
+  if (unoptab == clrsb_optab)
+    {
+      temp = widen_leading (mode, op0, target, unoptab);
+      if (temp)
+	return temp;
+      goto try_libcall;
     }
 
   /* Widening (or narrowing) bswap needs special treatment.  */
@@ -2999,7 +3010,8 @@  expand_unop (enum machine_mode mode, opt
       /* All of these functions return small values.  Thus we choose to
 	 have them return something that isn't a double-word.  */
       if (unoptab == ffs_optab || unoptab == clz_optab || unoptab == ctz_optab
-	  || unoptab == popcount_optab || unoptab == parity_optab)
+	  || unoptab == clrsb_optab || unoptab == popcount_optab
+	  || unoptab == parity_optab)
 	outmode
 	  = GET_MODE (hard_libcall_value (TYPE_MODE (integer_type_node),
 					  optab_libfunc (unoptab, mode)));
@@ -5943,6 +5955,7 @@  init_optabs (void)
   init_optab (ffs_optab, FFS);
   init_optab (clz_optab, CLZ);
   init_optab (ctz_optab, CTZ);
+  init_optab (clrsb_optab, CLRSB);
   init_optab (popcount_optab, POPCOUNT);
   init_optab (parity_optab, PARITY);
   init_optab (sqrt_optab, SQRT);
@@ -6173,6 +6186,9 @@  init_optabs (void)
   ctz_optab->libcall_basename = "ctz";
   ctz_optab->libcall_suffix = '2';
   ctz_optab->libcall_gen = gen_int_libfunc;
+  clrsb_optab->libcall_basename = "clrsb";
+  clrsb_optab->libcall_suffix = '2';
+  clrsb_optab->libcall_gen = gen_int_libfunc;
   popcount_optab->libcall_basename = "popcount";
   popcount_optab->libcall_suffix = '2';
   popcount_optab->libcall_gen = gen_int_libfunc;
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 174339)
+++ gcc/optabs.h	(working copy)
@@ -220,6 +220,7 @@  enum optab_index
   OTI_ffs,
   OTI_clz,
   OTI_ctz,
+  OTI_clrsb,
   OTI_popcount,
   OTI_parity,
   /* Square root */
@@ -456,6 +457,7 @@  enum optab_index
 #define ffs_optab (&optab_table[OTI_ffs])
 #define clz_optab (&optab_table[OTI_clz])
 #define ctz_optab (&optab_table[OTI_ctz])
+#define clrsb_optab (&optab_table[OTI_clrsb])
 #define popcount_optab (&optab_table[OTI_popcount])
 #define parity_optab (&optab_table[OTI_parity])
 #define sqrt_optab (&optab_table[OTI_sqrt])
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 174339)
+++ gcc/genopinit.c	(working copy)
@@ -199,6 +199,7 @@  static const char * const optabs[] =
   "set_optab_handler (ffs_optab, $A, CODE_FOR_$(ffs$a2$))",
   "set_optab_handler (clz_optab, $A, CODE_FOR_$(clz$a2$))",
   "set_optab_handler (ctz_optab, $A, CODE_FOR_$(ctz$a2$))",
+  "set_optab_handler (clrsb_optab, $A, CODE_FOR_$(clrsb$a2$))",
   "set_optab_handler (popcount_optab, $A, CODE_FOR_$(popcount$a2$))",
   "set_optab_handler (parity_optab, $A, CODE_FOR_$(parity$a2$))",
   "set_optab_handler (mov_optab, $A, CODE_FOR_$(mov$a$))",
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 174339)
+++ gcc/builtins.c	(working copy)
@@ -6068,6 +6068,14 @@  expand_builtin (tree exp, rtx target, rt
 	return target;
       break;
 
+    CASE_INT_FN (BUILT_IN_CLRSB):
+    case BUILT_IN_CLRSBIMAX:
+      target = expand_builtin_unop (target_mode, exp, target,
+				    subtarget, clrsb_optab);
+      if (target)
+	return target;
+      break;
+
     CASE_INT_FN (BUILT_IN_POPCOUNT):
     case BUILT_IN_POPCOUNTIMAX:
       target = expand_builtin_unop (target_mode, exp, target,
Index: gcc/testsuite/gcc.c-torture/execute/builtin-bitops-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/builtin-bitops-1.c	(revision 174339)
+++ gcc/testsuite/gcc.c-torture/execute/builtin-bitops-1.c	(working copy)
@@ -62,6 +62,16 @@  int my_clz##suffix(type x) {						\
     return i;								\
 }									\
 									\
+int my_clrsb##suffix(type x) {						\
+    int i;								\
+    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
+    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
+	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
+	    != leading)							\
+	    break;							\
+    return i - 1;							\
+}									\
+									\
 int my_popcount##suffix(type x) {					\
     int i;								\
     int count = 0;							\
@@ -176,6 +186,8 @@  main (void)
       if (ints[i] != 0
 	  && __builtin_ctz (ints[i]) != my_ctz (ints[i]))
 	abort ();
+      if (__builtin_clrsb (ints[i]) != my_clrsb (ints[i]))
+	abort ();
       if (__builtin_popcount (ints[i]) != my_popcount (ints[i]))
 	abort ();
       if (__builtin_parity (ints[i]) != my_parity (ints[i]))
@@ -192,6 +204,8 @@  main (void)
       if (longs[i] != 0
 	  && __builtin_ctzl (longs[i]) != my_ctzl (longs[i]))
 	abort ();
+      if (__builtin_clrsbl (longs[i]) != my_clrsbl (longs[i]))
+	abort ();
       if (__builtin_popcountl (longs[i]) != my_popcountl (longs[i]))
 	abort ();
       if (__builtin_parityl (longs[i]) != my_parityl (longs[i]))
@@ -208,6 +222,8 @@  main (void)
       if (longlongs[i] != 0
 	  && __builtin_ctzll (longlongs[i]) != my_ctzll (longlongs[i]))
 	abort ();
+      if (__builtin_clrsbll (longlongs[i]) != my_clrsbll (longlongs[i]))
+	abort ();
       if (__builtin_popcountll (longlongs[i]) != my_popcountll (longlongs[i]))
 	abort ();
       if (__builtin_parityll (longlongs[i]) != my_parityll (longlongs[i]))
@@ -223,6 +239,8 @@  main (void)
     abort ();								\
   if (x != 0 && __builtin_ctz##suffix (x) != my_ctz##suffix (x))	\
     abort ();								\
+  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))		\
+    abort ();								\
   if (__builtin_popcount##suffix (x) != my_popcount##suffix (x))	\
     abort ();								\
   if (__builtin_parity##suffix (x) != my_parity##suffix (x))		\
Index: gcc/testsuite/gcc.dg/builtin-protos-1.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin-protos-1.c	(revision 174339)
+++ gcc/testsuite/gcc.dg/builtin-protos-1.c	(working copy)
@@ -7,6 +7,7 @@  test_s (signed int x)
   return __builtin_abs (x)	/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_clz (x)		/* { dg-warning "as unsigned due to prototype" } */
     + __builtin_ctz (x)		/* { dg-warning "as unsigned due to prototype" } */
+    + __builtin_clrsb (x)	/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_ffs (x)		/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_parity (x)	/* { dg-warning "as unsigned due to prototype" } */
     + __builtin_popcount (x);	/* { dg-warning "as unsigned due to prototype" } */
@@ -18,6 +19,7 @@  test_u (unsigned int x)
   return __builtin_abs (x)	/* { dg-warning "as signed due to prototype" } */
     + __builtin_clz (x)		/* { dg-bogus "as signed due to prototype" } */
     + __builtin_ctz (x)		/* { dg-bogus "as signed due to prototype" } */
+    + __builtin_clrsb (x)	/* { dg-warning "as signed due to prototype" } */
     + __builtin_ffs (x)		/* { dg-warning "as signed due to prototype" } */
     + __builtin_parity (x)	/* { dg-bogus "as signed due to prototype" } */
     + __builtin_popcount (x);	/* { dg-bogus "as signed due to prototype" } */
@@ -29,6 +31,7 @@  test_sl (signed long x)
   return __builtin_labs (x)	/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_clzl (x)	/* { dg-warning "as unsigned due to prototype" } */
     + __builtin_ctzl (x)	/* { dg-warning "as unsigned due to prototype" } */
+    + __builtin_clrsbl (x)	/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_ffsl (x)	/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_parityl (x)	/* { dg-warning "as unsigned due to prototype" } */
     + __builtin_popcountl (x);	/* { dg-warning "as unsigned due to prototype" } */
@@ -40,6 +43,7 @@  test_ul (unsigned long x)
   return __builtin_labs (x)	/* { dg-warning "as signed due to prototype" } */
     + __builtin_clzl (x)	/* { dg-bogus "as signed due to prototype" } */
     + __builtin_ctzl (x)	/* { dg-bogus "as signed due to prototype" } */
+    + __builtin_clrsbl (x)	/* { dg-warning "as signed due to prototype" } */
     + __builtin_ffsl (x)	/* { dg-warning "as signed due to prototype" } */
     + __builtin_parityl (x)	/* { dg-bogus "as signed due to prototype" } */
     + __builtin_popcountl (x);	/* { dg-bogus "as signed due to prototype" } */
@@ -51,6 +55,7 @@  test_sll (signed long long x)
   return __builtin_llabs (x)	/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_clzll (x)	/* { dg-warning "as unsigned due to prototype" } */
     + __builtin_ctzll (x)	/* { dg-warning "as unsigned due to prototype" } */
+    + __builtin_clrsbll (x)	/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_ffsll (x)	/* { dg-bogus "as unsigned due to prototype" } */
     + __builtin_parityll (x)	/* { dg-warning "as unsigned due to prototype" } */
     + __builtin_popcountll (x);	/* { dg-warning "as unsigned due to prototype" } */
@@ -62,6 +67,7 @@  test_ull (unsigned long long x)
   return __builtin_llabs (x)	/* { dg-warning "as signed due to prototype" } */
     + __builtin_clzll (x)	/* { dg-bogus "as signed due to prototype" } */
     + __builtin_ctzll (x)	/* { dg-bogus "as signed due to prototype" } */
+    + __builtin_clrsbll (x)	/* { dg-warning "as signed due to prototype" } */
     + __builtin_ffsll (x)	/* { dg-warning "as signed due to prototype" } */
     + __builtin_parityll (x)	/* { dg-bogus "as signed due to prototype" } */
     + __builtin_popcountll (x);	/* { dg-bogus "as signed due to prototype" } */
Index: gcc/testsuite/gcc.dg/torture/builtin-attr-1.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/builtin-attr-1.c	(revision 174339)
+++ gcc/testsuite/gcc.dg/torture/builtin-attr-1.c	(working copy)
@@ -416,6 +416,9 @@  BUILTIN_TEST1 (clzll, long long)
 BUILTIN_TEST1 (ctz, int)
 BUILTIN_TEST1 (ctzl, long)
 BUILTIN_TEST1 (ctzll, long long)
+BUILTIN_TEST1 (clrsb, int)
+BUILTIN_TEST1 (clrsbl, long)
+BUILTIN_TEST1 (clrsbll, long long)
 TEST1         (ffs, int, int)
 TEST1         (ffsl, long, int)
 TEST1         (ffsll, long long, int)
Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def	(revision 174339)
+++ gcc/builtins.def	(working copy)
@@ -620,6 +620,10 @@  DEF_GCC_BUILTIN        (BUILT_IN_CTZ, "c
 DEF_GCC_BUILTIN        (BUILT_IN_CTZIMAX, "ctzimax", BT_FN_INT_UINTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_CTZL, "ctzl", BT_FN_INT_ULONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_CTZLL, "ctzll", BT_FN_INT_ULONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN        (BUILT_IN_CLRSB, "clrsb", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN        (BUILT_IN_CLRSBIMAX, "clrsbimax", BT_FN_INT_INTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN        (BUILT_IN_CLRSBL, "clrsbl", BT_FN_INT_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN        (BUILT_IN_CLRSBLL, "clrsbll", BT_FN_INT_LONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN    (BUILT_IN_DCGETTEXT, "dcgettext", BT_FN_STRING_CONST_STRING_CONST_STRING_INT, ATTR_FORMAT_ARG_2)
 DEF_EXT_LIB_BUILTIN    (BUILT_IN_DGETTEXT, "dgettext", BT_FN_STRING_CONST_STRING_CONST_STRING, ATTR_FORMAT_ARG_2)
 DEF_GCC_BUILTIN        (BUILT_IN_DWARF_CFA, "dwarf_cfa", BT_FN_PTR, ATTR_NULL)
Index: gcc/rtl.def
===================================================================
--- gcc/rtl.def	(revision 174339)
+++ gcc/rtl.def	(working copy)
@@ -613,6 +613,10 @@  DEF_RTL_EXPR(BSWAP, "bswap", "e", RTX_UN
    or 0 if arg is 0.  */
 DEF_RTL_EXPR(FFS, "ffs", "e", RTX_UNARY)
 
+/* Count number of leading redundant sign bits (number of leading
+   sign bits minus one).  */
+DEF_RTL_EXPR(CLRSB, "clrsb", "e", RTX_UNARY)
+
 /* Count leading zeros.  */
 DEF_RTL_EXPR(CLZ, "clz", "e", RTX_UNARY)
 
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	(revision 174339)
+++ gcc/dwarf2out.c	(working copy)
@@ -14874,6 +14874,7 @@  mem_loc_descriptor (rtx rtl, enum machin
     case FFS:
     case CLZ:
     case CTZ:
+    case CLRSB:
     case POPCOUNT:
     case PARITY:
     case ASM_OPERANDS:
Index: gcc/libgcc2.c
===================================================================
--- gcc/libgcc2.c	(revision 174339)
+++ gcc/libgcc2.c	(working copy)
@@ -762,7 +762,50 @@  __ctzDI2 (UDWtype x)
   return ret + add;
 }
 #endif
+
+#ifdef L_clrsbsi2
+#undef int
+int
+__clrsbSI2 (Wtype x)
+{
+  Wtype ret;
 
+  if (x < 0)
+    x = ~x;
+  if (x == 0)
+    return W_TYPE_SIZE - 1;
+  count_leading_zeros (ret, x);
+  return ret - 1;
+}
+#endif
+
+#ifdef L_clrsbdi2
+#undef int
+int
+__clrsbDI2 (DWtype x)
+{
+  const DWunion uu = {.ll = x};
+  UWtype word;
+  Wtype ret, add;
+
+  if (uu.s.high == 0)
+    word = uu.s.low, add = W_TYPE_SIZE;
+  else if (uu.s.high == -1)
+    word = ~uu.s.low, add = W_TYPE_SIZE;
+  else if (uu.s.high >= 0)
+    word = uu.s.high, add = 0;
+  else
+    word = ~uu.s.high, add = 0;
+
+  if (word == 0)
+    ret = W_TYPE_SIZE;
+  else
+    count_leading_zeros (ret, word);
+
+  return ret + add - 1;
+}
+#endif
+
 #ifdef L_popcount_tab
 const UQItype __popcount_tab[256] =
 {
Index: gcc/libgcc2.h
===================================================================
--- gcc/libgcc2.h	(revision 174339)
+++ gcc/libgcc2.h	(working copy)
@@ -315,11 +315,13 @@  typedef int shift_count_type __attribute
 #define __ffsSI2	__NW(ffs,2)
 #define __clzSI2	__NW(clz,2)
 #define __ctzSI2	__NW(ctz,2)
+#define __clrsbSI2	__NW(clrsb,2)
 #define __popcountSI2	__NW(popcount,2)
 #define __paritySI2	__NW(parity,2)
 #define __ffsDI2	__NDW(ffs,2)
 #define __clzDI2	__NDW(clz,2)
 #define __ctzDI2	__NDW(ctz,2)
+#define __clrsbDI2	__NDW(clrsb,2)
 #define __popcountDI2	__NDW(popcount,2)
 #define __parityDI2	__NDW(parity,2)
 
@@ -508,9 +510,11 @@  extern const UQItype __clz_tab[256];
 extern int __clzDI2 (UDWtype);
 extern int __clzSI2 (UWtype);
 extern int __ctzSI2 (UWtype);
+extern int __ctzDI2 (UDWtype);
+extern int __clrsbSI2 (Wtype);
+extern int __clrsbDI2 (DWtype);
 extern int __ffsSI2 (UWtype);
 extern int __ffsDI2 (DWtype);
-extern int __ctzDI2 (UDWtype);
 extern int __popcountSI2 (UWtype);
 extern int __popcountDI2 (UDWtype);
 extern int __paritySI2 (UWtype);
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	(revision 174339)
+++ gcc/simplify-rtx.c	(working copy)
@@ -1127,6 +1127,7 @@  simplify_const_unary_operation (enum rtx
 				rtx op, enum machine_mode op_mode)
 {
   unsigned int width = GET_MODE_BITSIZE (mode);
+  unsigned int op_width = GET_MODE_BITSIZE (op_mode);
 
   if (code == VEC_DUPLICATE)
     {
@@ -1237,7 +1238,8 @@  simplify_const_unary_operation (enum rtx
     }
 
   if (CONST_INT_P (op)
-      && width <= HOST_BITS_PER_WIDE_INT && width > 0)
+      && width <= HOST_BITS_PER_WIDE_INT
+      && op_width <= HOST_BITS_PER_WIDE_INT && op_width > 0)
     {
       HOST_WIDE_INT arg0 = INTVAL (op);
       HOST_WIDE_INT val;
@@ -1257,40 +1259,50 @@  simplify_const_unary_operation (enum rtx
 	  break;
 
 	case FFS:
-	  arg0 &= GET_MODE_MASK (mode);
+	  arg0 &= GET_MODE_MASK (op_mode);
 	  val = ffs_hwi (arg0);
 	  break;
 
 	case CLZ:
-	  arg0 &= GET_MODE_MASK (mode);
-	  if (arg0 == 0 && CLZ_DEFINED_VALUE_AT_ZERO (mode, val))
+	  arg0 &= GET_MODE_MASK (op_mode);
+	  if (arg0 == 0 && CLZ_DEFINED_VALUE_AT_ZERO (op_mode, val))
 	    ;
 	  else
-	    val = GET_MODE_BITSIZE (mode) - floor_log2 (arg0) - 1;
+	    val = GET_MODE_BITSIZE (op_mode) - floor_log2 (arg0) - 1;
+	  break;
+
+	case CLRSB:
+	  arg0 &= GET_MODE_MASK (op_mode);
+	  if (arg0 == 0)
+	    val = GET_MODE_BITSIZE (op_mode) - 1;
+	  else if (arg0 >= 0)
+	    val = GET_MODE_BITSIZE (op_mode) - floor_log2 (arg0) - 2;
+	  else if (arg0 < 0)
+	    val = GET_MODE_BITSIZE (op_mode) - floor_log2 (~arg0) - 2;
 	  break;
 
 	case CTZ:
-	  arg0 &= GET_MODE_MASK (mode);
+	  arg0 &= GET_MODE_MASK (op_mode);
 	  if (arg0 == 0)
 	    {
 	      /* Even if the value at zero is undefined, we have to come
 		 up with some replacement.  Seems good enough.  */
-	      if (! CTZ_DEFINED_VALUE_AT_ZERO (mode, val))
-		val = GET_MODE_BITSIZE (mode);
+	      if (! CTZ_DEFINED_VALUE_AT_ZERO (op_mode, val))
+		val = GET_MODE_BITSIZE (op_mode);
 	    }
 	  else
 	    val = ctz_hwi (arg0);
 	  break;
 
 	case POPCOUNT:
-	  arg0 &= GET_MODE_MASK (mode);
+	  arg0 &= GET_MODE_MASK (op_mode);
 	  val = 0;
 	  while (arg0)
 	    val++, arg0 &= arg0 - 1;
 	  break;
 
 	case PARITY:
-	  arg0 &= GET_MODE_MASK (mode);
+	  arg0 &= GET_MODE_MASK (op_mode);
 	  val = 0;
 	  while (arg0)
 	    val++, arg0 &= arg0 - 1;
Index: gcc/config/bfin/bfin.c
===================================================================
--- gcc/config/bfin/bfin.c	(revision 174339)
+++ gcc/config/bfin/bfin.c	(working copy)
@@ -6254,11 +6254,11 @@  static const struct builtin_description
 
   { CODE_FOR_ones, "__builtin_bfin_ones", BFIN_BUILTIN_ONES, 0 },
 
-  { CODE_FOR_signbitshi2, "__builtin_bfin_norm_fr1x16", BFIN_BUILTIN_NORM_1X16, 0 },
+  { CODE_FOR_clrsbhi2, "__builtin_bfin_norm_fr1x16", BFIN_BUILTIN_NORM_1X16, 0 },
   { CODE_FOR_ssneghi2, "__builtin_bfin_negate_fr1x16", BFIN_BUILTIN_NEG_1X16, 0 },
   { CODE_FOR_abshi2, "__builtin_bfin_abs_fr1x16", BFIN_BUILTIN_ABS_1X16, 0 },
 
-  { CODE_FOR_signbitssi2, "__builtin_bfin_norm_fr1x32", BFIN_BUILTIN_NORM_1X32, 0 },
+  { CODE_FOR_clrsbsi2, "__builtin_bfin_norm_fr1x32", BFIN_BUILTIN_NORM_1X32, 0 },
   { CODE_FOR_ssroundsi2, "__builtin_bfin_round_fr1x32", BFIN_BUILTIN_ROUND_1X32, 0 },
   { CODE_FOR_ssnegsi2, "__builtin_bfin_negate_fr1x32", BFIN_BUILTIN_NEG_1X32, 0 },
   { CODE_FOR_ssabssi2, "__builtin_bfin_abs_fr1x32", BFIN_BUILTIN_ABS_1X32, 0 },
Index: gcc/config/bfin/bfin.md
===================================================================
--- gcc/config/bfin/bfin.md	(revision 174339)
+++ gcc/config/bfin/bfin.md	(working copy)
@@ -1461,12 +1461,19 @@  (define_insn "one_cmplsi2"
   "%0 = ~%1;"
   [(set_attr "type" "alu0")])
 
+(define_expand "clrsbsi2"
+  [(set (match_dup 2)
+	(clrsb:HI (match_operand:SI 1 "register_operand" "d")))
+   (set (match_operand:SI 0 "register_operand")
+	(zero_extend:SI (match_dup 2)))]
+  ""
+{
+  operands[2] = gen_reg_rtx (HImode);
+})
+
 (define_insn "signbitssi2"
   [(set (match_operand:HI 0 "register_operand" "=d")
-	(if_then_else:HI
-	 (lt (match_operand:SI 1 "register_operand" "d") (const_int 0))
-	 (clz:HI (not:SI (match_dup 1)))
-	 (clz:HI (match_dup 1))))]
+	(clrsb:HI (match_operand:SI 1 "register_operand" "d")))]
   ""
   "%h0 = signbits %1%!"
   [(set_attr "type" "dsp32")])
@@ -1518,12 +1525,9 @@  (define_insn "ssneghi2"
   "%0 = -%1 (V)%!"
   [(set_attr "type" "dsp32")])
 
-(define_insn "signbitshi2"
+(define_insn "clrsbhi2"
   [(set (match_operand:HI 0 "register_operand" "=d")
-	(if_then_else:HI
-	 (lt (match_operand:HI 1 "register_operand" "d") (const_int 0))
-	 (clz:HI (not:HI (match_dup 1)))
-	 (clz:HI (match_dup 1))))]
+	(clrsb:HI (match_operand:HI 1 "register_operand" "d")))]
   ""
   "%h0 = signbits %h1%!"
   [(set_attr "type" "dsp32")])