
[wide-int] Handle more ltu_p cases inline

Message ID: 87vbzcph8h.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com
State: New

Commit Message

Richard Sandiford Nov. 28, 2013, 5:29 p.m. UTC
The existing ltu_p fast path can handle any pairs of single-HWI inputs,
even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
yl are implicitly sign-extended to the larger precision, but with the
extended values still being compared as unsigned.  The extension doesn't
change the result in that case.

When compiling a recent fold-const.ii, this reduces the number of
ltu_p_large calls from 23849 to 697.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard
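
As a concrete illustration of the invariant the fast path relies on,
here is a minimal standalone sketch (plain C++, independent of the GCC
sources; unsigned __int128, a GCC/Clang extension, stands in for the
wider precision):

    #include <cassert>
    #include <cstdint>

    /* Model a value stored in a single 64-bit HWI but belonging to a
       wider (here 128-bit) precision: the missing high bits are the
       sign extension of bit 63.  */
    static unsigned __int128
    extend_to_128 (uint64_t hwi)
    {
      return (unsigned __int128) (__int128) (int64_t) hwi;
    }

    int
    main ()
    {
      /* Sign extension adds the same offset (2^128 - 2^64) to every
         value whose top bit is set and leaves the rest unchanged, so
         it preserves unsigned order: the wide comparison agrees with
         the single-HWI one for all pairs.  */
      uint64_t samples[] = { 0, 1, 42, UINT64_MAX / 2,
                             UINT64_MAX / 2 + 1, UINT64_MAX };
      for (uint64_t x : samples)
        for (uint64_t y : samples)
          assert ((extend_to_128 (x) < extend_to_128 (y)) == (x < y));
      return 0;
    }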

Comments

Richard Earnshaw Nov. 28, 2013, 6:40 p.m. UTC | #1
On 28/11/13 17:29, Richard Sandiford wrote:
> The existing ltu_p fast path can handle any pairs of single-HWI inputs,
> even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
> yl are implicitly sign-extended to the larger precision, but with the
> extended values still being compared as unsigned.  The extension doesn't
> change the result in that case.
> 
> When compiling a recent fold-const.ii, this reduces the number of
> ltu_p_large calls from 23849 to 697.
> 

Are these sorts of nuggets of information going to be recorded anywhere?

R.
Kenneth Zadeck Nov. 29, 2013, 2:04 a.m. UTC | #2
This is fine.

Kenny
On 11/28/2013 12:29 PM, Richard Sandiford wrote:
> The existing ltu_p fast path can handle any pairs of single-HWI inputs,
> even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
> yl are implicitly sign-extended to the larger precision, but with the
> extended values still being compared as unsigned.  The extension doesn't
> change the result in that case.
>
> When compiling a recent fold-const.ii, this reduces the number of
> ltu_p_large calls from 23849 to 697.
>
> Tested on x86_64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
>
> Index: gcc/alias.c
> ===================================================================
> --- gcc/alias.c	2013-11-20 12:12:49.393055063 +0000
> +++ gcc/alias.c	2013-11-28 12:24:23.307549245 +0000
> @@ -342,7 +342,7 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
>   	  || (DECL_P (ref->base)
>   	      && (DECL_SIZE (ref->base) == NULL_TREE
>   		  || TREE_CODE (DECL_SIZE (ref->base)) != INTEGER_CST
> -		  || wi::ltu_p (DECL_SIZE (ref->base),
> +		  || wi::ltu_p (wi::to_offset (DECL_SIZE (ref->base)),
>   				ref->offset + ref->size)))))
>       return false;
>   
> Index: gcc/wide-int.h
> ===================================================================
> --- gcc/wide-int.h	2013-11-28 11:44:39.041731636 +0000
> +++ gcc/wide-int.h	2013-11-28 12:48:36.200764215 +0000
> @@ -1740,13 +1740,15 @@ wi::ltu_p (const T1 &x, const T2 &y)
>     unsigned int precision = get_binary_precision (x, y);
>     WIDE_INT_REF_FOR (T1) xi (x, precision);
>     WIDE_INT_REF_FOR (T2) yi (y, precision);
> -  /* Optimize comparisons with constants and with sub-HWI unsigned
> -     integers.  */
> +  /* Optimize comparisons with constants.  */
>     if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
>       return xi.len == 1 && xi.to_uhwi () < (unsigned HOST_WIDE_INT) yi.val[0];
>     if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
>       return yi.len != 1 || yi.to_uhwi () > (unsigned HOST_WIDE_INT) xi.val[0];
> -  if (precision <= HOST_BITS_PER_WIDE_INT)
> +  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
> +     for precisions greater than HOST_BITS_PER_WIDE_INT, but sign-extending both
> +     values does not change the result.  */
> +  if (xi.len + yi.len == 2)
>       {
>         unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
>         unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
>
Richard Sandiford Nov. 29, 2013, 10:08 a.m. UTC | #3
Richard Earnshaw <rearnsha@arm.com> writes:
> On 28/11/13 17:29, Richard Sandiford wrote:
>> The existing ltu_p fast path can handle any pairs of single-HWI inputs,
>> even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
>> yl are implicitly sign-extended to the larger precision, but with the
>> extended values still being compared as unsigned.  The extension doesn't
>> change the result in that case.
>> 
>> When compiling a recent fold-const.ii, this reduces the number of
>> ltu_p_large calls from 23849 to 697.
>> 
>
> Are these sorts of nuggets of information going to be recorded anywhere?

You mean put the fold-const.ii numbers in a comment?  I could if you like,
but it's really just a general principle that checking for two len == 1
integers catches more cases than checking for the precision being <=
HOST_BITS_PER_WIDE_INT.  Every integer whose precision is <=
HOST_BITS_PER_WIDE_INT has a length of 1, but many integers with a
length of 1 have a precision > HOST_BITS_PER_WIDE_INT (because of
offset_int and widest_int).

Thanks,
Richard
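
To make the offset_int point concrete, here is a short fragment against
the wide-int.h API (GCC-internal code, shown as a sketch rather than a
standalone program; the 128-bit figure assumes a 64-bit host):

    /* offset_int is wider than one HWI, so the old test
       "precision <= HOST_BITS_PER_WIDE_INT" always failed for it,
       even though small values occupy a single element.  */
    offset_int x = 100;          /* precision 128, len 1 */
    offset_int y = 200;          /* precision 128, len 1 */

    /* Old fast path: skipped (128 > HOST_BITS_PER_WIDE_INT), so this
       went through ltu_p_large.  New fast path: taken
       (xi.len + yi.len == 2), a single unsigned HWI comparison.  */
    bool lt = wi::ltu_p (x, y);  /* true */
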
Richard Biener Nov. 29, 2013, 11:18 a.m. UTC | #4
On Fri, Nov 29, 2013 at 11:08 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Earnshaw <rearnsha@arm.com> writes:
>> On 28/11/13 17:29, Richard Sandiford wrote:
>>> The existing ltu_p fast path can handle any pairs of single-HWI inputs,
>>> even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
>>> yl are implicitly sign-extended to the larger precision, but with the
>>> extended values still being compared as unsigned.  The extension doesn't
>>> change the result in that case.
>>>
>>> When compiling a recent fold-const.ii, this reduces the number of
>>> ltu_p_large calls from 23849 to 697.
>>>
>>
>> Are these sorts of nuggets of information going to be recorded anywhere?
>
> You mean put the fold-const.ii numbers in a comment?  I could if you like,
> but it's really just a general principle that checking for two len == 1
> integers catches more cases than checking for the precision being <=
> HOST_BITS_PER_WIDE_INT.  Every integer whose precision is <=
> HOST_BITS_PER_WIDE_INT has a length of 1, but many integers with a
> length of 1 have a precision > HOST_BITS_PER_WIDE_INT (because of
> offset_int and widest_int).

Indeed - to be really useful, shortcuts should work with len == 1
instead of just with precision <= HOST_BITS_PER_WIDE_INT, which
usually means handling a result of len == 2.

Richard.

> Thanks,
> Richard
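
To illustrate what handling a len == 2 result involves, here is an
illustrative standalone sketch of a len-keyed fast path for addition
(modelled loosely on the wide-int representation; not the actual
wide-int.h code):

    #include <cstdint>

    /* A toy two-element wide int: val[0] is the least significant
       HWI, len the number of significant elements, and the value is
       sign-extended beyond val[len - 1].  */
    struct mini_wide
    {
      uint64_t val[2];
      unsigned int len;
    };

    /* Fast-path addition of two len == 1 inputs.  Each input fits in
       one HWI, but their infinite-precision sum can need 65 bits, so
       the result may have len == 2.  */
    static mini_wide
    add_fast (uint64_t xl, uint64_t yl)
    {
      mini_wide r;
      r.val[0] = xl + yl;
      /* Carry out of the low element, plus the sign bits of the
         implicitly sign-extended inputs, gives the high element.  */
      uint64_t carry = r.val[0] < xl;
      r.val[1] = (uint64_t) ((int64_t) xl >> 63)
                 + (uint64_t) ((int64_t) yl >> 63) + carry;
      /* The high element is only kept if it differs from the sign
         extension of the low one.  */
      r.len = (r.val[1] == (uint64_t) ((int64_t) r.val[0] >> 63)) ? 1 : 2;
      return r;
    }
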

Patch

Index: gcc/alias.c
===================================================================
--- gcc/alias.c	2013-11-20 12:12:49.393055063 +0000
+++ gcc/alias.c	2013-11-28 12:24:23.307549245 +0000
@@ -342,7 +342,7 @@  ao_ref_from_mem (ao_ref *ref, const_rtx
 	  || (DECL_P (ref->base)
 	      && (DECL_SIZE (ref->base) == NULL_TREE
 		  || TREE_CODE (DECL_SIZE (ref->base)) != INTEGER_CST
-		  || wi::ltu_p (DECL_SIZE (ref->base),
+		  || wi::ltu_p (wi::to_offset (DECL_SIZE (ref->base)),
 				ref->offset + ref->size)))))
     return false;
 
Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	2013-11-28 11:44:39.041731636 +0000
+++ gcc/wide-int.h	2013-11-28 12:48:36.200764215 +0000
@@ -1740,13 +1740,15 @@  wi::ltu_p (const T1 &x, const T2 &y)
   unsigned int precision = get_binary_precision (x, y);
   WIDE_INT_REF_FOR (T1) xi (x, precision);
   WIDE_INT_REF_FOR (T2) yi (y, precision);
-  /* Optimize comparisons with constants and with sub-HWI unsigned
-     integers.  */
+  /* Optimize comparisons with constants.  */
   if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
     return xi.len == 1 && xi.to_uhwi () < (unsigned HOST_WIDE_INT) yi.val[0];
   if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
     return yi.len != 1 || yi.to_uhwi () > (unsigned HOST_WIDE_INT) xi.val[0];
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
+     for precisions greater than HOST_BITS_PER_WIDE_INT, but sign-extending both
+     values does not change the result.  */
+  if (xi.len + yi.len == 2)
     {
       unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
       unsigned HOST_WIDE_INT yl = yi.to_uhwi ();