[wide-int] Handle more cmps and cmpu cases inline

Message ID 87r4a0ph0x.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com
State New

Commit Message

Richard Sandiford Nov. 28, 2013, 5:34 p.m. UTC
As Richi asked, this patch makes cmps use the same shortcuts as lts_p.
It also makes cmpu use the shortcut that I just added to ltu_p.

On that same fold-const.ii testcase, this reduces the number of cmps_large
calls from 66924 to 916.  It reduces the number of cmpu_large calls from
3462 to 4.
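
To make the shape of the shortcut concrete, here is a minimal standalone
sketch of the compare-with-literal-0 branch.  STATIC_CONSTANT_P is shown
as wide-int.h defines it; the wi_ref struct and cmps_zero helper are
hypothetical stand-ins for WIDE_INT_REF_FOR and the inline branch of
wi::cmps, not the real types:

#include <cstdio>

typedef long long HOST_WIDE_INT;

/* As defined in wide-int.h: true only when X is a compile-time
   constant that evaluates to nonzero.  wi::cmps uses it to detect
   comparisons against a literal 0.  */
#define STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))

/* Hypothetical stand-in for WIDE_INT_REF_FOR: LEN significant HWI
   blocks, least-significant first, sign-extended from the top one.  */
struct wi_ref
{
  const HOST_WIDE_INT *val;
  unsigned int len;
};

/* The whole value is negative iff its top block is.  */
static bool
neg_p (const wi_ref &x)
{
  return x.val[x.len - 1] < 0;
}

/* The branch taken when y is known at compile time to be 0:
   decide -1/0/1 from x's sign and from whether x is exactly zero,
   without reading y at all.  */
static int
cmps_zero (const wi_ref &x)
{
  return neg_p (x) ? -1 : !(x.len == 1 && x.val[0] == 0);
}

int
main ()
{
  HOST_WIDE_INT a[] = { -5 };    /* negative */
  HOST_WIDE_INT b[] = { 0 };     /* zero */
  HOST_WIDE_INT c[] = { 0, 1 };  /* positive, wider than one block */
  printf ("%d %d %d\n",
	  cmps_zero (wi_ref { a, 1 }),    /* -1 */
	  cmps_zero (wi_ref { b, 1 }),    /* 0 */
	  cmps_zero (wi_ref { c, 2 }));   /* 1 */
  return 0;
}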

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard

Comments

Kenneth Zadeck Nov. 29, 2013, 2:06 a.m. UTC | #1
Like the add/sub patch, enhance the comment so that it says that it is
designed to hit the widest_int and offset_int common cases.

kenny
On 11/28/2013 12:34 PM, Richard Sandiford wrote:
> As Richi asked, this patch makes cmps use the same shortcuts as lts_p.
> It also makes cmpu use the shortcut that I just added to ltu_p.
>
> On that same fold-const.ii testcase, this reduces the number of cmps_large
> calls from 66924 to 916.  It reduces the number of cmpu_large calls from
> 3462 to 4.
>
> Tested on x86_64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
> [...]
Richard Sandiford Nov. 29, 2013, 10:17 a.m. UTC | #2
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
> Like the add/sub patch, enhance the comment so that it says that it is
> designed to hit the widest_int and offset_int common cases.

These cases are designed for all types, not just offset_int and widest_int.
We're using length-based tests because they catch more cases for every type.
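
For instance, with a hypothetical val512 layout standing in for the real
templates, a 512-bit value holding 7 occupies a single block, so a
length-based check fires even though a precision-based one never can:

#include <cassert>

typedef long long HOST_WIDE_INT;
const unsigned int HOST_BITS_PER_WIDE_INT = 64;

/* Hypothetical fixed-width value in the wide-int representation:
   LEN significant blocks, least-significant first, sign-extended
   from the top one.  */
struct val512
{
  HOST_WIDE_INT val[8];
  unsigned int len;        /* blocks actually significant */
  unsigned int precision;  /* always 512 for this type */
};

int
main ()
{
  val512 seven = { { 7 }, 1, 512 };

  /* A precision-based fast path can never fire at 512 bits...  */
  assert (!(seven.precision <= HOST_BITS_PER_WIDE_INT));

  /* ...but a length-based one still catches this common case.  */
  assert (seven.len == 1 && seven.val[0] >= 0);
  return 0;
}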

The add/sub case is different because I'm keeping the existing
precision-based fast path for wide_int and adding a new fast path
for offset_int and widest_int.

Thanks,
Richard

> [...]

Patch

Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	2001-01-01 00:00:00.000000000 +0000
+++ gcc/wide-int.h	2013-11-28 16:08:22.527681077 +0000
@@ -1858,17 +1858,31 @@  wi::cmps (const T1 &x, const T2 &y)
   unsigned int precision = get_binary_precision (x, y);
   WIDE_INT_REF_FOR (T1) xi (x, precision);
   WIDE_INT_REF_FOR (T2) yi (y, precision);
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  if (wi::fits_shwi_p (yi))
     {
-      HOST_WIDE_INT xl = xi.to_shwi ();
-      HOST_WIDE_INT yl = yi.to_shwi ();
-      if (xl < yl)
+      /* Special case for comparisons with 0.  */
+      if (STATIC_CONSTANT_P (yi.val[0] == 0))
+	return neg_p (xi) ? -1 : !(xi.len == 1 && xi.val[0] == 0);
+      /* If x fits into a signed HWI, we can compare directly.  */
+      if (wi::fits_shwi_p (xi))
+	{
+	  HOST_WIDE_INT xl = xi.to_shwi ();
+	  HOST_WIDE_INT yl = yi.to_shwi ();
+	  return xl < yl ? -1 : xl > yl;
+	}
+      /* If x doesn't fit and is negative, then it must be more
+	 negative than any signed HWI, and hence smaller than y.  */
+      if (neg_p (xi))
 	return -1;
-      else if (xl > yl)
-	return 1;
-      else
-	return 0;
+      /* If x is positive, then it must be larger than any signed HWI,
+	 and hence greater than y.  */
+      return 1;
     }
+  /* Optimize the opposite case, if it can be detected at compile time.  */
+  if (STATIC_CONSTANT_P (xi.len == 1))
+    /* If YI is negative it is lower than the least HWI.
+       If YI is positive it is greater than the greatest HWI.  */
+    return neg_p (yi) ? 1 : -1;
   return cmps_large (xi.val, xi.len, precision, yi.val, yi.len);
 }
 
@@ -1881,16 +1895,35 @@  wi::cmpu (const T1 &x, const T2 &y)
   unsigned int precision = get_binary_precision (x, y);
   WIDE_INT_REF_FOR (T1) xi (x, precision);
   WIDE_INT_REF_FOR (T2) yi (y, precision);
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  /* Optimize comparisons with constants.  */
+  if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
     {
+      /* If XI doesn't fit in a HWI then it must be larger than YI.  */
+      if (xi.len != 1)
+	return 1;
+      /* Otherwise compare directly.  */
       unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
-      unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
-      if (xl < yl)
+      unsigned HOST_WIDE_INT yl = yi.val[0];
+      return xl < yl ? -1 : xl > yl;
+    }
+  if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
+    {
+      /* If YI doesn't fit in a HWI then it must be larger than XI.  */
+      if (yi.len != 1)
 	return -1;
-      else if (xl == yl)
-	return 0;
-      else
-	return 1;
+      /* Otherwise compare directly.  */
+      unsigned HOST_WIDE_INT xl = xi.val[0];
+      unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
+      return xl < yl ? -1 : xl > yl;
+    }
+  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
+     for precisions greater than HOST_BITS_PER_WIDE_INT, but sign-extending both
+     values does not change the result.  */
+  if (xi.len + yi.len == 2)
+    {
+      unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
+      unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
+      return xl < yl ? -1 : xl > yl;
     }
   return cmpu_large (xi.val, xi.len, precision, yi.val, yi.len);
 }
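
As a sanity check on the two-HWIs comment in cmpu above, the following
standalone sketch (using GCC's unsigned __int128 as the 128-bit
reference; none of it is from the patch) verifies that comparing the
sign-extended blocks as unsigned HWIs matches comparing the
full-precision unsigned values:

#include <cassert>
#include <climits>

typedef long long HOST_WIDE_INT;
typedef unsigned long long uhwi;

/* The full 128-bit unsigned value denoted by one sign-extended block.  */
static unsigned __int128
extend128 (HOST_WIDE_INT block)
{
  return (unsigned __int128) (__int128) block;
}

/* The fast path: compare the blocks directly as unsigned HWIs.  */
static int
cmpu_fast (HOST_WIDE_INT x, HOST_WIDE_INT y)
{
  uhwi xl = (uhwi) x, yl = (uhwi) y;
  return xl < yl ? -1 : xl > yl;
}

/* The reference: compare the full sign-extended 128-bit values.  */
static int
cmpu_ref (HOST_WIDE_INT x, HOST_WIDE_INT y)
{
  unsigned __int128 xw = extend128 (x), yw = extend128 (y);
  return xw < yw ? -1 : xw > yw;
}

int
main ()
{
  const HOST_WIDE_INT cases[] = { 0, 1, 42, -1, -42, LLONG_MAX, LLONG_MIN };
  for (HOST_WIDE_INT x : cases)
    for (HOST_WIDE_INT y : cases)
      assert (cmpu_fast (x, y) == cmpu_ref (x, y));
  return 0;
}

Sign extension gives a negative block an all-ones top half and a
non-negative block an all-zero top half, so in each case the unsigned
order of the blocks matches the unsigned order of the full values.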