Patchwork IVOPT improvement patch

login
register
mail settings
Submitter H.J. Lu
Date July 29, 2010, 4 p.m.
Message ID <AANLkTi=a8ksz1=3dq6f0YYzW7gRqJsEuHbqLwMY8GseO@mail.gmail.com>
Download mbox | patch
Permalink /patch/60276/
State New
Headers show

Comments

H.J. Lu - July 29, 2010, 4 p.m.
On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>> The attached patch should fix the problem -- it reverts a small part
>> of the last patch that is needed for fixing sixtrack performance
>> regression caused by wrong iv-use costs because address offset range
>> is conservatively computed. I will revert the change first and
>> investigate better fix (Suggestions are welcome).
>>
>
> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
> it sounds like a HOST_WIDE_INT issue.
>

This patch fixed the infinite loop.
Richard Guenther - July 29, 2010, 4:16 p.m.
On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> The attached patch should fix the problem -- it reverts a small part
>>> of the last patch that is needed for fixing sixtrack performance
>>> regression caused by wrong iv-use costs because address offset range
>>> is conservatively computed. I will revert the change first and
>>> investigate better fix (Suggestions are welcome).
>>>
>>
>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>> it sounds like a HOST_WIDE_INT issue.
>>
>
> This patch fixed the infinite loop.

That doesn't make sense.  Please use double_ints instead.

Richard.

>
> --
> H.J.
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 519f66e..44f2eb2 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
> ratio, enum machine_mode mode,
>
>  typedef struct
>  {
> -  HOST_WIDE_INT min_offset, max_offset;
> +  HOST_WIDEST_INT min_offset, max_offset;
>   unsigned costs[2][2][2][2];
>  } *address_cost_data;
>
> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>   if (!data)
>     {
> -      HOST_WIDE_INT i;
> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
> -      HOST_WIDE_INT rat, off;
> +      HOST_WIDEST_INT i;
> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
> +      HOST_WIDEST_INT rat, off;
>       int old_cse_not_expected, width;
>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>       rtx seq, addr, base;
>
H.J. Lu - July 29, 2010, 4:51 p.m.
On Thu, Jul 29, 2010 at 9:16 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> The attached patch should fix the problem -- it reverts a small part
>>>> of the last patch that is needed for fixing sixtrack performance
>>>> regression caused by wrong iv-use costs because address offset range
>>>> is conservatively computed. I will revert the change first and
>>>> investigate better fix (Suggestions are welcome).
>>>>
>>>
>>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>>> it sounds like a HOST_WIDE_INT issue.
>>>
>>
>> This patch fixed the infinite loop.
>
> That doesn't make sense.  Please use double_ints instead.
>
> Richard.

I am not familiar wit this code. I will leave it to David.


H.J.
---
>>
>> --
>> H.J.
>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>> index 519f66e..44f2eb2 100644
>> --- a/gcc/tree-ssa-loop-ivopts.c
>> +++ b/gcc/tree-ssa-loop-ivopts.c
>> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
>> ratio, enum machine_mode mode,
>>
>>  typedef struct
>>  {
>> -  HOST_WIDE_INT min_offset, max_offset;
>> +  HOST_WIDEST_INT min_offset, max_offset;
>>   unsigned costs[2][2][2][2];
>>  } *address_cost_data;
>>
>> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>>   if (!data)
>>     {
>> -      HOST_WIDE_INT i;
>> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>> -      HOST_WIDE_INT rat, off;
>> +      HOST_WIDEST_INT i;
>> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>> +      HOST_WIDEST_INT rat, off;
>>       int old_cse_not_expected, width;
>>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>>       rtx seq, addr, base;
>>
>
Xinliang David Li - July 30, 2010, 12:55 a.m.
Please take a look at the following patch -- it is less conservative
than before and also does not lead to infinite loop (due to integer
overflow).

Testing is under going. Ok for trunk after that is done?

Thanks,

David

On Thu, Jul 29, 2010 at 9:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 9:16 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> The attached patch should fix the problem -- it reverts a small part
>>>>> of the last patch that is needed for fixing sixtrack performance
>>>>> regression caused by wrong iv-use costs because address offset range
>>>>> is conservatively computed. I will revert the change first and
>>>>> investigate better fix (Suggestions are welcome).
>>>>>
>>>>
>>>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>>>> it sounds like a HOST_WIDE_INT issue.
>>>>
>>>
>>> This patch fixed the infinite loop.
>>
>> That doesn't make sense.  Please use double_ints instead.
>>
>> Richard.
>
> I am not familiar wit this code. I will leave it to David.
>
>
> H.J.
> ---
>>>
>>> --
>>> H.J.
>>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>>> index 519f66e..44f2eb2 100644
>>> --- a/gcc/tree-ssa-loop-ivopts.c
>>> +++ b/gcc/tree-ssa-loop-ivopts.c
>>> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
>>> ratio, enum machine_mode mode,
>>>
>>>  typedef struct
>>>  {
>>> -  HOST_WIDE_INT min_offset, max_offset;
>>> +  HOST_WIDEST_INT min_offset, max_offset;
>>>   unsigned costs[2][2][2][2];
>>>  } *address_cost_data;
>>>
>>> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>>>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>>>   if (!data)
>>>     {
>>> -      HOST_WIDE_INT i;
>>> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>> -      HOST_WIDE_INT rat, off;
>>> +      HOST_WIDEST_INT i;
>>> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>> +      HOST_WIDEST_INT rat, off;
>>>       int old_cse_not_expected, width;
>>>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>>>       rtx seq, addr, base;
>>>
>>
>
>
>
> --
> H.J.
>
H.J. Lu - July 30, 2010, 1:04 a.m.
It looks strange:

+      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
+          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = 1; i < width; i++)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
+          HOST_WIDE_INT offset = (1ll << i);
+	  XEXP (addr, 1) = gen_int_mode (offset, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}

HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
I think width can be >= 31. Depending on HOST_WIDE_INT,

HOST_WIDE_INT offset = -(1ll << i);

may have different values. The whole function looks odd to me.


H.J.
----
On Thu, Jul 29, 2010 at 5:55 PM, Xinliang David Li <davidxl@google.com> wrote:
> Please take a look at the following patch -- it is less conservative
> than before and also does not lead to infinite loop (due to integer
> overflow).
>
> Testing is under going. Ok for trunk after that is done?
>
> Thanks,
>
> David
>
> On Thu, Jul 29, 2010 at 9:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Jul 29, 2010 at 9:16 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> The attached patch should fix the problem -- it reverts a small part
>>>>>> of the last patch that is needed for fixing sixtrack performance
>>>>>> regression caused by wrong iv-use costs because address offset range
>>>>>> is conservatively computed. I will revert the change first and
>>>>>> investigate better fix (Suggestions are welcome).
>>>>>>
>>>>>
>>>>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>>>>> it sounds like a HOST_WIDE_INT issue.
>>>>>
>>>>
>>>> This patch fixed the infinite loop.
>>>
>>> That doesn't make sense.  Please use double_ints instead.
>>>
>>> Richard.
>>
>> I am not familiar wit this code. I will leave it to David.
>>
>>
>> H.J.
>> ---
>>>>
>>>> --
>>>> H.J.
>>>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>>>> index 519f66e..44f2eb2 100644
>>>> --- a/gcc/tree-ssa-loop-ivopts.c
>>>> +++ b/gcc/tree-ssa-loop-ivopts.c
>>>> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
>>>> ratio, enum machine_mode mode,
>>>>
>>>>  typedef struct
>>>>  {
>>>> -  HOST_WIDE_INT min_offset, max_offset;
>>>> +  HOST_WIDEST_INT min_offset, max_offset;
>>>>   unsigned costs[2][2][2][2];
>>>>  } *address_cost_data;
>>>>
>>>> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>>>>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>>>>   if (!data)
>>>>     {
>>>> -      HOST_WIDE_INT i;
>>>> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>>> -      HOST_WIDE_INT rat, off;
>>>> +      HOST_WIDEST_INT i;
>>>> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>>> +      HOST_WIDEST_INT rat, off;
>>>>       int old_cse_not_expected, width;
>>>>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>>>>       rtx seq, addr, base;
>>>>
>>>
>>
>>
>>
>> --
>> H.J.
>>
>
Xinliang David Li - July 30, 2010, 4:18 a.m.
The width is set to a value so that 1ll<<i is guaranteed to not
overflow HOST_WIDE_INT type. THe suffix is needed so that the
intermediate value does not get truncated when HOST_WIDE_INT is wider
than 32bit. Is there a portable way to represent the integer literal
with HOST_WIDE_TYPE?

Thanks,

David

On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> It looks strange:
>
> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> -      for (i = start; i <= 1 << 20; i <<= 1)
> +      for (i = 1; i < width; i++)
>        {
> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
> +          HOST_WIDE_INT offset = (1ll << i);
> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>            break;
>        }
>
> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
> I think width can be >= 31. Depending on HOST_WIDE_INT,
>
> HOST_WIDE_INT offset = -(1ll << i);
>
> may have different values. The whole function looks odd to me.
>
>
> H.J.
> ----
> On Thu, Jul 29, 2010 at 5:55 PM, Xinliang David Li <davidxl@google.com> wrote:
>> Please take a look at the following patch -- it is less conservative
>> than before and also does not lead to infinite loop (due to integer
>> overflow).
>>
>> Testing is under going. Ok for trunk after that is done?
>>
>> Thanks,
>>
>> David
>>
>> On Thu, Jul 29, 2010 at 9:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Jul 29, 2010 at 9:16 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>>> The attached patch should fix the problem -- it reverts a small part
>>>>>>> of the last patch that is needed for fixing sixtrack performance
>>>>>>> regression caused by wrong iv-use costs because address offset range
>>>>>>> is conservatively computed. I will revert the change first and
>>>>>>> investigate better fix (Suggestions are welcome).
>>>>>>>
>>>>>>
>>>>>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>>>>>> it sounds like a HOST_WIDE_INT issue.
>>>>>>
>>>>>
>>>>> This patch fixed the infinite loop.
>>>>
>>>> That doesn't make sense.  Please use double_ints instead.
>>>>
>>>> Richard.
>>>
>>> I am not familiar wit this code. I will leave it to David.
>>>
>>>
>>> H.J.
>>> ---
>>>>>
>>>>> --
>>>>> H.J.
>>>>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>>>>> index 519f66e..44f2eb2 100644
>>>>> --- a/gcc/tree-ssa-loop-ivopts.c
>>>>> +++ b/gcc/tree-ssa-loop-ivopts.c
>>>>> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
>>>>> ratio, enum machine_mode mode,
>>>>>
>>>>>  typedef struct
>>>>>  {
>>>>> -  HOST_WIDE_INT min_offset, max_offset;
>>>>> +  HOST_WIDEST_INT min_offset, max_offset;
>>>>>   unsigned costs[2][2][2][2];
>>>>>  } *address_cost_data;
>>>>>
>>>>> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>>>>>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>>>>>   if (!data)
>>>>>     {
>>>>> -      HOST_WIDE_INT i;
>>>>> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>>>> -      HOST_WIDE_INT rat, off;
>>>>> +      HOST_WIDEST_INT i;
>>>>> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>>>> +      HOST_WIDEST_INT rat, off;
>>>>>       int old_cse_not_expected, width;
>>>>>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>>>>>       rtx seq, addr, base;
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> H.J.
>>>
>>
>
>
>
> --
> H.J.
>
Jakub Jelinek - July 30, 2010, 7:10 a.m.
On Thu, Jul 29, 2010 at 09:18:07PM -0700, Xinliang David Li wrote:
> The width is set to a value so that 1ll<<i is guaranteed to not
> overflow HOST_WIDE_INT type. THe suffix is needed so that the
> intermediate value does not get truncated when HOST_WIDE_INT is wider
> than 32bit. Is there a portable way to represent the integer literal
> with HOST_WIDE_TYPE?

Sure:

(HOST_WIDE_INT) 1 << i

	Jakub
Xinliang David Li - July 30, 2010, 4:37 p.m.
On Fri, Jul 30, 2010 at 12:10 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jul 29, 2010 at 09:18:07PM -0700, Xinliang David Li wrote:
>> The width is set to a value so that 1ll<<i is guaranteed to not
>> overflow HOST_WIDE_INT type. THe suffix is needed so that the
>> intermediate value does not get truncated when HOST_WIDE_INT is wider
>> than 32bit. Is there a portable way to represent the integer literal
>> with HOST_WIDE_TYPE?
>
> Sure:
>
> (HOST_WIDE_INT) 1 << i
>

Yes, of course ;)

David
>        Jakub
>

Patch

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 519f66e..44f2eb2 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3207,7 +3207,7 @@  multiplier_allowed_in_address_p (HOST_WIDE_INT
ratio, enum machine_mode mode,

 typedef struct
 {
-  HOST_WIDE_INT min_offset, max_offset;
+  HOST_WIDEST_INT min_offset, max_offset;
   unsigned costs[2][2][2][2];
 } *address_cost_data;