Patchwork [wide-int] doloop fixes

login
register
mail settings
Submitter Richard Sandiford
Date Nov. 2, 2013, 11:06 a.m.
Message ID <87y557jc6e.fsf@talisman.default>
Download mbox | patch
Permalink /patch/287971/
State New
Headers show

Comments

Richard Sandiford - Nov. 2, 2013, 11:06 a.m.
The first part of this is a simple type mismatch -- get_max_loop_iterations
returns a widest_int aka max_wide_int -- and I'd have installed it as
obvious.  The second part isn't as obvious though.  The old code stored
the maximum iterations as:

  if (!get_max_loop_iterations (loop, &iter)
      || !iter.fits_shwi ())
    iterations_max = const0_rtx;
  else
    iterations_max = GEN_INT (iter.to_shwi ());

and the new code uses:

  if (!get_max_loop_iterations (loop, &iter)
      || !wi::fits_shwi_p (iter))
    iterations_max = const0_rtx;
  else
    iterations_max = immed_wide_int_const (iter, mode);

which includes an extra canonicalisation.  I agree it would be good to do
that in principle, but I'm not sure it copes correctly with the case where
the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
real fix would be to avoid the host dependence altogether, i.e. get rid
of the fits_shwi_p too.

As it stands, this breaks bfin-elf's pattern, which has:

  /* Due to limitations in the hardware (an initial loop count of 0
     does not loop 2^32 times) we must avoid to generate a hardware
     loops when we cannot rule out this case.  */
  if (!flag_unsafe_loop_optimizations
      && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
    FAIL;

With the sign-extending conversion, this now triggers more often than
it was supposed to.

Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
and since we still use that form in the doloop_begin code, I think we should
just back the immed_wide_int_const out.

Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
unwanted testsuite changes in bfin-elf.  OK to install?

Thanks,
Richard
Kenneth Zadeck - Nov. 2, 2013, 2 p.m.
On 11/02/2013 07:06 AM, Richard Sandiford wrote:
> The first part of this is a simple type mismatch -- get_max_loop_iterations
> returns a widest_int aka max_wide_int -- and I'd have installed it as
> obvious.  The second part isn't as obvious though.  The old code stored
> the maximum iterations as:
>
>    if (!get_max_loop_iterations (loop, &iter)
>        || !iter.fits_shwi ())
>      iterations_max = const0_rtx;
>    else
>      iterations_max = GEN_INT (iter.to_shwi ());
>
> and the new code uses:
>
>    if (!get_max_loop_iterations (loop, &iter)
>        || !wi::fits_shwi_p (iter))
>      iterations_max = const0_rtx;
>    else
>      iterations_max = immed_wide_int_const (iter, mode);
>
> which includes an extra canonicalisation.  I agree it would be good to do
> that in principle, but I'm not sure it copes correctly with the case where
> the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
> real fix would be to avoid the host dependence altogether, i.e. get rid
> of the fits_shwi_p too.
>
> As it stands, this breaks bfin-elf's pattern, which has:
>
>    /* Due to limitations in the hardware (an initial loop count of 0
>       does not loop 2^32 times) we must avoid to generate a hardware
>       loops when we cannot rule out this case.  */
>    if (!flag_unsafe_loop_optimizations
>        && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
>      FAIL;
>
> With the sign-extending conversion, this now triggers more often than
> it was supposed to.
>
> Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
> and since we still use that form in the doloop_begin code, I think we should
> just back the immed_wide_int_const out.
>
> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
> unwanted testsuite changes in bfin-elf.  OK to install?
I dislike this a lot.     I think that this is a badly written pattern 
in the bfin port if it depends on the host size.   As machines get 
bigger, (they may not be getting faster but they are still getting 
bigger) having traps like this at the portable level will bite us.

in truth this code should really be iterations_max = 
immed_wide_int_const (iter, mode) with no tests at all.


> Thanks,
> Richard
>
>
> Index: gcc/loop-doloop.c
> ===================================================================
> --- gcc/loop-doloop.c	2013-11-02 10:49:37.463178153 +0000
> +++ gcc/loop-doloop.c	2013-11-02 10:49:55.927314661 +0000
> @@ -549,7 +549,7 @@ doloop_modify (struct loop *loop, struct
>     {
>       rtx init;
>       unsigned level = get_loop_level (loop) + 1;
> -    wide_int iter;
> +    widest_int iter;
>       rtx iter_rtx;
>   
>       if (!get_max_loop_iterations (loop, &iter)
> @@ -673,7 +673,7 @@ doloop_optimize (struct loop *loop)
>         || !wi::fits_shwi_p (iter))
>       iterations_max = const0_rtx;
>     else
> -    iterations_max = immed_wide_int_const (iter, mode);
> +    iterations_max = GEN_INT (iter.to_shwi ());
>     level = get_loop_level (loop) + 1;
>   
>     /* Generate looping insn.  If the pattern FAILs then give up trying
Richard Sandiford - Nov. 2, 2013, 2:47 p.m.
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
> On 11/02/2013 07:06 AM, Richard Sandiford wrote:
>> The first part of this is a simple type mismatch -- get_max_loop_iterations
>> returns a widest_int aka max_wide_int -- and I'd have installed it as
>> obvious.  The second part isn't as obvious though.  The old code stored
>> the maximum iterations as:
>>
>>    if (!get_max_loop_iterations (loop, &iter)
>>        || !iter.fits_shwi ())
>>      iterations_max = const0_rtx;
>>    else
>>      iterations_max = GEN_INT (iter.to_shwi ());
>>
>> and the new code uses:
>>
>>    if (!get_max_loop_iterations (loop, &iter)
>>        || !wi::fits_shwi_p (iter))
>>      iterations_max = const0_rtx;
>>    else
>>      iterations_max = immed_wide_int_const (iter, mode);
>>
>> which includes an extra canonicalisation.  I agree it would be good to do
>> that in principle, but I'm not sure it copes correctly with the case where
>> the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
>> real fix would be to avoid the host dependence altogether, i.e. get rid
>> of the fits_shwi_p too.
>>
>> As it stands, this breaks bfin-elf's pattern, which has:
>>
>>    /* Due to limitations in the hardware (an initial loop count of 0
>>       does not loop 2^32 times) we must avoid to generate a hardware
>>       loops when we cannot rule out this case.  */
>>    if (!flag_unsafe_loop_optimizations
>>        && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
>>      FAIL;
>>
>> With the sign-extending conversion, this now triggers more often than
>> it was supposed to.
>>
>> Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
>> and since we still use that form in the doloop_begin code, I think we should
>> just back the immed_wide_int_const out.
>>
>> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
>> unwanted testsuite changes in bfin-elf.  OK to install?
> I dislike this a lot.     I think that this is a badly written pattern 
> in the bfin port if it depends on the host size.   As machines get 
> bigger, (they may not be getting faster but they are still getting 
> bigger) having traps like this at the portable level will bite us.

But this isn't at the portable level, it's backend-specific code.
It knows that it's dealing with a 32-bit counter.

> in truth this code should really be iterations_max = 
> immed_wide_int_const (iter, mode) with no tests at all.

But like I say, that won't cope with loops that iterate
1 << GET_MODE_PRECISION (mode) times, even if that fits
in a signed HWI.  (That case could happen if the hardware
supports it and if the counter starts out as 0.)

The mainline code is self-consistent in that the number of
iterations is passed as a positive signed HWI, using 0 if the
value isn't known or is greater than HOST_WIDE_INT_MAX.
In the current interface, the CONST_INT doesn't really have a mode.
It's just used as a convenient way of passing a HWI to an expander.

I'm not saying that it's the ideal interface.  But if we want to change it,
let's do that separately, and test it on targets that have doloops.
Until then, the wide-int branch can provide the current interface
just as easily as mainline.

Thanks,
Richard
Kenneth Zadeck - Nov. 2, 2013, 3:03 p.m.
On 11/02/2013 10:47 AM, Richard Sandiford wrote:
> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>> On 11/02/2013 07:06 AM, Richard Sandiford wrote:
>>> The first part of this is a simple type mismatch -- get_max_loop_iterations
>>> returns a widest_int aka max_wide_int -- and I'd have installed it as
>>> obvious.  The second part isn't as obvious though.  The old code stored
>>> the maximum iterations as:
>>>
>>>     if (!get_max_loop_iterations (loop, &iter)
>>>         || !iter.fits_shwi ())
>>>       iterations_max = const0_rtx;
>>>     else
>>>       iterations_max = GEN_INT (iter.to_shwi ());
>>>
>>> and the new code uses:
>>>
>>>     if (!get_max_loop_iterations (loop, &iter)
>>>         || !wi::fits_shwi_p (iter))
>>>       iterations_max = const0_rtx;
>>>     else
>>>       iterations_max = immed_wide_int_const (iter, mode);
>>>
>>> which includes an extra canonicalisation.  I agree it would be good to do
>>> that in principle, but I'm not sure it copes correctly with the case where
>>> the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
>>> real fix would be to avoid the host dependence altogether, i.e. get rid
>>> of the fits_shwi_p too.
>>>
>>> As it stands, this breaks bfin-elf's pattern, which has:
>>>
>>>     /* Due to limitations in the hardware (an initial loop count of 0
>>>        does not loop 2^32 times) we must avoid to generate a hardware
>>>        loops when we cannot rule out this case.  */
>>>     if (!flag_unsafe_loop_optimizations
>>>         && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
>>>       FAIL;
>>>
>>> With the sign-extending conversion, this now triggers more often than
>>> it was supposed to.
>>>
>>> Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
>>> and since we still use that form in the doloop_begin code, I think we should
>>> just back the immed_wide_int_const out.
>>>
>>> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
>>> unwanted testsuite changes in bfin-elf.  OK to install?
>> I dislike this a lot.     I think that this is a badly written pattern
>> in the bfin port if it depends on the host size.   As machines get
>> bigger, (they may not be getting faster but they are still getting
>> bigger) having traps like this at the portable level will bite us.
> But this isn't at the portable level, it's backend-specific code.
> It knows that it's dealing with a 32-bit counter.
yes, but the patch is dumbing down the portable part of the compiler.
>
>> in truth this code should really be iterations_max =
>> immed_wide_int_const (iter, mode) with no tests at all.
> But like I say, that won't cope with loops that iterate
> 1 << GET_MODE_PRECISION (mode) times, even if that fits
> in a signed HWI.  (That case could happen if the hardware
> supports it and if the counter starts out as 0.)
>
> The mainline code is self-consistent in that the number of
> iterations is passed as a positive signed HWI, using 0 if the
> value isn't known or is greater than HOST_WIDE_INT_MAX.
> In the current interface, the CONST_INT doesn't really have a mode.
> It's just used as a convenient way of passing a HWI to an expander.
>
> I'm not saying that it's the ideal interface.  But if we want to change it,
> let's do that separately, and test it on targets that have doloops.
> Until then, the wide-int branch can provide the current interface
> just as easily as mainline.
we do not have a predicate at the rtl level like we do at the tree level 
where we can ask if the value can be represented in a mode without 
loosing any information.   But the proper thing to do here is to check 
to see if iter fits in the mode, if it does, then generate the 
immed_wide_int_const and if not generate the 0.   In this way it does 
the right thing for loops that iterate 1<<GET_MODE_PRECISION (mode), 
i.e. it will produce a 0.   And that will be correct even if the mode is 
SI or TI.

>
> Thanks,
> Richard
Richard Sandiford - Nov. 2, 2013, 3:31 p.m.
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
> On 11/02/2013 10:47 AM, Richard Sandiford wrote:
>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>> On 11/02/2013 07:06 AM, Richard Sandiford wrote:
>>>> The first part of this is a simple type mismatch -- get_max_loop_iterations
>>>> returns a widest_int aka max_wide_int -- and I'd have installed it as
>>>> obvious.  The second part isn't as obvious though.  The old code stored
>>>> the maximum iterations as:
>>>>
>>>>     if (!get_max_loop_iterations (loop, &iter)
>>>>         || !iter.fits_shwi ())
>>>>       iterations_max = const0_rtx;
>>>>     else
>>>>       iterations_max = GEN_INT (iter.to_shwi ());
>>>>
>>>> and the new code uses:
>>>>
>>>>     if (!get_max_loop_iterations (loop, &iter)
>>>>         || !wi::fits_shwi_p (iter))
>>>>       iterations_max = const0_rtx;
>>>>     else
>>>>       iterations_max = immed_wide_int_const (iter, mode);
>>>>
>>>> which includes an extra canonicalisation.  I agree it would be good to do
>>>> that in principle, but I'm not sure it copes correctly with the case where
>>>> the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
>>>> real fix would be to avoid the host dependence altogether, i.e. get rid
>>>> of the fits_shwi_p too.
>>>>
>>>> As it stands, this breaks bfin-elf's pattern, which has:
>>>>
>>>>     /* Due to limitations in the hardware (an initial loop count of 0
>>>>        does not loop 2^32 times) we must avoid to generate a hardware
>>>>        loops when we cannot rule out this case.  */
>>>>     if (!flag_unsafe_loop_optimizations
>>>>         && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
>>>>       FAIL;
>>>>
>>>> With the sign-extending conversion, this now triggers more often than
>>>> it was supposed to.
>>>>
>>>> Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
>>>> and since we still use that form in the doloop_begin code, I think we should
>>>> just back the immed_wide_int_const out.
>>>>
>>>> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
>>>> unwanted testsuite changes in bfin-elf.  OK to install?
>>> I dislike this a lot.     I think that this is a badly written pattern
>>> in the bfin port if it depends on the host size.   As machines get
>>> bigger, (they may not be getting faster but they are still getting
>>> bigger) having traps like this at the portable level will bite us.
>> But this isn't at the portable level, it's backend-specific code.
>> It knows that it's dealing with a 32-bit counter.
> yes, but the patch is dumbing down the portable part of the compiler.
>>
>>> in truth this code should really be iterations_max =
>>> immed_wide_int_const (iter, mode) with no tests at all.
>> But like I say, that won't cope with loops that iterate
>> 1 << GET_MODE_PRECISION (mode) times, even if that fits
>> in a signed HWI.  (That case could happen if the hardware
>> supports it and if the counter starts out as 0.)
>>
>> The mainline code is self-consistent in that the number of
>> iterations is passed as a positive signed HWI, using 0 if the
>> value isn't known or is greater than HOST_WIDE_INT_MAX.
>> In the current interface, the CONST_INT doesn't really have a mode.
>> It's just used as a convenient way of passing a HWI to an expander.
>>
>> I'm not saying that it's the ideal interface.  But if we want to change it,
>> let's do that separately, and test it on targets that have doloops.
>> Until then, the wide-int branch can provide the current interface
>> just as easily as mainline.
> we do not have a predicate at the rtl level like we do at the tree level 
> where we can ask if the value can be represented in a mode without 
> loosing any information.   But the proper thing to do here is to check 
> to see if iter fits in the mode, if it does, then generate the 
> immed_wide_int_const and if not generate the 0.   In this way it does 
> the right thing for loops that iterate 1<<GET_MODE_PRECISION (mode), 
> i.e. it will produce a 0.   And that will be correct even if the mode is 
> SI or TI.

Well, like I say, I don't see the current rtx has having a mode,
even conceptually.  The CONST_INT is just a convenient way of passing
an iteration count to a function that only accepts rtxes as arguments.
If we could, we'd be passing the widest_int directly as a pointer or
reference.

But since I'm just repeating what I said above, I should probably bow
out of this one.

Note that the current wide-int code doesn't allow any more cases than
the current mainline code, it just changes their values (and so breaks
the ports that relied on the old values).  So I think we both agree that
the current wide-int code needs to change in some way.  And I'd prefer
to change it to follow the existing interface, rather than (a) combining
a subtle interface change with a big patch and (b) potentially putting
the merge back unnecessarily.

If you and Mike want to change the interface on wide-int first,
and if the maintainers agree that's OK, then I won't object.
But you'll need to test it on ports that have doloops.

Thanks,
Richard
Kenneth Zadeck - Nov. 2, 2013, 5:39 p.m.
On 11/02/2013 11:31 AM, Richard Sandiford wrote:
> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>> On 11/02/2013 10:47 AM, Richard Sandiford wrote:
>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>> On 11/02/2013 07:06 AM, Richard Sandiford wrote:
>>>>> The first part of this is a simple type mismatch -- get_max_loop_iterations
>>>>> returns a widest_int aka max_wide_int -- and I'd have installed it as
>>>>> obvious.  The second part isn't as obvious though.  The old code stored
>>>>> the maximum iterations as:
>>>>>
>>>>>      if (!get_max_loop_iterations (loop, &iter)
>>>>>          || !iter.fits_shwi ())
>>>>>        iterations_max = const0_rtx;
>>>>>      else
>>>>>        iterations_max = GEN_INT (iter.to_shwi ());
>>>>>
>>>>> and the new code uses:
>>>>>
>>>>>      if (!get_max_loop_iterations (loop, &iter)
>>>>>          || !wi::fits_shwi_p (iter))
>>>>>        iterations_max = const0_rtx;
>>>>>      else
>>>>>        iterations_max = immed_wide_int_const (iter, mode);
>>>>>
>>>>> which includes an extra canonicalisation.  I agree it would be good to do
>>>>> that in principle, but I'm not sure it copes correctly with the case where
>>>>> the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
>>>>> real fix would be to avoid the host dependence altogether, i.e. get rid
>>>>> of the fits_shwi_p too.
>>>>>
>>>>> As it stands, this breaks bfin-elf's pattern, which has:
>>>>>
>>>>>      /* Due to limitations in the hardware (an initial loop count of 0
>>>>>         does not loop 2^32 times) we must avoid to generate a hardware
>>>>>         loops when we cannot rule out this case.  */
>>>>>      if (!flag_unsafe_loop_optimizations
>>>>>          && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
>>>>>        FAIL;
>>>>>
>>>>> With the sign-extending conversion, this now triggers more often than
>>>>> it was supposed to.
>>>>>
>>>>> Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
>>>>> and since we still use that form in the doloop_begin code, I think we should
>>>>> just back the immed_wide_int_const out.
>>>>>
>>>>> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
>>>>> unwanted testsuite changes in bfin-elf.  OK to install?
>>>> I dislike this a lot.     I think that this is a badly written pattern
>>>> in the bfin port if it depends on the host size.   As machines get
>>>> bigger, (they may not be getting faster but they are still getting
>>>> bigger) having traps like this at the portable level will bite us.
>>> But this isn't at the portable level, it's backend-specific code.
>>> It knows that it's dealing with a 32-bit counter.
>> yes, but the patch is dumbing down the portable part of the compiler.
>>>> in truth this code should really be iterations_max =
>>>> immed_wide_int_const (iter, mode) with no tests at all.
>>> But like I say, that won't cope with loops that iterate
>>> 1 << GET_MODE_PRECISION (mode) times, even if that fits
>>> in a signed HWI.  (That case could happen if the hardware
>>> supports it and if the counter starts out as 0.)
>>>
>>> The mainline code is self-consistent in that the number of
>>> iterations is passed as a positive signed HWI, using 0 if the
>>> value isn't known or is greater than HOST_WIDE_INT_MAX.
>>> In the current interface, the CONST_INT doesn't really have a mode.
>>> It's just used as a convenient way of passing a HWI to an expander.
>>>
>>> I'm not saying that it's the ideal interface.  But if we want to change it,
>>> let's do that separately, and test it on targets that have doloops.
>>> Until then, the wide-int branch can provide the current interface
>>> just as easily as mainline.
>> we do not have a predicate at the rtl level like we do at the tree level
>> where we can ask if the value can be represented in a mode without
>> loosing any information.   But the proper thing to do here is to check
>> to see if iter fits in the mode, if it does, then generate the
>> immed_wide_int_const and if not generate the 0.   In this way it does
>> the right thing for loops that iterate 1<<GET_MODE_PRECISION (mode),
>> i.e. it will produce a 0.   And that will be correct even if the mode is
>> SI or TI.
> Well, like I say, I don't see the current rtx has having a mode,
> even conceptually.  The CONST_INT is just a convenient way of passing
> an iteration count to a function that only accepts rtxes as arguments.
> If we could, we'd be passing the widest_int directly as a pointer or
> reference.
>
> But since I'm just repeating what I said above, I should probably bow
> out of this one.
you point seems to be that this is a place where the "assumption" that 
rtxes are signed bites us.
if mode happens to be SImode but the variable was unsigned and large, 
then we will sign extend this and get a pessimistic answer for this 
pattern.    on the other hand, if we do what you suggest, then we are 
forever limited by the host precision.

I see your point, but i do not know what to do.

kenny
> Note that the current wide-int code doesn't allow any more cases than
> the current mainline code, it just changes their values (and so breaks
> the ports that relied on the old values).  So I think we both agree that
> the current wide-int code needs to change in some way.  And I'd prefer
> to change it to follow the existing interface, rather than (a) combining
> a subtle interface change with a big patch and (b) potentially putting
> the merge back unnecessarily.
>
> If you and Mike want to change the interface on wide-int first,
> and if the maintainers agree that's OK, then I won't object.
> But you'll need to test it on ports that have doloops.
>
> Thanks,
> Richard
>
>
>
Richard Sandiford - Nov. 2, 2013, 5:51 p.m.
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
> On 11/02/2013 11:31 AM, Richard Sandiford wrote:
>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>> On 11/02/2013 10:47 AM, Richard Sandiford wrote:
>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>> On 11/02/2013 07:06 AM, Richard Sandiford wrote:
>>>>>> The first part of this is a simple type mismatch --
>>>>>> get_max_loop_iterations
>>>>>> returns a widest_int aka max_wide_int -- and I'd have installed it as
>>>>>> obvious.  The second part isn't as obvious though.  The old code stored
>>>>>> the maximum iterations as:
>>>>>>
>>>>>>      if (!get_max_loop_iterations (loop, &iter)
>>>>>>          || !iter.fits_shwi ())
>>>>>>        iterations_max = const0_rtx;
>>>>>>      else
>>>>>>        iterations_max = GEN_INT (iter.to_shwi ());
>>>>>>
>>>>>> and the new code uses:
>>>>>>
>>>>>>      if (!get_max_loop_iterations (loop, &iter)
>>>>>>          || !wi::fits_shwi_p (iter))
>>>>>>        iterations_max = const0_rtx;
>>>>>>      else
>>>>>>        iterations_max = immed_wide_int_const (iter, mode);
>>>>>>
>>>>>> which includes an extra canonicalisation.  I agree it would be good to do
>>>>>> that in principle, but I'm not sure it copes correctly with the case where
>>>>>> the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
>>>>>> real fix would be to avoid the host dependence altogether, i.e. get rid
>>>>>> of the fits_shwi_p too.
>>>>>>
>>>>>> As it stands, this breaks bfin-elf's pattern, which has:
>>>>>>
>>>>>>      /* Due to limitations in the hardware (an initial loop count of 0
>>>>>>         does not loop 2^32 times) we must avoid to generate a hardware
>>>>>>         loops when we cannot rule out this case.  */
>>>>>>      if (!flag_unsafe_loop_optimizations
>>>>>>          && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
>>>>>>        FAIL;
>>>>>>
>>>>>> With the sign-extending conversion, this now triggers more often than
>>>>>> it was supposed to.
>>>>>>
>>>>>> Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
>>>>>> and since we still use that form in the doloop_begin code, I think we should
>>>>>> just back the immed_wide_int_const out.
>>>>>>
>>>>>> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
>>>>>> unwanted testsuite changes in bfin-elf.  OK to install?
>>>>> I dislike this a lot.     I think that this is a badly written pattern
>>>>> in the bfin port if it depends on the host size.   As machines get
>>>>> bigger, (they may not be getting faster but they are still getting
>>>>> bigger) having traps like this at the portable level will bite us.
>>>> But this isn't at the portable level, it's backend-specific code.
>>>> It knows that it's dealing with a 32-bit counter.
>>> yes, but the patch is dumbing down the portable part of the compiler.
>>>>> in truth this code should really be iterations_max =
>>>>> immed_wide_int_const (iter, mode) with no tests at all.
>>>> But like I say, that won't cope with loops that iterate
>>>> 1 << GET_MODE_PRECISION (mode) times, even if that fits
>>>> in a signed HWI.  (That case could happen if the hardware
>>>> supports it and if the counter starts out as 0.)
>>>>
>>>> The mainline code is self-consistent in that the number of
>>>> iterations is passed as a positive signed HWI, using 0 if the
>>>> value isn't known or is greater than HOST_WIDE_INT_MAX.
>>>> In the current interface, the CONST_INT doesn't really have a mode.
>>>> It's just used as a convenient way of passing a HWI to an expander.
>>>>
>>>> I'm not saying that it's the ideal interface.  But if we want to change it,
>>>> let's do that separately, and test it on targets that have doloops.
>>>> Until then, the wide-int branch can provide the current interface
>>>> just as easily as mainline.
>>> we do not have a predicate at the rtl level like we do at the tree level
>>> where we can ask if the value can be represented in a mode without
>>> loosing any information.   But the proper thing to do here is to check
>>> to see if iter fits in the mode, if it does, then generate the
>>> immed_wide_int_const and if not generate the 0.   In this way it does
>>> the right thing for loops that iterate 1<<GET_MODE_PRECISION (mode),
>>> i.e. it will produce a 0.   And that will be correct even if the mode is
>>> SI or TI.
>> Well, like I say, I don't see the current rtx has having a mode,
>> even conceptually.  The CONST_INT is just a convenient way of passing
>> an iteration count to a function that only accepts rtxes as arguments.
>> If we could, we'd be passing the widest_int directly as a pointer or
>> reference.
>>
>> But since I'm just repeating what I said above, I should probably bow
>> out of this one.
> you point seems to be that this is a place where the "assumption" that 
> rtxes are signed bites us.
> if mode happens to be SImode but the variable was unsigned and large, 
> then we will sign extend this and get a pessimistic answer for this 
> pattern.

Right.

> on the other hand, if we do what you suggest, then we are 
> forever limited by the host precision.
>
> I see your point, but i do not know what to do.

Note that what I'm suggesting is simply restoring the mainline "else" code,
rather than something new.  wide-int doesn't change which of the if-else
arms are chosen, and if we keep the else arm as-is, existing code will
be happy.

"Forever" is a bit pessimistic (though probably true).  We're only
limited until somehow who cares about wide counters comes along and
changes things.  Or if you feel particularly motivated, you could do
it after the merge.  It just doesn't seem like something that needs to
be tackled here and now.

I mean, the current code is limited to CONST_INT -- and the doloop_end
pattern is specifically documented as taking a CONST_INT -- even though
we already have CONST_DOUBLE for wider types.  So the current set of rtxes
would already allow the pattern to pass wider iteration counts if necessary,
it's just that no port has needed them yet.

Thanks,
Richard
Kenneth Zadeck - Nov. 2, 2013, 6:06 p.m.
On 11/02/2013 01:51 PM, Richard Sandiford wrote:
> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>> On 11/02/2013 11:31 AM, Richard Sandiford wrote:
>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>> On 11/02/2013 10:47 AM, Richard Sandiford wrote:
>>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>>> On 11/02/2013 07:06 AM, Richard Sandiford wrote:
>>>>>>> The first part of this is a simple type mismatch --
>>>>>>> get_max_loop_iterations
>>>>>>> returns a widest_int aka max_wide_int -- and I'd have installed it as
>>>>>>> obvious.  The second part isn't as obvious though.  The old code stored
>>>>>>> the maximum iterations as:
>>>>>>>
>>>>>>>       if (!get_max_loop_iterations (loop, &iter)
>>>>>>>           || !iter.fits_shwi ())
>>>>>>>         iterations_max = const0_rtx;
>>>>>>>       else
>>>>>>>         iterations_max = GEN_INT (iter.to_shwi ());
>>>>>>>
>>>>>>> and the new code uses:
>>>>>>>
>>>>>>>       if (!get_max_loop_iterations (loop, &iter)
>>>>>>>           || !wi::fits_shwi_p (iter))
>>>>>>>         iterations_max = const0_rtx;
>>>>>>>       else
>>>>>>>         iterations_max = immed_wide_int_const (iter, mode);
>>>>>>>
>>>>>>> which includes an extra canonicalisation.  I agree it would be good to do
>>>>>>> that in principle, but I'm not sure it copes correctly with the case where
>>>>>>> the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
>>>>>>> real fix would be to avoid the host dependence altogether, i.e. get rid
>>>>>>> of the fits_shwi_p too.
>>>>>>>
>>>>>>> As it stands, this breaks bfin-elf's pattern, which has:
>>>>>>>
>>>>>>>       /* Due to limitations in the hardware (an initial loop count of 0
>>>>>>>          does not loop 2^32 times) we must avoid to generate a hardware
>>>>>>>          loops when we cannot rule out this case.  */
>>>>>>>       if (!flag_unsafe_loop_optimizations
>>>>>>>           && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
>>>>>>>         FAIL;
>>>>>>>
>>>>>>> With the sign-extending conversion, this now triggers more often than
>>>>>>> it was supposed to.
>>>>>>>
>>>>>>> Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
>>>>>>> and since we still use that form in the doloop_begin code, I think we should
>>>>>>> just back the immed_wide_int_const out.
>>>>>>>
>>>>>>> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
>>>>>>> unwanted testsuite changes in bfin-elf.  OK to install?
>>>>>> I dislike this a lot.     I think that this is a badly written pattern
>>>>>> in the bfin port if it depends on the host size.   As machines get
>>>>>> bigger, (they may not be getting faster but they are still getting
>>>>>> bigger) having traps like this at the portable level will bite us.
>>>>> But this isn't at the portable level, it's backend-specific code.
>>>>> It knows that it's dealing with a 32-bit counter.
>>>> yes, but the patch is dumbing down the portable part of the compiler.
>>>>>> in truth this code should really be iterations_max =
>>>>>> immed_wide_int_const (iter, mode) with no tests at all.
>>>>> But like I say, that won't cope with loops that iterate
>>>>> 1 << GET_MODE_PRECISION (mode) times, even if that fits
>>>>> in a signed HWI.  (That case could happen if the hardware
>>>>> supports it and if the counter starts out as 0.)
>>>>>
>>>>> The mainline code is self-consistent in that the number of
>>>>> iterations is passed as a positive signed HWI, using 0 if the
>>>>> value isn't known or is greater than HOST_WIDE_INT_MAX.
>>>>> In the current interface, the CONST_INT doesn't really have a mode.
>>>>> It's just used as a convenient way of passing a HWI to an expander.
>>>>>
>>>>> I'm not saying that it's the ideal interface.  But if we want to change it,
>>>>> let's do that separately, and test it on targets that have doloops.
>>>>> Until then, the wide-int branch can provide the current interface
>>>>> just as easily as mainline.
>>>> we do not have a predicate at the rtl level like we do at the tree level
>>>> where we can ask if the value can be represented in a mode without
>>>> loosing any information.   But the proper thing to do here is to check
>>>> to see if iter fits in the mode, if it does, then generate the
>>>> immed_wide_int_const and if not generate the 0.   In this way it does
>>>> the right thing for loops that iterate 1<<GET_MODE_PRECISION (mode),
>>>> i.e. it will produce a 0.   And that will be correct even if the mode is
>>>> SI or TI.
>>> Well, like I say, I don't see the current rtx has having a mode,
>>> even conceptually.  The CONST_INT is just a convenient way of passing
>>> an iteration count to a function that only accepts rtxes as arguments.
>>> If we could, we'd be passing the widest_int directly as a pointer or
>>> reference.
>>>
>>> But since I'm just repeating what I said above, I should probably bow
>>> out of this one.
>> you point seems to be that this is a place where the "assumption" that
>> rtxes are signed bites us.
>> if mode happens to be SImode but the variable was unsigned and large,
>> then we will sign extend this and get a pessimistic answer for this
>> pattern.
> Right.
>
>> on the other hand, if we do what you suggest, then we are
>> forever limited by the host precision.
>>
>> I see your point, but i do not know what to do.
> Note that what I'm suggesting is simply restoring the mainline "else" code,
> rather than something new.  wide-int doesn't change which of the if-else
> arms are chosen, and if we keep the else arm as-is, existing code will
> be happy.
>
> "Forever" is a bit pessimistic (though probably true).  We're only
> limited until somehow who cares about wide counters comes along and
> changes things.  Or if you feel particularly motivated, you could do
> it after the merge.  It just doesn't seem like something that needs to
> be tackled here and now.
>
> I mean, the current code is limited to CONST_INT -- and the doloop_end
> pattern is specifically documented as taking a CONST_INT -- even though
> we already have CONST_DOUBLE for wider types.  So the current set of rtxes
> would already allow the pattern to pass wider iteration counts if necessary,
> it's just that no port has needed them yet.
>
> Thanks,
> Richard
you are correct, that for now it needs to be "as was".

Patch

Index: gcc/loop-doloop.c
===================================================================
--- gcc/loop-doloop.c	2013-11-02 10:49:37.463178153 +0000
+++ gcc/loop-doloop.c	2013-11-02 10:49:55.927314661 +0000
@@ -549,7 +549,7 @@  doloop_modify (struct loop *loop, struct
   {
     rtx init;
     unsigned level = get_loop_level (loop) + 1;
-    wide_int iter;
+    widest_int iter;
     rtx iter_rtx;
 
     if (!get_max_loop_iterations (loop, &iter)
@@ -673,7 +673,7 @@  doloop_optimize (struct loop *loop)
       || !wi::fits_shwi_p (iter))
     iterations_max = const0_rtx;
   else
-    iterations_max = immed_wide_int_const (iter, mode);
+    iterations_max = GEN_INT (iter.to_shwi ());
   level = get_loop_level (loop) + 1;
 
   /* Generate looping insn.  If the pattern FAILs then give up trying