diff mbox

Construct canonical scaled address expression in IVOPT

Message ID 001001ceb5e8$486bfd80$d943f880$@arm.com
State New
Headers show

Commit Message

Bin Cheng Sept. 20, 2013, 10 a.m. UTC
Hi,
For now IVOPT constructs scaled address expression in the form of
"scaled*index" and checks whether backend supports it. The problem is the
address expression is invalid on ARM, causing scaled expression disabled in
IVOPT on ARM.  This patch fixes the IVOPT part by constructing rtl address
expression like "index*scaled+base".

Hi Richard,
I thought about the suggestion constructing TARGET_MEM[.] and adding new
target hook to check whether backend supports such target memory accesses,
but still want to give this patch a try because:
1) RTL pattern "index*scaled+base" is some kind of canonical form of scaled
address expression and it works fine.
2) It won't save us any inconvenience by constructing TARGET_MEM node, on
contrary, we have to add new target hook checking whether scaled addressing
mode is supported, which in essence is nothing else than current
implementation.

Also "base+index*scaled" is re-structured to canonical form
"index*scaled+base", I constructed the latter form in patch.
Bootstrapped and tested on x86_64 and arm_a15. Is it OK?

Thanks.
bin


2013-09-20  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (multiplier_allowed_in_address_p):
	Construct canonical scaled rtl address expression.

Comments

Richard Biener Sept. 23, 2013, 12:07 p.m. UTC | #1
On Fri, Sep 20, 2013 at 12:00 PM, bin.cheng <bin.cheng@arm.com> wrote:
> Hi,
> For now IVOPT constructs scaled address expression in the form of
> "scaled*index" and checks whether backend supports it. The problem is the
> address expression is invalid on ARM, causing scaled expression disabled in
> IVOPT on ARM.  This patch fixes the IVOPT part by constructing rtl address
> expression like "index*scaled+base".

-      addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
+      /* Construct address expression in the canonical form of
+ "base+index*scale" and call memory_address_addr_space_p
+ to see whether it is allowed by target processors.  */
+      index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
       for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
  {
-  XEXP (addr, 1) = gen_int_mode (i, address_mode);
+  if (i == 1)
+    continue;
+
+  XEXP (index, 1) = gen_int_mode (i, address_mode);
+  addr = gen_rtx_fmt_ee (PLUS, address_mode, index, reg1);
   if (memory_address_addr_space_p (mode, addr, as))
     bitmap_set_bit (valid_mult, i + MAX_RATIO);

so you now build reg1*scale+reg1 - which checks if offset and scale
work at the same time (and with the same value - given this is
really reg1*(scale+1)).  It might be that this was implicitely assumed
(or that no targets allow scale but no offset), but it's a change that
requires audit of behavior on more targets.

The above also builds more RTX waste which you can fix by re-using
the PLUS by building it up-front similar to the multiplication.  You also
miss the opportunity to have scale == 1 denote as to whether reg1 + reg2
is valid.  I would expect that many targets support reg1 * scale +
constant-offset
but not many reg1 * scale + reg2.

So no, the helper now checks sth completely different.  What's the problem
with arm supporting reg1 * scale?  Why shouldn't it being able to handle
the implicit zero offset?

Richard.

> Hi Richard,
> I thought about the suggestion constructing TARGET_MEM[.] and adding new
> target hook to check whether backend supports such target memory accesses,
> but still want to give this patch a try because:
> 1) RTL pattern "index*scaled+base" is some kind of canonical form of scaled
> address expression and it works fine.
> 2) It won't save us any inconvenience by constructing TARGET_MEM node, on
> contrary, we have to add new target hook checking whether scaled addressing
> mode is supported, which in essence is nothing else than current
> implementation.
>
> Also "base+index*scaled" is re-structured to canonical form
> "index*scaled+base", I constructed the latter form in patch.
> Bootstrapped and tested on x86_64 and arm_a15. Is it OK?
>
> Thanks.
> bin
>
>
> 2013-09-20  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-ssa-loop-ivopts.c (multiplier_allowed_in_address_p):
>         Construct canonical scaled rtl address expression.
Richard Earnshaw Sept. 23, 2013, 3:20 p.m. UTC | #2
On 23/09/13 13:07, Richard Biener wrote:
> What's the problem
> with arm supporting reg1 * scale?  Why shouldn't it being able to handle
> the implicit zero offset?

Something like "we don't have an instruction that can do that"...

Valid addresses are of the general form

address:=
     '[' base-reg ']'
  |  '[' base-reg ',' offset ']'
  |  '[' base-reg ',' offset-reg ',' addr-scale-op ']'

The base register is mandatory, the scale can only be applied when there
is both a base and an offset which is a register.

R.
Bin Cheng Sept. 24, 2013, 6:20 a.m. UTC | #3
> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Monday, September 23, 2013 8:08 PM
> To: Bin Cheng
> Cc: GCC Patches
> Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT
> 
> On Fri, Sep 20, 2013 at 12:00 PM, bin.cheng <bin.cheng@arm.com> wrote:
> > Hi,
> > For now IVOPT constructs scaled address expression in the form of
> > "scaled*index" and checks whether backend supports it. The problem is
> > the address expression is invalid on ARM, causing scaled expression
> > disabled in IVOPT on ARM.  This patch fixes the IVOPT part by
> > constructing rtl address expression like "index*scaled+base".
> 
> -      addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
> +      /* Construct address expression in the canonical form of
> + "base+index*scale" and call memory_address_addr_space_p to see
> whether
> + it is allowed by target processors.  */
> +      index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
>        for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
>   {
> -  XEXP (addr, 1) = gen_int_mode (i, address_mode);
> +  if (i == 1)
> +    continue;
> +
> +  XEXP (index, 1) = gen_int_mode (i, address_mode);  addr =
> + gen_rtx_fmt_ee (PLUS, address_mode, index, reg1);
>    if (memory_address_addr_space_p (mode, addr, as))
>      bitmap_set_bit (valid_mult, i + MAX_RATIO);
> 
> so you now build reg1*scale+reg1 - which checks if offset and scale work
at
> the same time (and with the same value - given this is really
reg1*(scale+1)).
> It might be that this was implicitely assumed (or that no targets allow
scale
> but no offset), but it's a change that requires audit of behavior on more
> targets.
So literally, function multiplier_allowed_in_address_p should check whether
multiplier is allowed in any kind of addressing mode, like [reg*scale +
offset] and [reg*scale + reg].   Apparently it's infeasible to check every
possibility for each architecture, is it ok we at least check "index", then
"addr" if "index" is failed?  By "any kind of addressing modes", I mean
modes supported by function get_address_cost,  i.e., in form of "[base] +
[off] + [var] + (reg|reg*scale)".

> 
> The above also builds more RTX waste which you can fix by re-using the
PLUS
> by building it up-front similar to the multiplication.  You also miss the
Yes, this can be fixed.

> opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid.
I
> would expect that many targets support reg1 * scale + constant-offset but
> not many reg1 * scale + reg2.
I thought scale==1 is unnecessary because the addressing mode degrades into
"reg" or "reg+reg".  Moreover, calls of multiplier_allowed_in_address_p in
both get_address_cost and get_computation_cost_at have scale other than 1.

> 
> So no, the helper now checks sth completely different.  What's the problem
> with arm supporting reg1 * scale?  Why shouldn't it being able to handle
the
> implicit zero offset?

As Richard clarified, ARM does not support scaled addressing mode without
base register.

Thanks.
bin
Richard Biener Sept. 24, 2013, 10:12 a.m. UTC | #4
On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng <bin.cheng@arm.com> wrote:
>
>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: Monday, September 23, 2013 8:08 PM
>> To: Bin Cheng
>> Cc: GCC Patches
>> Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT
>>
>> On Fri, Sep 20, 2013 at 12:00 PM, bin.cheng <bin.cheng@arm.com> wrote:
>> > Hi,
>> > For now IVOPT constructs scaled address expression in the form of
>> > "scaled*index" and checks whether backend supports it. The problem is
>> > the address expression is invalid on ARM, causing scaled expression
>> > disabled in IVOPT on ARM.  This patch fixes the IVOPT part by
>> > constructing rtl address expression like "index*scaled+base".
>>
>> -      addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
>> +      /* Construct address expression in the canonical form of
>> + "base+index*scale" and call memory_address_addr_space_p to see
>> whether
>> + it is allowed by target processors.  */
>> +      index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
>>        for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
>>   {
>> -  XEXP (addr, 1) = gen_int_mode (i, address_mode);
>> +  if (i == 1)
>> +    continue;
>> +
>> +  XEXP (index, 1) = gen_int_mode (i, address_mode);  addr =
>> + gen_rtx_fmt_ee (PLUS, address_mode, index, reg1);
>>    if (memory_address_addr_space_p (mode, addr, as))
>>      bitmap_set_bit (valid_mult, i + MAX_RATIO);
>>
>> so you now build reg1*scale+reg1 - which checks if offset and scale work
> at
>> the same time (and with the same value - given this is really
> reg1*(scale+1)).
>> It might be that this was implicitely assumed (or that no targets allow
> scale
>> but no offset), but it's a change that requires audit of behavior on more
>> targets.
> So literally, function multiplier_allowed_in_address_p should check whether
> multiplier is allowed in any kind of addressing mode, like [reg*scale +
> offset] and [reg*scale + reg].

Or even [reg*scale] (not sure about that).  But yes, at least reg*scale + offset
and reg*scale + reg.

>   Apparently it's infeasible to check every
> possibility for each architecture, is it ok we at least check "index", then
> "addr" if "index" is failed?  By "any kind of addressing modes", I mean
> modes supported by function get_address_cost,  i.e., in form of "[base] +
> [off] + [var] + (reg|reg*scale)".

I suppose so, but it of course depends on what IVOPTs uses the answer
for in the end.  Appearantly it doesn't distinguish between the various cases
even though TARGET_MEM_REF does support all the variants in question
(reg * scale, reg * scale + reg, reg * scale + const, reg * scale +
reg + const).

So the better answer may be to teach the costs about the differences?

>> The above also builds more RTX waste which you can fix by re-using the
> PLUS
>> by building it up-front similar to the multiplication.  You also miss the
> Yes, this can be fixed.
>
>> opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid.
> I
>> would expect that many targets support reg1 * scale + constant-offset but
>> not many reg1 * scale + reg2.
> I thought scale==1 is unnecessary because the addressing mode degrades into
> "reg" or "reg+reg".  Moreover, calls of multiplier_allowed_in_address_p in
> both get_address_cost and get_computation_cost_at have scale other than 1.

Ok.

>>
>> So no, the helper now checks sth completely different.  What's the problem
>> with arm supporting reg1 * scale?  Why shouldn't it being able to handle
> the
>> implicit zero offset?
>
> As Richard clarified, ARM does not support scaled addressing mode without
> base register.

I see.

Richard.
Richard Earnshaw Sept. 24, 2013, 10:36 a.m. UTC | #5
On 24/09/13 11:12, Richard Biener wrote:
> Or even [reg*scale] (not sure about that).  But yes, at least reg*scale + offset
> and reg*scale + reg.

I can't conceive of a realistic case where one would want to scale the
base address.  Scaling the offset is fine, but never the base.  So
reg*scale+offset seems meaningless.

Base + Offset * Scale on the other hand makes much more sense.

Of course, this is all about terminology, so if you regard an immediate,
such as a symbol as an offset, then perhaps you have something close to
the original term, but I think then you've inverted things, since your
symbol is really the base, not the offset.

R.
Richard Biener Sept. 24, 2013, 10:45 a.m. UTC | #6
On Tue, Sep 24, 2013 at 12:36 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 24/09/13 11:12, Richard Biener wrote:
>> Or even [reg*scale] (not sure about that).  But yes, at least reg*scale + offset
>> and reg*scale + reg.
>
> I can't conceive of a realistic case where one would want to scale the
> base address.  Scaling the offset is fine, but never the base.  So
> reg*scale+offset seems meaningless.
>
> Base + Offset * Scale on the other hand makes much more sense.
>
> Of course, this is all about terminology, so if you regard an immediate,
> such as a symbol as an offset, then perhaps you have something close to
> the original term, but I think then you've inverted things, since your
> symbol is really the base, not the offset.

Sure, this can't be the complete memory address - but that doesn't seem to be
what IVOPTs is testing.  Your example of $SYM + offset * scale is a good one
for example.  IVOPTs is merely asking whether it can use a scaled offset.
That the present test doesn't work for arm hints at that it doesn't build up
the correct RTL for the test, but also since this works for other targets which
probably also don't only have a scaled offset without a base it isn't all that
clear what a) the codes desire is, b) the correct solution is without
regressing on
other targets.

Btw, it should be reasonably possible to compute the whole
multiplier_allowed_in_address_p table for all primary and secondary archs
(simply build cross-cc1) and compare the results before / after a patch
candidate.  Querying both reg * scale and reg + reg * scale if the first
fails sounds like a good solution to me.

Richard.

> R.
>
Bin.Cheng Sept. 24, 2013, 11:40 a.m. UTC | #7
On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng <bin.cheng@arm.com> wrote:
>>
>>
>>> -----Original Message-----
>
> Or even [reg*scale] (not sure about that).  But yes, at least reg*scale + offset
> and reg*scale + reg.
>
>>   Apparently it's infeasible to check every
>> possibility for each architecture, is it ok we at least check "index", then
>> "addr" if "index" is failed?  By "any kind of addressing modes", I mean
>> modes supported by function get_address_cost,  i.e., in form of "[base] +
>> [off] + [var] + (reg|reg*scale)".
>
> I suppose so, but it of course depends on what IVOPTs uses the answer
> for in the end.  Appearantly it doesn't distinguish between the various cases
> even though TARGET_MEM_REF does support all the variants in question
> (reg * scale, reg * scale + reg, reg * scale + const, reg * scale +
> reg + const).
>
> So the better answer may be to teach the costs about the differences?
Ideally, IVOPT should be aware whether scaling is allowed in every
kind of addressing modes and account cost of multiplier accordingly.
For current code, there are two scenarios here
1) If target supports reg*scale+reg, but not reg*scale, in this case,
IVOPT considers multiplier is not allowed in any addressing mode and
account multiplier with high cost.  This is the problem arm having.
2) If target supports reg*scale, but not some kind of addressing mode
(saying reg*scale+reg), in this case, IVOPT still constructs various
scaled addressing mode in get_address_cost and depends on address_cost
to compute correct cost for that addressing expression.  I think this
happens to work even IVOPT doesn't know "reg*scale+reg" is actually
not supported.

>
>>> The above also builds more RTX waste which you can fix by re-using the
>> PLUS
>>> by building it up-front similar to the multiplication.  You also miss the
>> Yes, this can be fixed.
>>
>>> opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid.
>> I
>>> would expect that many targets support reg1 * scale + constant-offset but
>>> not many reg1 * scale + reg2.
>> I thought scale==1 is unnecessary because the addressing mode degrades into
>> "reg" or "reg+reg".  Moreover, calls of multiplier_allowed_in_address_p in
>> both get_address_cost and get_computation_cost_at have scale other than 1.
>
> Ok.
>
>>>
>>> So no, the helper now checks sth completely different.  What's the problem
>>> with arm supporting reg1 * scale?  Why shouldn't it being able to handle
>> the
>>> implicit zero offset?
>>
>> As Richard clarified, ARM does not support scaled addressing mode without
>> base register.
>
> I see.
>
Also from the newer comments:

> Btw, it should be reasonably possible to compute the whole
> multiplier_allowed_in_address_p table for all primary and secondary archs
> (simply build cross-cc1) and compare the results before / after a patch
> candidate.  Querying both reg * scale and reg + reg * scale if the first
> fails sounds like a good solution to me.
I take this as we should do minimal change by checking reg + reg *
scale if reg * scale is failed,  right?

Thanks.
bin
Richard Biener Sept. 24, 2013, 11:58 a.m. UTC | #8
On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng <bin.cheng@arm.com> wrote:
>>>
>>>
>>>> -----Original Message-----
>>
>> Or even [reg*scale] (not sure about that).  But yes, at least reg*scale + offset
>> and reg*scale + reg.
>>
>>>   Apparently it's infeasible to check every
>>> possibility for each architecture, is it ok we at least check "index", then
>>> "addr" if "index" is failed?  By "any kind of addressing modes", I mean
>>> modes supported by function get_address_cost,  i.e., in form of "[base] +
>>> [off] + [var] + (reg|reg*scale)".
>>
>> I suppose so, but it of course depends on what IVOPTs uses the answer
>> for in the end.  Appearantly it doesn't distinguish between the various cases
>> even though TARGET_MEM_REF does support all the variants in question
>> (reg * scale, reg * scale + reg, reg * scale + const, reg * scale +
>> reg + const).
>>
>> So the better answer may be to teach the costs about the differences?
> Ideally, IVOPT should be aware whether scaling is allowed in every
> kind of addressing modes and account cost of multiplier accordingly.
> For current code, there are two scenarios here
> 1) If target supports reg*scale+reg, but not reg*scale, in this case,
> IVOPT considers multiplier is not allowed in any addressing mode and
> account multiplier with high cost.  This is the problem arm having.
> 2) If target supports reg*scale, but not some kind of addressing mode
> (saying reg*scale+reg), in this case, IVOPT still constructs various
> scaled addressing mode in get_address_cost and depends on address_cost
> to compute correct cost for that addressing expression.  I think this
> happens to work even IVOPT doesn't know "reg*scale+reg" is actually
> not supported.
>
>>
>>>> The above also builds more RTX waste which you can fix by re-using the
>>> PLUS
>>>> by building it up-front similar to the multiplication.  You also miss the
>>> Yes, this can be fixed.
>>>
>>>> opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid.
>>> I
>>>> would expect that many targets support reg1 * scale + constant-offset but
>>>> not many reg1 * scale + reg2.
>>> I thought scale==1 is unnecessary because the addressing mode degrades into
>>> "reg" or "reg+reg".  Moreover, calls of multiplier_allowed_in_address_p in
>>> both get_address_cost and get_computation_cost_at have scale other than 1.
>>
>> Ok.
>>
>>>>
>>>> So no, the helper now checks sth completely different.  What's the problem
>>>> with arm supporting reg1 * scale?  Why shouldn't it being able to handle
>>> the
>>>> implicit zero offset?
>>>
>>> As Richard clarified, ARM does not support scaled addressing mode without
>>> base register.
>>
>> I see.
>>
> Also from the newer comments:
>
>> Btw, it should be reasonably possible to compute the whole
>> multiplier_allowed_in_address_p table for all primary and secondary archs
>> (simply build cross-cc1) and compare the results before / after a patch
>> candidate.  Querying both reg * scale and reg + reg * scale if the first
>> fails sounds like a good solution to me.
> I take this as we should do minimal change by checking reg + reg *
> scale if reg * scale is failed,  right?

Yes, you can share a single RTL expression for all this and I think
querying reg + reg * scale first makes sense (then fallback to reg * scale
for compatibility).

Richard.

> Thanks.
> bin
> --
> Best Regards.
Bin Cheng Sept. 26, 2013, 12:05 p.m. UTC | #9
> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Tuesday, September 24, 2013 7:58 PM
> To: Bin.Cheng
> Cc: Bin Cheng; GCC Patches; Richard Earnshaw
> Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT
> 
> On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng <amker.cheng@gmail.com>
> wrote:
> > On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng <bin.cheng@arm.com> wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>
> >> Or even [reg*scale] (not sure about that).  But yes, at least
> >> reg*scale + offset and reg*scale + reg.
> >>
> >>>   Apparently it's infeasible to check every possibility for each
> >>> architecture, is it ok we at least check "index", then "addr" if
> >>> "index" is failed?  By "any kind of addressing modes", I mean modes
> >>> supported by function get_address_cost,  i.e., in form of "[base] +
> >>> [off] + [var] + (reg|reg*scale)".
> >>
> >> I suppose so, but it of course depends on what IVOPTs uses the answer
> >> for in the end.  Appearantly it doesn't distinguish between the
> >> various cases even though TARGET_MEM_REF does support all the
> >> variants in question (reg * scale, reg * scale + reg, reg * scale +
> >> const, reg * scale + reg + const).
> >>
> >> So the better answer may be to teach the costs about the differences?
> > Ideally, IVOPT should be aware whether scaling is allowed in every
> > kind of addressing modes and account cost of multiplier accordingly.
> > For current code, there are two scenarios here
> > 1) If target supports reg*scale+reg, but not reg*scale, in this case,
> > IVOPT considers multiplier is not allowed in any addressing mode and
> > account multiplier with high cost.  This is the problem arm having.
> > 2) If target supports reg*scale, but not some kind of addressing mode
> > (saying reg*scale+reg), in this case, IVOPT still constructs various
> > scaled addressing mode in get_address_cost and depends on address_cost
> > to compute correct cost for that addressing expression.  I think this
> > happens to work even IVOPT doesn't know "reg*scale+reg" is actually
> > not supported.
> >
> >>
> >>>> The above also builds more RTX waste which you can fix by re-using
> >>>> the
> >>> PLUS
> >>>> by building it up-front similar to the multiplication.  You also
> >>>> miss the
> >>> Yes, this can be fixed.
> >>>
> >>>> opportunity to have scale == 1 denote as to whether reg1 + reg2 is
valid.
> >>> I
> >>>> would expect that many targets support reg1 * scale +
> >>>> constant-offset but not many reg1 * scale + reg2.
> >>> I thought scale==1 is unnecessary because the addressing mode
> >>> degrades into "reg" or "reg+reg".  Moreover, calls of
> >>> multiplier_allowed_in_address_p in both get_address_cost and
> get_computation_cost_at have scale other than 1.
> >>
> >> Ok.
> >>
> >>>>
> >>>> So no, the helper now checks sth completely different.  What's the
> >>>> problem with arm supporting reg1 * scale?  Why shouldn't it being
> >>>> able to handle
> >>> the
> >>>> implicit zero offset?
> >>>
> >>> As Richard clarified, ARM does not support scaled addressing mode
> >>> without base register.
> >>
> >> I see.
> >>
> > Also from the newer comments:
> >
> >> Btw, it should be reasonably possible to compute the whole
> >> multiplier_allowed_in_address_p table for all primary and secondary
> >> archs (simply build cross-cc1) and compare the results before / after
> >> a patch candidate.  Querying both reg * scale and reg + reg * scale
> >> if the first fails sounds like a good solution to me.
> > I take this as we should do minimal change by checking reg + reg *
> > scale if reg * scale is failed,  right?
> 
> Yes, you can share a single RTL expression for all this and I think
querying reg
> + reg * scale first makes sense (then fallback to reg * scale for
compatibility).
> 
I updated the patch as discussed.  Please review.
It bootstraps and passes regression test on x86/x86_64, but fails
tree-ssa/scev-4.c on arm cortex-a15.
Hi Richard, I investigated the failure and found out it reveals two other
problems in IVOPT we have discussed.
For scev-4.c like:

typedef struct {
	int x;
	int y;
} S;
int *a_p;
S a[1000];
f(int k)
{
	int i;

	for (i=k; i<1000; i+=k) {
		a_p = &a[i].y;
		*a_p = 100;
        }
}

The iv candidates and uses are like:

use 0
  generic
  in statement a_p.0_5 = &a[k_11].y;

  at position 
  type int *
  base (int *) &a + ((sizetype) k_3(D) * 8 + 4)
  step (sizetype) k_3(D) * 8
  base object (void *) &a
  related candidates 
use 1
  address
  in statement MEM[(int *)&a][k_11].y = 100;

  at position MEM[(int *)&a][k_11].y
  type int *
  base &MEM[(int *)&a][k_3(D)].y
  step (sizetype) k_3(D) * 8
  base object (void *) &a
  related candidates

candidate 4 (important)
  depends on 1
  original biv
  type int
  base k_3(D)
  step k_3(D)

candidate 7
  depends on 1
  var_before ivtmp.12
  var_after ivtmp.12
  incremented before exit test
  type unsigned int
  base (unsigned int) ((int *) &a + (sizetype) k_3(D) * 8)
  step (unsigned int) ((sizetype) k_3(D) * 8)
  base object (void *) &a
Candidate 7 is related to use 0

Problem 1) When computing cost of use 1 using cand 4, the call of
strip_offset (&MEM[(int *)&a][k_3(D)].y, &off1) computes off1 == 0, which is
actually 4.
This can be fixed my previous patch which still in re-working.
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01749.html

Problem 2) It allocates iv with not simplified base like &MEM[(int
*)&a][k_3(D)].y,  while cost computation functions like ptr_difference_const
can't handle such complex address properly (also force_expr_to_var_cost
etc.).  In this case, it can't understand that "((int *) &a + (sizetype)
k_3(D) * 8)" and "&MEM[(int *)&a][k_3(D)]" are actually equal.
I think it would substantially simplify computation of cost in IVOPT if we
can allocate iv with simplified base expression.  I have another patch for
this and will send it for review.

BTW, the failure can be fixed by the offset fixation patch.

So is it OK to applied this updated patch?

Thanks.
bin
diff mbox

Patch

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 202599)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3134,15 +3134,22 @@  multiplier_allowed_in_address_p (HOST_WIDE_INT rat
     {
       enum machine_mode address_mode = targetm.addr_space.address_mode (as);
       rtx reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
-      rtx addr;
+      rtx addr, index;
       HOST_WIDE_INT i;
 
       valid_mult = sbitmap_alloc (2 * MAX_RATIO + 1);
       bitmap_clear (valid_mult);
-      addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
+      /* Construct address expression in the canonical form of
+	 "base+index*scale" and call memory_address_addr_space_p
+	 to see whether it is allowed by target processors.  */
+      index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
       for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
+	  if (i == 1)
+	    continue;
+
+	  XEXP (index, 1) = gen_int_mode (i, address_mode);
+	  addr = gen_rtx_fmt_ee (PLUS, address_mode, index, reg1);
 	  if (memory_address_addr_space_p (mode, addr, as))
 	    bitmap_set_bit (valid_mult, i + MAX_RATIO);
 	}