diff mbox

Compute, cache and use cost of auto-increment rtx patterns in IVOPT

Message ID 001001ced90e$4ecd9af0$ec68d0d0$@arm.com
State New
Headers show

Commit Message

Bin Cheng Nov. 4, 2013, 3:31 a.m. UTC
Hi,

The IVOPT in GCC has a problem that it does not use cost of auto-increment
address expression in accounting, while it retreats to cost of address
expression if auto-increment addressing mode is unavailable.
For example, on ARM target:
1) the cost of "[reg]" (which is 6) is used for address expression "[reg],
#off";
2) the cost of "[reg+off]" (which is 2) is used for address expression
"[reg, #off]!"; 

This causes:
1) cost of non-auto increment address expression is used for auto-increment
address expression;
2) different address costs are used for pre/post increment address
expressions.
This patch fixes the problem by computing, caching and using the cost of
auto-increment address expressions.

Bootstrap and test on x86/arm.  Is it OK?

2013-11-01  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (enum ainc_type): New.
	(address_cost_data): New field.
	(get_address_cost): Compute auto-increment rtx cost in ainc_costs.
	Use ainc_costs for auto-increment rtx patterns.
	Cleanup TWS.

Comments

Richard Biener Nov. 4, 2013, 11:38 a.m. UTC | #1
On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng <bin.cheng@arm.com> wrote:
> Hi,
>
> The IVOPT in GCC has a problem that it does not use cost of auto-increment
> address expression in accounting, while it retreats to cost of address
> expression if auto-increment addressing mode is unavailable.
> For example, on ARM target:
> 1) the cost of "[reg]" (which is 6) is used for address expression "[reg],
> #off";
> 2) the cost of "[reg+off]" (which is 2) is used for address expression
> "[reg, #off]!";
>
> This causes:
> 1) cost of non-auto increment address expression is used for auto-increment
> address expression;
> 2) different address costs are used for pre/post increment address
> expressions.
> This patch fixes the problem by computing, caching and using the cost of
> auto-increment address expressions.
>
> Bootstrap and test on x86/arm.  Is it OK?

But don't you need to adjust

static bool
determine_use_iv_cost_address (struct ivopts_data *data,
                               struct iv_use *use, struct iv_cand *cand)
{
  bitmap depends_on;
  bool can_autoinc;
  int inv_expr_id = -1;
  comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on,
                                         &can_autoinc, &inv_expr_id);

  if (cand->ainc_use == use)
    {
      if (can_autoinc)
        cost.cost -= cand->cost_step;

this which seems to try to compensate for your issue?

Or maybe I don't understand.

CCing Bernd who implemented this IIRC.

Richard.

> 2013-11-01  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-ssa-loop-ivopts.c (enum ainc_type): New.
>         (address_cost_data): New field.
>         (get_address_cost): Compute auto-increment rtx cost in ainc_costs.
>         Use ainc_costs for auto-increment rtx patterns.
>         Cleanup TWS.
Bin.Cheng Nov. 4, 2013, 12:56 p.m. UTC | #2
On Mon, Nov 4, 2013 at 7:38 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng <bin.cheng@arm.com> wrote:
>> Hi,
>>
>> The IVOPT in GCC has a problem that it does not use cost of auto-increment
>> address expression in accounting, while it retreats to cost of address
>> expression if auto-increment addressing mode is unavailable.
>> For example, on ARM target:
>> 1) the cost of "[reg]" (which is 6) is used for address expression "[reg],
>> #off";
>> 2) the cost of "[reg+off]" (which is 2) is used for address expression
>> "[reg, #off]!";
>>
>> This causes:
>> 1) cost of non-auto increment address expression is used for auto-increment
>> address expression;
>> 2) different address costs are used for pre/post increment address
>> expressions.
>> This patch fixes the problem by computing, caching and using the cost of
>> auto-increment address expressions.
>>
>> Bootstrap and test on x86/arm.  Is it OK?
>
> But don't you need to adjust
>
> static bool
> determine_use_iv_cost_address (struct ivopts_data *data,
>                                struct iv_use *use, struct iv_cand *cand)
> {
>   bitmap depends_on;
>   bool can_autoinc;
>   int inv_expr_id = -1;
>   comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on,
>                                          &can_autoinc, &inv_expr_id);
>
>   if (cand->ainc_use == use)
>     {
>       if (can_autoinc)
>         cost.cost -= cand->cost_step;
>
> this which seems to try to compensate for your issue?
That's where problem gets complicated depending on how backend defines
address cost.  For back ends define cost according to the true cost of
addressing mode approximately, the address cost of auto-increment
addressing mode doesn't take the saved stepping instruction into
consideration, so the code is necessary.
Moreover, according to gcc internal's description about
TARGET_ADDRESS_COST, RISC machines may define different address cost
for addressing modes which actually have equal execution on
micro-architecture level (like ARM for now).  The problems are:
1) Though address costs are defined in this "discriminated" way, it's
unlikely to have the saved stepping instruction considered either.
The address cost of auto-increment address expression shouldn't go so
far.
2) We think the "discriminated" address cost model is established
before gimple pass and is outdated.  The true micro-architecture
address cost (or cost normalized with COSTS_N_INSNS) should be used in
GCC nowadays.  The rtl passes like fwprop_addr which use address cost
as heuristic information should be refined... but that's totally
another problem (am investigating it).

So overall, I think the stepping cost should to be subtracted here.

>
> Or maybe I don't understand.
>
> CCing Bernd who implemented this IIRC.
Any suggestions appreciated.

Thanks.
bin
Bin Cheng Nov. 12, 2013, 7:08 a.m. UTC | #3
Ping in this one.
Hi Bernd, could you please help us on this one?
Please ignore the previous ping message because I used the wrong email
account.

Sorry for the inconvenience.

Thanks,
bin

> -----Original Message-----
> From: Bin.Cheng [mailto:amker.cheng@gmail.com]
> Sent: Monday, November 04, 2013 8:56 PM
> To: Richard Biener
> Cc: Bin Cheng; Bernd Schmidt; GCC Patches
> Subject: Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx
> patterns in IVOPT
> 
> On Mon, Nov 4, 2013 at 7:38 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
> > On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng <bin.cheng@arm.com> wrote:
> >> Hi,
> >>
> >> The IVOPT in GCC has a problem that it does not use cost of
> >> auto-increment address expression in accounting, while it retreats to
> >> cost of address expression if auto-increment addressing mode is
> unavailable.
> >> For example, on ARM target:
> >> 1) the cost of "[reg]" (which is 6) is used for address expression
> >> "[reg], #off";
> >> 2) the cost of "[reg+off]" (which is 2) is used for address
> >> expression "[reg, #off]!";
> >>
> >> This causes:
> >> 1) cost of non-auto increment address expression is used for
> >> auto-increment address expression;
> >> 2) different address costs are used for pre/post increment address
> >> expressions.
> >> This patch fixes the problem by computing, caching and using the cost
> >> of auto-increment address expressions.
> >>
> >> Bootstrap and test on x86/arm.  Is it OK?
> >
> > But don't you need to adjust
> >
> > static bool
> > determine_use_iv_cost_address (struct ivopts_data *data,
> >                                struct iv_use *use, struct iv_cand
> > *cand) {
> >   bitmap depends_on;
> >   bool can_autoinc;
> >   int inv_expr_id = -1;
> >   comp_cost cost = get_computation_cost (data, use, cand, true,
> &depends_on,
> >                                          &can_autoinc, &inv_expr_id);
> >
> >   if (cand->ainc_use == use)
> >     {
> >       if (can_autoinc)
> >         cost.cost -= cand->cost_step;
> >
> > this which seems to try to compensate for your issue?
> That's where problem gets complicated depending on how backend defines
> address cost.  For back ends define cost according to the true cost of
> addressing mode approximately, the address cost of auto-increment
> addressing mode doesn't take the saved stepping instruction into
> consideration, so the code is necessary.
> Moreover, according to gcc internal's description about
> TARGET_ADDRESS_COST, RISC machines may define different address cost
> for addressing modes which actually have equal execution on micro-
> architecture level (like ARM for now).  The problems are:
> 1) Though address costs are defined in this "discriminated" way, it's
unlikely
> to have the saved stepping instruction considered either.
> The address cost of auto-increment address expression shouldn't go so far.
> 2) We think the "discriminated" address cost model is established before
> gimple pass and is outdated.  The true micro-architecture address cost (or
> cost normalized with COSTS_N_INSNS) should be used in GCC nowadays.
> The rtl passes like fwprop_addr which use address cost as heuristic
> information should be refined... but that's totally another problem (am
> investigating it).
> 
> So overall, I think the stepping cost should to be subtracted here.
> 
> >
> > Or maybe I don't understand.
> >
> > CCing Bernd who implemented this IIRC.
> Any suggestions appreciated.
> 
> Thanks.
> bin
> 
> --
> Best Regards.
Bin.Cheng Nov. 18, 2013, 10:15 a.m. UTC | #4
Ping^2

Thanks,
bin

On Tue, Nov 12, 2013 at 3:08 PM, bin.cheng <bin.cheng@arm.com> wrote:
> Ping in this one.
> Hi Bernd, could you please help us on this one?

> Sorry for the inconvenience.
>
> Thanks,
> bin
>
>> -----Original Message-----
>> From: Bin.Cheng [mailto:amker.cheng@gmail.com]
>> Sent: Monday, November 04, 2013 8:56 PM
>> To: Richard Biener
>> Cc: Bin Cheng; Bernd Schmidt; GCC Patches
>> Subject: Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx
>> patterns in IVOPT
>>
>> On Mon, Nov 4, 2013 at 7:38 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>> > On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng <bin.cheng@arm.com> wrote:
>> >> Hi,
>> >>
>> >> The IVOPT in GCC has a problem that it does not use cost of
>> >> auto-increment address expression in accounting, while it retreats to
>> >> cost of address expression if auto-increment addressing mode is
>> unavailable.
>> >> For example, on ARM target:
>> >> 1) the cost of "[reg]" (which is 6) is used for address expression
>> >> "[reg], #off";
>> >> 2) the cost of "[reg+off]" (which is 2) is used for address
>> >> expression "[reg, #off]!";
>> >>
>> >> This causes:
>> >> 1) cost of non-auto increment address expression is used for
>> >> auto-increment address expression;
>> >> 2) different address costs are used for pre/post increment address
>> >> expressions.
>> >> This patch fixes the problem by computing, caching and using the cost
>> >> of auto-increment address expressions.
>> >>
>> >> Bootstrap and test on x86/arm.  Is it OK?
>> >
>> > But don't you need to adjust
>> >
>> > static bool
>> > determine_use_iv_cost_address (struct ivopts_data *data,
>> >                                struct iv_use *use, struct iv_cand
>> > *cand) {
>> >   bitmap depends_on;
>> >   bool can_autoinc;
>> >   int inv_expr_id = -1;
>> >   comp_cost cost = get_computation_cost (data, use, cand, true,
>> &depends_on,
>> >                                          &can_autoinc, &inv_expr_id);
>> >
>> >   if (cand->ainc_use == use)
>> >     {
>> >       if (can_autoinc)
>> >         cost.cost -= cand->cost_step;
>> >
>> > this which seems to try to compensate for your issue?
>> That's where problem gets complicated depending on how backend defines
>> address cost.  For back ends define cost according to the true cost of
>> addressing mode approximately, the address cost of auto-increment
>> addressing mode doesn't take the saved stepping instruction into
>> consideration, so the code is necessary.
>> Moreover, according to gcc internal's description about
>> TARGET_ADDRESS_COST, RISC machines may define different address cost
>> for addressing modes which actually have equal execution on micro-
>> architecture level (like ARM for now).  The problems are:
>> 1) Though address costs are defined in this "discriminated" way, it's
> unlikely
>> to have the saved stepping instruction considered either.
>> The address cost of auto-increment address expression shouldn't go so far.
>> 2) We think the "discriminated" address cost model is established before
>> gimple pass and is outdated.  The true micro-architecture address cost (or
>> cost normalized with COSTS_N_INSNS) should be used in GCC nowadays.
>> The rtl passes like fwprop_addr which use address cost as heuristic
>> information should be refined... but that's totally another problem (am
>> investigating it).
>>
>> So overall, I think the stepping cost should to be subtracted here.
>>
>> >
>> > Or maybe I don't understand.
>> >
>> > CCing Bernd who implemented this IIRC.
>> Any suggestions appreciated.
>>
>> Thanks.
>> bin
>>
>> --
>> Best Regards.
>
>
>
>
Bernd Schmidt Nov. 18, 2013, 12:04 p.m. UTC | #5
On 11/04/2013 04:31 AM, bin.cheng wrote:
> 2013-11-01  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* tree-ssa-loop-ivopts.c (enum ainc_type): New.
> 	(address_cost_data): New field.
> 	(get_address_cost): Compute auto-increment rtx cost in ainc_costs.
> 	Use ainc_costs for auto-increment rtx patterns.
> 	Cleanup TWS.

I think this is fine. I'd just like to see AINC_NUM gone and its use
replaced by AIC_NONE, we don't really need two separate enum codes for that.


Bernd
diff mbox

Patch

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 204117)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3185,10 +3185,21 @@  multiplier_allowed_in_address_p (HOST_WIDE_INT rat
 
    TODO -- there must be some better way.  This all is quite crude.  */
 
+enum ainc_type
+{
+  AINC_PRE_INC,		/* Pre increment.  */
+  AINC_PRE_DEC,		/* Pre decrement.  */
+  AINC_POST_INC,	/* Post increment.  */
+  AINC_POST_DEC,	/* Post decrement.  */
+  AINC_NUM,		/* Number of auto increment types.  */
+  AINC_NONE
+};
+
 typedef struct address_cost_data_s
 {
   HOST_WIDE_INT min_offset, max_offset;
   unsigned costs[2][2][2][2];
+  unsigned ainc_costs[AINC_NUM];
 } *address_cost_data;
 
 
@@ -3206,6 +3217,7 @@  get_address_cost (bool symbol_present, bool var_pr
   static bool has_preinc[MAX_MACHINE_MODE], has_postinc[MAX_MACHINE_MODE];
   static bool has_predec[MAX_MACHINE_MODE], has_postdec[MAX_MACHINE_MODE];
   unsigned cost, acost, complexity;
+  enum ainc_type autoinc_type;
   bool offset_p, ratio_p, autoinc;
   HOST_WIDE_INT s_offset, autoinc_offset, msize;
   unsigned HOST_WIDE_INT mask;
@@ -3277,33 +3289,49 @@  get_address_cost (bool symbol_present, bool var_pr
       reg0 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 2);
 
-      if (USE_LOAD_PRE_DECREMENT (mem_mode) 
+      if (USE_LOAD_PRE_DECREMENT (mem_mode)
 	  || USE_STORE_PRE_DECREMENT (mem_mode))
 	{
 	  addr = gen_rtx_PRE_DEC (address_mode, reg0);
 	  has_predec[mem_mode]
 	    = memory_address_addr_space_p (mem_mode, addr, as);
+
+	  if (has_predec[mem_mode])
+	    data->ainc_costs[AINC_PRE_DEC]
+	      = address_cost (addr, mem_mode, as, speed);
 	}
-      if (USE_LOAD_POST_DECREMENT (mem_mode) 
+      if (USE_LOAD_POST_DECREMENT (mem_mode)
 	  || USE_STORE_POST_DECREMENT (mem_mode))
 	{
 	  addr = gen_rtx_POST_DEC (address_mode, reg0);
 	  has_postdec[mem_mode]
 	    = memory_address_addr_space_p (mem_mode, addr, as);
+
+	  if (has_postdec[mem_mode])
+	    data->ainc_costs[AINC_POST_DEC]
+	      = address_cost (addr, mem_mode, as, speed);
 	}
-      if (USE_LOAD_PRE_INCREMENT (mem_mode) 
+      if (USE_LOAD_PRE_INCREMENT (mem_mode)
 	  || USE_STORE_PRE_DECREMENT (mem_mode))
 	{
 	  addr = gen_rtx_PRE_INC (address_mode, reg0);
 	  has_preinc[mem_mode]
 	    = memory_address_addr_space_p (mem_mode, addr, as);
+
+	  if (has_preinc[mem_mode])
+	    data->ainc_costs[AINC_PRE_INC]
+	      = address_cost (addr, mem_mode, as, speed);
 	}
-      if (USE_LOAD_POST_INCREMENT (mem_mode) 
+      if (USE_LOAD_POST_INCREMENT (mem_mode)
 	  || USE_STORE_POST_INCREMENT (mem_mode))
 	{
 	  addr = gen_rtx_POST_INC (address_mode, reg0);
 	  has_postinc[mem_mode]
 	    = memory_address_addr_space_p (mem_mode, addr, as);
+
+	  if (has_postinc[mem_mode])
+	    data->ainc_costs[AINC_POST_INC]
+	      = address_cost (addr, mem_mode, as, speed);
 	}
       for (i = 0; i < 16; i++)
 	{
@@ -3429,22 +3457,32 @@  get_address_cost (bool symbol_present, bool var_pr
   s_offset = offset;
 
   autoinc = false;
+  autoinc_type = AINC_NONE;
   msize = GET_MODE_SIZE (mem_mode);
   autoinc_offset = offset;
   if (stmt_after_inc)
     autoinc_offset += ratio * cstep;
   if (symbol_present || var_present || ratio != 1)
     autoinc = false;
-  else if ((has_postinc[mem_mode] && autoinc_offset == 0
+  else
+    {
+      if (has_postinc[mem_mode] && autoinc_offset == 0
+	  && msize == cstep)
+	autoinc_type = AINC_POST_INC;
+      else if (has_postdec[mem_mode] && autoinc_offset == 0
+	       && msize == -cstep)
+	autoinc_type = AINC_POST_DEC;
+      else if (has_preinc[mem_mode] && autoinc_offset == msize
 	       && msize == cstep)
-	   || (has_postdec[mem_mode] && autoinc_offset == 0
+	autoinc_type = AINC_PRE_INC;
+      else if (has_predec[mem_mode] && autoinc_offset == -msize
 	       && msize == -cstep)
-	   || (has_preinc[mem_mode] && autoinc_offset == msize
-	       && msize == cstep)
-	   || (has_predec[mem_mode] && autoinc_offset == -msize
-	       && msize == -cstep))
-    autoinc = true;
+	autoinc_type = AINC_PRE_DEC;
 
+      if (autoinc_type != AINC_NONE)
+	autoinc = true;
+    }
+
   cost = 0;
   offset_p = (s_offset != 0
 	      && data->min_offset <= s_offset
@@ -3460,7 +3498,10 @@  get_address_cost (bool symbol_present, bool var_pr
 
   if (may_autoinc)
     *may_autoinc = autoinc;
-  acost = data->costs[symbol_present][var_present][offset_p][ratio_p];
+  if (autoinc)
+    acost = data->ainc_costs[autoinc_type];
+  else
+    acost = data->costs[symbol_present][var_present][offset_p][ratio_p];
   complexity = (symbol_present != 0) + (var_present != 0) + offset_p + ratio_p;
   return new_cost (cost + acost, complexity);
 }