[RFC] Request for comments on ivopts patch

Message ID 26dad41f-3c44-4b56-8ea1-ca3b7adbc053@BAMAIL02.ba.imgtec.org
State New

Commit Message

Steve Ellcey Dec. 8, 2015, 6:56 p.m. UTC
I have an ivopts optimization question/proposal.  When compiling the
attached program the ivopts pass prefers the original ivs over new ivs
and that causes us to generate less efficient code on MIPS.  It may
affect other platforms too.

The source code is a C strcmp:

int strcmp (const char *p1, const char *p2)
{
  const unsigned char *s1 = (const unsigned char *) p1;
  const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;
  do {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0') return c1 - c2;
  } while (c1 == c2);
  return c1 - c2;
}

Currently the code prefers the original ivs and so it generates
code that increments s1 and s2 before doing the loads (and uses
a -1 offset):

  <bb 3>:
  # s1_1 = PHI <p1_4(D)(2), s1_6(6)>
  # s2_2 = PHI <p2_5(D)(2), s2_9(6)>
  s1_6 = s1_1 + 1;
  c1_8 = MEM[base: s1_6, offset: 4294967295B];
  s2_9 = s2_2 + 1;
  c2_10 = MEM[base: s2_9, offset: 4294967295B];
  if (c1_8 == 0)
    goto <bb 4>;
  else
    goto <bb 5>;
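
For readers unfamiliar with the dump format: the offset 4294967295B is the -1 offset mentioned above, printed as an unsigned value. The sketch below assumes a target with 32-bit pointers (which the value 2^32 - 1 suggests); the helper names wrapped_offset and load_after_increment are made up for illustration and do not appear in GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: offsets are 32 bits wide and wrap modulo 2^32. */
static uint32_t wrapped_offset(int32_t offset)
{
    return (uint32_t) offset;   /* -1 becomes 4294967295 */
}

/* Hand-written equivalent of
     s1_6 = s1_1 + 1;
     c1_8 = MEM[base: s1_6, offset: 4294967295B];
   The pointer is bumped first, then the load reaches back by one. */
static unsigned char load_after_increment(const unsigned char **s)
{
    assert(s != NULL && *s != NULL);
    *s = *s + 1;        /* increment first */
    return (*s)[-1];    /* load through the new pointer at offset -1 */
}
```

The load in this form depends on the incremented pointer, so the increment cannot be moved after the load without rewriting the address.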

If I remove the cost increment for non-original ivs then GCC
does the loads before the increments:

 <bb 3>:
  # ivtmp.6_17 = PHI <ivtmp.6_24(2), ivtmp.6_14(6)>
  # ivtmp.7_21 = PHI <ivtmp.7_22(2), ivtmp.7_23(6)>
  _25 = (void *) ivtmp.6_17;
  c1_8 = MEM[base: _25, offset: 0B];
  _26 = (void *) ivtmp.7_21;
  c2_10 = MEM[base: _26, offset: 0B];
  if (c1_8 == 0)
    goto <bb 4>;
  else
    goto <bb 5>;
.
.
  <bb 5>:
  ivtmp.6_14 = ivtmp.6_17 + 1;
  ivtmp.7_23 = ivtmp.7_21 + 1;
  if (c1_8 == c2_10)
    goto <bb 6>;
  else
    goto <bb 7>;
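
To make the difference between the two dumps easier to see, here is a hand-written C rendering of both loop shapes. This is only an illustration of the GIMPLE above, not compiler output; the function names strcmp_inc_first and strcmp_load_first are made up, and an optimizing compiler is free to turn both back into the same code.

```c
#include <assert.h>
#include <stddef.h>

/* Shape 1 (original ivs): increment each pointer, then load at offset -1. */
int strcmp_inc_first(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    assert(p1 != NULL && p2 != NULL);
    do {
        s1 = s1 + 1;
        c1 = s1[-1];
        s2 = s2 + 1;
        c2 = s2[-1];
        if (c1 == '\0')
            return c1 - c2;
    } while (c1 == c2);
    return c1 - c2;
}

/* Shape 2 (new ivs): load at offset 0, increment just before the
   loop-back test, so the increments sit between the loads and the
   comparison of the loaded values. */
int strcmp_load_first(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    assert(p1 != NULL && p2 != NULL);
    do {
        c1 = s1[0];
        c2 = s2[0];
        if (c1 == '\0')
            return c1 - c2;
        s1 = s1 + 1;
        s2 = s2 + 1;
    } while (c1 == c2);
    return c1 - c2;
}
```

Both functions return the same result for any pair of strings; only the position of the increments relative to the loads differs.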


This second case (without the preference for the original IV)
generates better code on MIPS because the final assembly 
has the increment instructions between the loads and the tests
of the values being loaded and so there is no delay (or less delay)
between the load and use.  It seems like this could easily be 
the case for other platforms too so I was wondering what people
thought of this patch:


2015-12-08  Steve Ellcey  <sellcey@imgtec.com>

	* tree-ssa-loop-ivopts.c (determine_iv_cost): Remove preference for
	original ivs.

Comments

Richard Biener Dec. 9, 2015, 10:24 a.m. UTC | #1
On Tue, Dec 8, 2015 at 7:56 PM, Steve Ellcey <sellcey@imgtec.com> wrote:
> I have an ivopts optimization question/proposal.  When compiling the
> attached program the ivopts pass prefers the original ivs over new ivs
> and that causes us to generate less efficient code on MIPS.  It may
> affect other platforms too.
>
> The source code is a C strcmp:
>
> int strcmp (const char *p1, const char *p2)
> {
>   const unsigned char *s1 = (const unsigned char *) p1;
>   const unsigned char *s2 = (const unsigned char *) p2;
>   unsigned char c1, c2;
>   do {
>       c1 = (unsigned char) *s1++;
>       c2 = (unsigned char) *s2++;
>       if (c1 == '\0') return c1 - c2;
>   } while (c1 == c2);
>   return c1 - c2;
> }
>
> Currently the code prefers the original ivs and so it generates
> code that increments s1 and s2 before doing the loads (and uses
> a -1 offset):
>
>   <bb 3>:
>   # s1_1 = PHI <p1_4(D)(2), s1_6(6)>
>   # s2_2 = PHI <p2_5(D)(2), s2_9(6)>
>   s1_6 = s1_1 + 1;
>   c1_8 = MEM[base: s1_6, offset: 4294967295B];
>   s2_9 = s2_2 + 1;
>   c2_10 = MEM[base: s2_9, offset: 4294967295B];
>   if (c1_8 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
> If I remove the cost increment for non-original ivs then GCC
> does the loads before the increments:
>
>  <bb 3>:
>   # ivtmp.6_17 = PHI <ivtmp.6_24(2), ivtmp.6_14(6)>
>   # ivtmp.7_21 = PHI <ivtmp.7_22(2), ivtmp.7_23(6)>
>   _25 = (void *) ivtmp.6_17;
>   c1_8 = MEM[base: _25, offset: 0B];
>   _26 = (void *) ivtmp.7_21;
>   c2_10 = MEM[base: _26, offset: 0B];
>   if (c1_8 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> .
> .
>   <bb 5>:
>   ivtmp.6_14 = ivtmp.6_17 + 1;
>   ivtmp.7_23 = ivtmp.7_21 + 1;
>   if (c1_8 == c2_10)
>     goto <bb 6>;
>   else
>     goto <bb 7>;
>
>
> This second case (without the preference for the original IV)
> generates better code on MIPS because the final assembly
> has the increment instructions between the loads and the tests
> of the values being loaded and so there is no delay (or less delay)
> between the load and use.  It seems like this could easily be
> the case for other platforms too so I was wondering what people
> thought of this patch:

You don't address the comment you are removing: debugging
programs is also important!

So, if anything, the cost of the two cases should be distinguished
somewhere else, for example by granting a bonus for an increment
placed before the exit test.

Richard.
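
One possible reading of the suggestion above, as a toy model only: keep the +1 penalty for non-original ivs and account for the scheduling benefit as a separate bonus. Everything here is hypothetical: struct sched_cand, its fields, toy_adjust_cost and the bonus parameter are invented for illustration and are not part of tree-ssa-loop-ivopts.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an iv candidate.
   This is NOT GCC's struct iv_cand. */
struct sched_cand {
    bool is_original;            /* candidate is an original, user-visible iv */
    bool incr_before_exit_test;  /* increment would sit right before the exit test */
};

/* Keep the debugging preference and grant a separate bonus when the
   increment lands before the exit test.  The bonus size is an
   assumption; the cost never drops below zero. */
static unsigned toy_adjust_cost(unsigned cost, const struct sched_cand *cand,
                                unsigned bonus)
{
    assert(cand != NULL);
    if (!cand->is_original)
        cost++;                                  /* debugging preference stays */
    if (cand->incr_before_exit_test)
        cost -= (bonus < cost) ? bonus : cost;   /* scheduling bonus, clamped */
    return cost;
}
```

With a bonus larger than 1, a new iv whose increment precedes the exit test would beat an original iv without that property, while the debugging preference still decides all other ties.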

> 2015-12-08  Steve Ellcey  <sellcey@imgtec.com>
>
>         * tree-ssa-loop-ivopts.c (determine_iv_cost): Remove preference for
>         original ivs.
>
>
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 98dc451..26daabc 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -5818,14 +5818,6 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
>
>    cost = cost_step + adjust_setup_cost (data, cost_base.cost);
>
> -  /* Prefer the original ivs unless we may gain something by replacing it.
> -     The reason is to make debugging simpler; so this is not relevant for
> -     artificial ivs created by other optimization passes.  */
> -  if (cand->pos != IP_ORIGINAL
> -      || !SSA_NAME_VAR (cand->var_before)
> -      || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
> -    cost++;
> -
>    /* Prefer not to insert statements into latch unless there are some
>       already (so that we do not create unnecessary jumps).  */
>    if (cand->pos == IP_END

Patch

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 98dc451..26daabc 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5818,14 +5818,6 @@  determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
 
-  /* Prefer the original ivs unless we may gain something by replacing it.
-     The reason is to make debugging simpler; so this is not relevant for
-     artificial ivs created by other optimization passes.  */
-  if (cand->pos != IP_ORIGINAL
-      || !SSA_NAME_VAR (cand->var_before)
-      || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
-    cost++;
-
   /* Prefer not to insert statements into latch unless there are some
      already (so that we do not create unnecessary jumps).  */
   if (cand->pos == IP_END
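
As a reading aid for the hunk above, the following standalone sketch models which candidates received the extra cost before the patch. The struct and its boolean fields are simplified stand-ins invented here (each comment names the GCC expression it is meant to mirror); this is not the real struct iv_cand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of the three tests in the removed code. */
struct dbg_cand {
    bool at_original_pos;    /* mirrors cand->pos == IP_ORIGINAL */
    bool has_user_var;       /* mirrors SSA_NAME_VAR (cand->var_before) != NULL */
    bool var_is_artificial;  /* mirrors DECL_ARTIFICIAL (SSA_NAME_VAR (...)) */
};

/* True when the removed code would have executed cost++. */
static bool toy_gets_penalty(const struct dbg_cand *c)
{
    assert(c != NULL);
    return !c->at_original_pos
           || !c->has_user_var
           || c->var_is_artificial;
}
```

Only a candidate at its original position that is backed by a non-artificial user variable escaped the penalty, which is what made original ivs such as s1 and s2 one unit cheaper than the ivtmp candidates.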