Patchwork [RFC] make lra.c:check_rtl set maybe_hot_insn_p

login
register
mail settings
Submitter Steven Bosscher
Date Nov. 9, 2013, 7:01 p.m.
Message ID <CABu31nPu+1QxCQDv71C=3V_qS61B3GkAd37Eh7WTZMqBTann_w@mail.gmail.com>
Download mbox | patch
Permalink /patch/290006/
State New
Headers show

Comments

Steven Bosscher - Nov. 9, 2013, 7:01 p.m.
Hello,

This patch is necessary to make ARM pass the test suite with LRA
enabled. The symptom is recog failing to recognize a store_minmaxsi
insn, see:
 http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html

But I am not sure if that's also the root cause of the problem, or
whether the ARM back end should not let recognition of insn patterns
be dependent on the state of the profile flags.

The pattern for store_minmaxsi (in arm.md) is:

(define_insn "*store_minmaxsi"
  [(set (match_operand:SI 0 "memory_operand" "=m")
        (match_operator:SI 3 "minmax_operator"
         [(match_operand:SI 1 "s_register_operand" "r")
          (match_operand:SI 2 "s_register_operand" "r")]))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_32BIT && optimize_insn_for_size_p()"
  "*
  operands[3] = gen_rtx_fmt_ee (minmax_code (operands[3]), SImode,
                                operands[1], operands[2]);
  output_asm_insn (\"cmp\\t%1, %2\", operands);
  if (TARGET_THUMB2)
    output_asm_insn (\"ite\t%d3\", operands);
  output_asm_insn (\"str%d3\\t%1, %0\", operands);
  output_asm_insn (\"str%D3\\t%2, %0\", operands);
  return \"\";
  "
  [(set_attr "conds" "clob")
   (set (attr "length")
        (if_then_else (eq_attr "is_thumb" "yes")
                      (const_int 14)
                      (const_int 12)))
   (set_attr "type" "store1")]
)


Note the insn condition uses optimize_insn_for_size_p(). This means
the pattern can be valid or invalid dependent on the context where the
insn appears: in hot or cold code. IMHO this behavior should not be
allowed. The back end cannot expect the middle end to know at all
times the context that the insn appears in, and more importantly
whether a pattern is valid or not is independent of where the insn
appears: That is a *cost* issue!

It seems to me that the ARM back end should be fixed here, not LRA's check_rtl.

Comments&thoughts?

Ciao!
Steven
* lra.c (check_rtl): Call rtl_profile_for_bb so that recog
	can use cfun->maybe_hot_insn_p.
Richard Guenther - Nov. 10, 2013, 1:17 p.m.
Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>Hello,
>
>This patch is necessary to make ARM pass the test suite with LRA
>enabled. The symptom is recog failing to recognize a store_minmaxsi
>insn, see:
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html
>
>But I am not sure if that's also the root cause of the problem, or
>whether the ARM back end should not let recognition of insn patterns
>be dependent on the state of the profile flags.
>
>The pattern for store_minmaxsi (in arm.md) is:
>
>(define_insn "*store_minmaxsi"
>  [(set (match_operand:SI 0 "memory_operand" "=m")
>        (match_operator:SI 3 "minmax_operator"
>         [(match_operand:SI 1 "s_register_operand" "r")
>          (match_operand:SI 2 "s_register_operand" "r")]))
>   (clobber (reg:CC CC_REGNUM))]
>  "TARGET_32BIT && optimize_insn_for_size_p()"
>  "*
>  operands[3] = gen_rtx_fmt_ee (minmax_code (operands[3]), SImode,
>                                operands[1], operands[2]);
>  output_asm_insn (\"cmp\\t%1, %2\", operands);
>  if (TARGET_THUMB2)
>    output_asm_insn (\"ite\t%d3\", operands);
>  output_asm_insn (\"str%d3\\t%1, %0\", operands);
>  output_asm_insn (\"str%D3\\t%2, %0\", operands);
>  return \"\";
>  "
>  [(set_attr "conds" "clob")
>   (set (attr "length")
>        (if_then_else (eq_attr "is_thumb" "yes")
>                      (const_int 14)
>                      (const_int 12)))
>   (set_attr "type" "store1")]
>)
>
>
>Note the insn condition uses optimize_insn_for_size_p(). This means
>the pattern can be valid or invalid dependent on the context where the
>insn appears: in hot or cold code. IMHO this behavior should not be
>allowed. The back end cannot expect the middle end to know at all
>times the context that the insn appears in, and more importantly
>whether a pattern is valid or not is independent of where the insn
>appears: That is a *cost* issue!
>
>It seems to me that the ARM back end should be fixed here, not LRA's
>check_rtl.
>
>Comments&thoughts?

The intent is to allow this also to control combine and split, but certainly making insns invalid after the fact is bad.  think of sinking a previously hot insn into a cold path...

Honza, how was this supposed to work?

Richard.

>Ciao!
>Steven
Yvan Roux - Nov. 15, 2013, 2:19 p.m.
Hi,

I'm agree.  I looked at the ARM backend and it occurs that the usage
of optimize_insn_for_size_p() was added to only use store_minmax in
cold path because of some performance issue.  But in any case its
usage doesn't shrink the number of instruction, if we are in ARM mode
3 are needed : 1 comparison, 1 conditional move and 1 store in O1, O2
and O3 and 1 comparison and 2 conditional sotres in Os :

cmp     ip, r0
movlt   r0, ip
str        r0, [lr, r3]

or

cmp   r5, r4
strle   r5, [lr, r3]
strgt   r4, [lr, r3]

and in Thumb mode 4 are needed in both cases because of the it
insertion.  Thus I think that this insn can be removed. Is it ok for
you ?

Thanks
Yvan


On 10 November 2013 14:17, Richard Biener <richard.guenther@gmail.com> wrote:
> Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>>Hello,
>>
>>This patch is necessary to make ARM pass the test suite with LRA
>>enabled. The symptom is recog failing to recognize a store_minmaxsi
>>insn, see:
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html
>>
>>But I am not sure if that's also the root cause of the problem, or
>>whether the ARM back end should not let recognition of insn patterns
>>be dependent on the state of the profile flags.
>>
>>The pattern for store_minmaxsi (in arm.md) is:
>>
>>(define_insn "*store_minmaxsi"
>>  [(set (match_operand:SI 0 "memory_operand" "=m")
>>        (match_operator:SI 3 "minmax_operator"
>>         [(match_operand:SI 1 "s_register_operand" "r")
>>          (match_operand:SI 2 "s_register_operand" "r")]))
>>   (clobber (reg:CC CC_REGNUM))]
>>  "TARGET_32BIT && optimize_insn_for_size_p()"
>>  "*
>>  operands[3] = gen_rtx_fmt_ee (minmax_code (operands[3]), SImode,
>>                                operands[1], operands[2]);
>>  output_asm_insn (\"cmp\\t%1, %2\", operands);
>>  if (TARGET_THUMB2)
>>    output_asm_insn (\"ite\t%d3\", operands);
>>  output_asm_insn (\"str%d3\\t%1, %0\", operands);
>>  output_asm_insn (\"str%D3\\t%2, %0\", operands);
>>  return \"\";
>>  "
>>  [(set_attr "conds" "clob")
>>   (set (attr "length")
>>        (if_then_else (eq_attr "is_thumb" "yes")
>>                      (const_int 14)
>>                      (const_int 12)))
>>   (set_attr "type" "store1")]
>>)
>>
>>
>>Note the insn condition uses optimize_insn_for_size_p(). This means
>>the pattern can be valid or invalid dependent on the context where the
>>insn appears: in hot or cold code. IMHO this behavior should not be
>>allowed. The back end cannot expect the middle end to know at all
>>times the context that the insn appears in, and more importantly
>>whether a pattern is valid or not is independent of where the insn
>>appears: That is a *cost* issue!
>>
>>It seems to me that the ARM back end should be fixed here, not LRA's
>>check_rtl.
>>
>>Comments&thoughts?
>
> The intent is to allow this also to control combine and split, but certainly making insns invalid after the fact is bad.  think of sinking a previously hot insn into a cold path...
>
> Honza, how was this supposed to work?
>
> Richard.
>
>>Ciao!
>>Steven
>
>
Richard Earnshaw - Nov. 15, 2013, 2:22 p.m.
On 15/11/13 14:19, Yvan Roux wrote:
> Hi,
> 
> I'm agree.  I looked at the ARM backend and it occurs that the usage
> of optimize_insn_for_size_p() was added to only use store_minmax in
> cold path because of some performance issue.  But in any case its
> usage doesn't shrink the number of instruction, if we are in ARM mode
> 3 are needed : 1 comparison, 1 conditional move and 1 store in O1, O2
> and O3 and 1 comparison and 2 conditional sotres in Os :
> 
> cmp     ip, r0
> movlt   r0, ip
> str        r0, [lr, r3]
> 

Sometimes 4 will be needed, since both original register values may
remain live.

However, I'm inclined to agree that while it should be possible to
decide at the *function* level whether or not an insn is valid, doing so
at the block level is probably unsafe.

R.

> or
> 
> cmp   r5, r4
> strle   r5, [lr, r3]
> strgt   r4, [lr, r3]
> 
> and in Thumb mode 4 are needed in both cases because of the it
> insertion.  Thus I think that this insn can be removed. Is it ok for
> you ?
> 
> Thanks
> Yvan
> 
> 
> On 10 November 2013 14:17, Richard Biener <richard.guenther@gmail.com> wrote:
>> Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>>> Hello,
>>>
>>> This patch is necessary to make ARM pass the test suite with LRA
>>> enabled. The symptom is recog failing to recognize a store_minmaxsi
>>> insn, see:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html
>>>
>>> But I am not sure if that's also the root cause of the problem, or
>>> whether the ARM back end should not let recognition of insn patterns
>>> be dependent on the state of the profile flags.
>>>
>>> The pattern for store_minmaxsi (in arm.md) is:
>>>
>>> (define_insn "*store_minmaxsi"
>>>  [(set (match_operand:SI 0 "memory_operand" "=m")
>>>        (match_operator:SI 3 "minmax_operator"
>>>         [(match_operand:SI 1 "s_register_operand" "r")
>>>          (match_operand:SI 2 "s_register_operand" "r")]))
>>>   (clobber (reg:CC CC_REGNUM))]
>>>  "TARGET_32BIT && optimize_insn_for_size_p()"
>>>  "*
>>>  operands[3] = gen_rtx_fmt_ee (minmax_code (operands[3]), SImode,
>>>                                operands[1], operands[2]);
>>>  output_asm_insn (\"cmp\\t%1, %2\", operands);
>>>  if (TARGET_THUMB2)
>>>    output_asm_insn (\"ite\t%d3\", operands);
>>>  output_asm_insn (\"str%d3\\t%1, %0\", operands);
>>>  output_asm_insn (\"str%D3\\t%2, %0\", operands);
>>>  return \"\";
>>>  "
>>>  [(set_attr "conds" "clob")
>>>   (set (attr "length")
>>>        (if_then_else (eq_attr "is_thumb" "yes")
>>>                      (const_int 14)
>>>                      (const_int 12)))
>>>   (set_attr "type" "store1")]
>>> )
>>>
>>>
>>> Note the insn condition uses optimize_insn_for_size_p(). This means
>>> the pattern can be valid or invalid dependent on the context where the
>>> insn appears: in hot or cold code. IMHO this behavior should not be
>>> allowed. The back end cannot expect the middle end to know at all
>>> times the context that the insn appears in, and more importantly
>>> whether a pattern is valid or not is independent of where the insn
>>> appears: That is a *cost* issue!
>>>
>>> It seems to me that the ARM back end should be fixed here, not LRA's
>>> check_rtl.
>>>
>>> Comments&thoughts?
>>
>> The intent is to allow this also to control combine and split, but certainly making insns invalid after the fact is bad.  think of sinking a previously hot insn into a cold path...
>>
>> Honza, how was this supposed to work?
>>
>> Richard.
>>
>>> Ciao!
>>> Steven
>>
>>
>

Patch

Index: lra.c
===================================================================
--- lra.c	(revision 204618)
+++ lra.c	(working copy)
@@ -2022,26 +2022,32 @@  check_rtl (bool final_p)
 
   lra_assert (! final_p || reload_completed);
   FOR_EACH_BB (bb)
-    FOR_BB_INSNS (bb, insn)
-    if (NONDEBUG_INSN_P (insn)
-	&& GET_CODE (PATTERN (insn)) != USE
-	&& GET_CODE (PATTERN (insn)) != CLOBBER
-	&& GET_CODE (PATTERN (insn)) != ASM_INPUT)
-      {
-	if (final_p)
-	  {
-	    extract_insn (insn);
-	    lra_assert (constrain_operands (1));
-	    continue;
-	  }
-	/* LRA code is based on assumption that all addresses can be
-	   correctly decomposed.  LRA can generate reloads for
-	   decomposable addresses.  The decomposition code checks the
-	   correctness of the addresses.  So we don't need to check
-	   the addresses here.  */
-	if (insn_invalid_p (insn, false))
-	  fatal_insn_not_found (insn);
-      }
+    {
+      /* Whether an insn is recognized or not may depend on the hotness of
+	 the insn: Some back ends use it as a layman's "enable" attribute.  */
+      rtl_profile_for_bb (bb);
+
+      FOR_BB_INSNS (bb, insn)
+	if (NONDEBUG_INSN_P (insn)
+	    && GET_CODE (PATTERN (insn)) != USE
+	    && GET_CODE (PATTERN (insn)) != CLOBBER
+	    && GET_CODE (PATTERN (insn)) != ASM_INPUT)
+	  {
+	    if (final_p)
+	      {
+		extract_insn (insn);
+		lra_assert (constrain_operands (1));
+		continue;
+	      }
+	    /* LRA code is based on assumption that all addresses can be
+	       correctly decomposed.  LRA can generate reloads for
+	       decomposable addresses.  The decomposition code checks the
+	       correctness of the addresses.  So we don't need to check
+	       the addresses here.  */
+	    if (insn_invalid_p (insn, false))
+	      fatal_insn_not_found (insn);
+	  }
+    }
 }
 #endif /* #ifdef ENABLE_CHECKING */