diff mbox

Improve prepare_shrink_wrap to sink more instructions

Message ID 5416F8AA.7060206@arm.com
State New
Headers show

Commit Message

Jiong Wang Sept. 15, 2014, 2:33 p.m. UTC
On 11/09/14 21:39, Jeff Law wrote:
> On 09/08/14 07:57, Jiong Wang wrote:
>>> Conceptually OK.  Some questions/concerns about the implementation.
>> Hi Jeff,
>>
>>     thanks very much for your review.
>>> It seems to me that what you're trying to describe on the RHS is
>>> REG_P || CONSTANT_P
>>     yes.
>>
>>     and actually I am trying to detect all rtx which contains any number
>>     of RTX_CONST_OBJs and no more than one REG.
>>> Note that CONSTANT_P will catch things like (high (symbol_ref)) and
>>> (lo_sum (reg) (symbol_ref)) which are often used to build addresses on
>>> some targets.
>>>
>>> With that in mind, rather than using a for_each_rtx callback, test
>>> if (REG_P (src) || CONSTANT_P (src))
>>>
>>> Note that SRC, when it is a CONSTANT_P,  may have a variety of forms
>>> with embedded registers.  You'll need to verify that all of those
>>> registers are not assigned after the insn with the CONSTANT_P source
>>> operand.  Right now you only perform that check when SRC is a REG_P.
>>     I am using the for_each_rtx because I want to scan "src" so that
>> any sub-operands are checked,  the number of REG and non-constant
>> objects are record in "reg_found" and "nonconst_found".  the embedded
>> register found also record in the "reg" field of the structure
>> "rtx_search_arg",
> Constants in this context are going to satisfy CONSTANT_P, you don't
> need to manually verify them.  It will include simple constants and
> constants which are built up out of multiple instructions (symbolic
> constants in particular).

Jeff,

  thanks, I partially understand your meaning here.

   take the function "ira_implicitly_set_insn_hard_regs" in ira-lives.c for example,

   when generating address rtl, gcc will automatically generate "const" operator to prefix
   the address expression, like the following. so a simple CONSTANT_P check is enough in
   case there is no embedded register.

   (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
         (const:DI (plus:DI (symbol_ref:DI ("recog_data") [flags 0x40]  <var_decl 0x2b2c2ff91510 recog_data>)
                 (const_int 480 [0x1e0])))) -1


   but for architecture like aarch64, the following instruction sequences to forming address
   may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
         (high:DI (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl 0x7ff755a60900 stats>))) 35 {*movdi_aarch64}
      (expr_list:REG_EQUIV (high:DI (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl 0x7ff755a60900 stats>))
         (nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
         (lo_sum:DI (reg/f:DI 20 x20 [99])
             (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl 0x7ff755a60900 stats>))) {add_losym_di}
      (expr_list:REG_EQUIV (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl 0x7ff755a60900 stats>)
         (nil)))

  while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,
  so the

  the manually verification

       FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
         {
           rtx x = *iter;
           if (REG_P (x))
             {
               reg_num++;
               src_inner = x;
             }
           else if (!CONSTANT_P (x) && OBJECT_P (x))
             nonconstobj_num++;
         }

  is to catch cases like the lo_sum insn above which contains no more than 1 register and only contains
  RTX_CONST_OBJ.

> I suspect you still need the callback to verify the # of registers is
> just 1 so that the later tests work.  However, please don't use
> for_each_rtx, we're moving away from that to a more efficient walker
> FOR_EACH_SUBRTX.

  
   fixed, patch updated, please review.

   thanks.

   BTW, when I verifying this patch on gilbc x86-64 build, I found there may be another hiding
   issue when calculating live_in when splitting edge in move_insn_for_shrink_wrap, I will reply
   on that historical email thread directly.

Regards,
Jiong

>
> Jeff
>
>
>

Comments

Jeff Law Sept. 19, 2014, 8:43 p.m. UTC | #1
On 09/15/14 08:33, Jiong Wang wrote:
>
> Jeff,
>
>   thanks, I partially understand your meaning here.
>
>    take the function "ira_implicitly_set_insn_hard_regs" in ira-lives.c
> for example,
>
>    when generating address rtl, gcc will automatically generate "const"
> operator to prefix
>    the address expression, like the following. so a simple CONSTANT_P
> check is enough in
>    case there is no embedded register.
>
>    (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
>          (const:DI (plus:DI (symbol_ref:DI ("recog_data") [flags 0x40]
> <var_decl 0x2b2c2ff91510 recog_data>)
>                  (const_int 480 [0x1e0])))) -1
>
>
>    but for architecture like aarch64, the following instruction
> sequences to forming address
>    may be generated
>
> (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
>          (high:DI (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl
> 0x7ff755a60900 stats>))) 35 {*movdi_aarch64}
>       (expr_list:REG_EQUIV (high:DI (symbol_ref:DI ("global_a") [flags
> 0xc0]  <var_decl 0x7ff755a60900 stats>))
>          (nil)))
>
> (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
>          (lo_sum:DI (reg/f:DI 20 x20 [99])
>              (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl
> 0x7ff755a60900 stats>))) {add_losym_di}
>       (expr_list:REG_EQUIV (symbol_ref:DI ("global_a") [flags 0xc0]
> <var_decl 0x7ff755a60900 stats>)
>          (nil)))
>
>   while CONSTANT_P could not catch the latter lo_sum case, as the
> RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,
Hmm, it's been ~15 years since I regularly worked on a target that uses 
HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST 
as well, but reading the docs for CONST, that clearly isn't the case.

Sorry for that.  Can you (re) send your current patch for this for review?

Jeff
Andrew Pinski Oct. 2, 2014, 8:49 a.m. UTC | #2
On Fri, Sep 19, 2014 at 1:43 PM, Jeff Law <law@redhat.com> wrote:
> On 09/15/14 08:33, Jiong Wang wrote:
>>
>>
>> Jeff,
>>
>>   thanks, I partially understand your meaning here.
>>
>>    take the function "ira_implicitly_set_insn_hard_regs" in ira-lives.c
>> for example,
>>
>>    when generating address rtl, gcc will automatically generate "const"
>> operator to prefix
>>    the address expression, like the following. so a simple CONSTANT_P
>> check is enough in
>>    case there is no embedded register.
>>
>>    (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
>>          (const:DI (plus:DI (symbol_ref:DI ("recog_data") [flags 0x40]
>> <var_decl 0x2b2c2ff91510 recog_data>)
>>                  (const_int 480 [0x1e0])))) -1
>>
>>
>>    but for architecture like aarch64, the following instruction
>> sequences to forming address
>>    may be generated
>>
>> (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
>>          (high:DI (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl
>> 0x7ff755a60900 stats>))) 35 {*movdi_aarch64}
>>       (expr_list:REG_EQUIV (high:DI (symbol_ref:DI ("global_a") [flags
>> 0xc0]  <var_decl 0x7ff755a60900 stats>))
>>          (nil)))
>>
>> (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
>>          (lo_sum:DI (reg/f:DI 20 x20 [99])
>>              (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl
>> 0x7ff755a60900 stats>))) {add_losym_di}
>>       (expr_list:REG_EQUIV (symbol_ref:DI ("global_a") [flags 0xc0]
>> <var_decl 0x7ff755a60900 stats>)
>>          (nil)))
>>
>>   while CONSTANT_P could not catch the latter lo_sum case, as the
>> RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,
>
> Hmm, it's been ~15 years since I regularly worked on a target that uses
> HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as
> well, but reading the docs for CONST, that clearly isn't the case.

Could we add a check for lo_sum since it is an RTX_OBJ rather than
RTX_COMM_ARITH or RTX_BIN_ARITH?
I am testing the patch for that to fix the above issue.  It shows up
with the testcase Jiong added but only with -mabi=ilp32 enabled.

Thanks,
Andrew Pinski

>
> Sorry for that.  Can you (re) send your current patch for this for review?
>
> Jeff
Jiong Wang Oct. 2, 2014, 5:37 p.m. UTC | #3
On 02/10/14 09:49, Andrew Pinski wrote:
> On Fri, Sep 19, 2014 at 1:43 PM, Jeff Law <law@redhat.com> wrote:
>> On 09/15/14 08:33, Jiong Wang wrote:
>>>
>>> Jeff,
>>>
>>>    thanks, I partially understand your meaning here.
>>>
>>>     take the function "ira_implicitly_set_insn_hard_regs" in ira-lives.c
>>> for example,
>>>
>>>     when generating address rtl, gcc will automatically generate "const"
>>> operator to prefix
>>>     the address expression, like the following. so a simple CONSTANT_P
>>> check is enough in
>>>     case there is no embedded register.
>>>
>>>     (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
>>>           (const:DI (plus:DI (symbol_ref:DI ("recog_data") [flags 0x40]
>>> <var_decl 0x2b2c2ff91510 recog_data>)
>>>                   (const_int 480 [0x1e0])))) -1
>>>
>>>
>>>     but for architecture like aarch64, the following instruction
>>> sequences to forming address
>>>     may be generated
>>>
>>> (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
>>>           (high:DI (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl
>>> 0x7ff755a60900 stats>))) 35 {*movdi_aarch64}
>>>        (expr_list:REG_EQUIV (high:DI (symbol_ref:DI ("global_a") [flags
>>> 0xc0]  <var_decl 0x7ff755a60900 stats>))
>>>           (nil)))
>>>
>>> (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
>>>           (lo_sum:DI (reg/f:DI 20 x20 [99])
>>>               (symbol_ref:DI ("global_a") [flags 0xc0]  <var_decl
>>> 0x7ff755a60900 stats>))) {add_losym_di}
>>>        (expr_list:REG_EQUIV (symbol_ref:DI ("global_a") [flags 0xc0]
>>> <var_decl 0x7ff755a60900 stats>)
>>>           (nil)))
>>>
>>>    while CONSTANT_P could not catch the latter lo_sum case, as the
>>> RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,
>> Hmm, it's been ~15 years since I regularly worked on a target that uses
>> HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as
>> well, but reading the docs for CONST, that clearly isn't the case.
> Could we add a check for lo_sum since it is an RTX_OBJ rather than
> RTX_COMM_ARITH or RTX_BIN_ARITH?

thanks, agree, that's exactly what I want to catch, while missed it during the patch re-write. I was been stupid!
anyway, I think we need to solve pr63404, https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02568.html firstly, as it's a correctness regression.

--
Jiong

> I am testing the patch for that to fix the above issue.  It shows up
> with the testcase Jiong added but only with -mabi=ilp32 enabled.
>
> Thanks,
> Andrew Pinski
>
>> Sorry for that.  Can you (re) send your current patch for this for review?
>>
>> Jeff
diff mbox

Patch

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..739e957 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -53,6 +53,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "bb-reorder.h"
 #include "shrink-wrap.h"
 #include "regcprop.h"
+#include "rtl-iter.h"
 
 #ifdef HAVE_simple_return
 
@@ -169,7 +170,9 @@  move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 {
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
-  unsigned int i, dregno, end_dregno, sregno, end_sregno;
+  unsigned int i, dregno, end_dregno;
+  unsigned int sregno = FIRST_PSEUDO_REGISTER;
+  unsigned int end_sregno = FIRST_PSEUDO_REGISTER;
   basic_block next_block;
   edge live_edge;
 
@@ -179,7 +182,34 @@  move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
     return false;
   src = SET_SRC (set);
   dest = SET_DEST (set);
-  if (!REG_P (dest) || !REG_P (src)
+
+  if (!REG_P (src))
+    {
+      unsigned int reg_num = 0;
+      unsigned int nonconstobj_num = 0;
+      rtx src_inner = NULL_RTX;
+
+      subrtx_var_iterator::array_type array;
+      FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
+	{
+	  rtx x = *iter;
+	  if (REG_P (x))
+	    {
+	      reg_num++;
+	      src_inner = x;
+	    }
+	  else if (!CONSTANT_P (x) && OBJECT_P (x))
+	    nonconstobj_num++;
+	}
+
+      if (nonconstobj_num > 0
+	  || reg_num > 1)
+	src = NULL_RTX;
+      else if (reg_num == 1)
+	src = src_inner;
+    }
+
+  if (!REG_P (dest) || src == NULL_RTX
       /* STACK or FRAME related adjustment might be part of prologue.
 	 So keep them in the entry block.  */
       || dest == stack_pointer_rtx
@@ -188,10 +218,13 @@  move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
     return false;
 
   /* Make sure that the source register isn't defined later in BB.  */
-  sregno = REGNO (src);
-  end_sregno = END_REGNO (src);
-  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
-    return false;
+  if (REG_P (src))
+    {
+      sregno = REGNO (src);
+      end_sregno = END_REGNO (src);
+      if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
+	return false;
+    }
 
   /* Make sure that the destination register isn't referenced later in BB.  */
   dregno = REGNO (dest);