diff mbox

[combine] Try REG_EQUAL for nonzero_bits

Message ID 000001d00546$33445430$99ccfc90$@arm.com
State New
Headers show

Commit Message

Zhenqiang Chen Nov. 21, 2014, 4:47 a.m. UTC
Hi,

The patch tries to use REG_EQUAL to get more precise info for nonzero_bits,
which helps to remove unnecessary zero_extend.

Here is an example when compiling Coremark, we have rtx like,

(insn 1244 386 388 47 (set (reg:SI 263 [ D.5767 ])
        (reg:SI 384 [ D.5767 ])) 786 {*thumb2_movsi_insn}
     (expr_list:REG_EQUAL (zero_extend:SI (mem:QI (reg/v/f:SI 271 [ memblock
]) [0 *memblock_13(D)+0 S1 A8]))
        (nil)))

from "reg:SI 384", we can only know it is a 32-bit value. But from
REG_EQUAL, we can know it is an 8-bit value. Then for the following rtx seq,

(insn 409 407 410 50 (set (reg:SI 308)
        (plus:SI (reg:SI 263 [ D.5767 ])
            (const_int -48 [0xffffffffffffffd0]))) core_state.c:170 4
{*arm_addsi3}
     (nil))
(insn 410 409 411 50 (set (reg:SI 309)
        (zero_extend:SI (subreg:QI (reg:SI 308) 0))) core_state.c:170 812
{thumb2_zero_extendqisi2_v6}
     (expr_list:REG_DEAD (reg:SI 308)
        (nil)))

the zero_extend for r309 can be optimized by combine pass.

Bootstrap and no make check regression on X86-64.
No make check regression on Cortex-M4 qemu.
No Spec2K INT regression on X86-64 and Cortex-A15 with -O3.
Coremark on Cortex-M7 is 0.3% better.
Coremark on Cortex-M4 is 0.07% regression due to alignment change.
No Coremark change on Corter-M0 and Cortex-A15.

Unfortunately I failed to generate a meaningful small case for it. So no
test case is included in the patch.

Ok for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-11-21  Zhenqiang Chen  <zhenqiang.chen@arm.com>

	* combine.c (set_nonzero_bits_and_sign_copies): Try REG_EQUAL note.

 	  if (rsp->sign_bit_copies == 0
 	      || rsp->sign_bit_copies > num)

Comments

Eric Botcazou Nov. 22, 2014, 10:15 a.m. UTC | #1
> The patch tries to use REG_EQUAL to get more precise info for nonzero_bits,
> which helps to remove unnecessary zero_extend.
> 
> Here is an example when compiling Coremark, we have rtx like,
> 
> (insn 1244 386 388 47 (set (reg:SI 263 [ D.5767 ])
>         (reg:SI 384 [ D.5767 ])) 786 {*thumb2_movsi_insn}
>      (expr_list:REG_EQUAL (zero_extend:SI (mem:QI (reg/v/f:SI 271 [ memblock
> ]) [0 *memblock_13(D)+0 S1 A8]))
>         (nil)))
> 
> from "reg:SI 384", we can only know it is a 32-bit value. But from
> REG_EQUAL, we can know it is an 8-bit value. Then for the following rtx seq,
> 
> (insn 409 407 410 50 (set (reg:SI 308)
>         (plus:SI (reg:SI 263 [ D.5767 ])
>             (const_int -48 [0xffffffffffffffd0]))) core_state.c:170 4
> {*arm_addsi3}
>      (nil))
> (insn 410 409 411 50 (set (reg:SI 309)
>         (zero_extend:SI (subreg:QI (reg:SI 308) 0))) core_state.c:170 812
> {thumb2_zero_extendqisi2_v6}
>      (expr_list:REG_DEAD (reg:SI 308)
>         (nil)))
> 
> the zero_extend for r309 can be optimized by combine pass.

This sounds like a good idea.

> 2014-11-21  Zhenqiang Chen  <zhenqiang.chen@arm.com>
> 
> 	* combine.c (set_nonzero_bits_and_sign_copies): Try REG_EQUAL note.
> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 6a7d16b..68a883b 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -1713,7 +1713,15 @@ set_nonzero_bits_and_sign_copies (rtx x, const_rtx
> set, void *data)
> 
>  	  /* Don't call nonzero_bits if it cannot change anything.  */
>  	  if (rsp->nonzero_bits != ~(unsigned HOST_WIDE_INT) 0)
> -	    rsp->nonzero_bits |= nonzero_bits (src, nonzero_bits_mode);
> +	    {
> +	      rtx reg_equal = insn ? find_reg_note (insn, REG_EQUAL,
> NULL_RTX)
> +				     : NULL_RTX;
> +	      if (reg_equal)
> +		rsp->nonzero_bits |= nonzero_bits (XEXP (reg_equal, 0),
> +						   nonzero_bits_mode);
> +	      else
> +		rsp->nonzero_bits |= nonzero_bits (src, nonzero_bits_mode);
> +	    }
>  	  num = num_sign_bit_copies (SET_SRC (set), GET_MODE (x));
>  	  if (rsp->sign_bit_copies == 0
> 
>  	      || rsp->sign_bit_copies > num)

Use find_reg_equal_equiv_note instead.  Are you sure that this won't yield 
inferior results in very peculiar cases?  IOW, why not use both sources?

Note that 'src' is massaged just above if SHORT_IMMEDIATES_SIGN_EXTEND is 
defined so we should probably do it for the note datum too, for example by 
factoring the code into a function and invoking it.

Why not do the same for num_sign_bit_copies?
Zhenqiang Chen Nov. 24, 2014, 8:21 a.m. UTC | #2
> -----Original Message-----
> From: Eric Botcazou [mailto:ebotcazou@adacore.com]
> Sent: Saturday, November 22, 2014 6:15 PM
> To: Zhenqiang Chen
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> 
> > The patch tries to use REG_EQUAL to get more precise info for
> > nonzero_bits, which helps to remove unnecessary zero_extend.
> >
> > Here is an example when compiling Coremark, we have rtx like,
> >
> > (insn 1244 386 388 47 (set (reg:SI 263 [ D.5767 ])
> >         (reg:SI 384 [ D.5767 ])) 786 {*thumb2_movsi_insn}
> >      (expr_list:REG_EQUAL (zero_extend:SI (mem:QI (reg/v/f:SI 271 [
> > memblock
> > ]) [0 *memblock_13(D)+0 S1 A8]))
> >         (nil)))
> >
> > from "reg:SI 384", we can only know it is a 32-bit value. But from
> > REG_EQUAL, we can know it is an 8-bit value. Then for the following
> > rtx seq,
> >
> > (insn 409 407 410 50 (set (reg:SI 308)
> >         (plus:SI (reg:SI 263 [ D.5767 ])
> >             (const_int -48 [0xffffffffffffffd0]))) core_state.c:170 4
> > {*arm_addsi3}
> >      (nil))
> > (insn 410 409 411 50 (set (reg:SI 309)
> >         (zero_extend:SI (subreg:QI (reg:SI 308) 0))) core_state.c:170
> > 812 {thumb2_zero_extendqisi2_v6}
> >      (expr_list:REG_DEAD (reg:SI 308)
> >         (nil)))
> >
> > the zero_extend for r309 can be optimized by combine pass.
> 
> This sounds like a good idea.
> 
> > 2014-11-21  Zhenqiang Chen  <zhenqiang.chen@arm.com>
> >
> > 	* combine.c (set_nonzero_bits_and_sign_copies): Try REG_EQUAL
> note.
> >
> > diff --git a/gcc/combine.c b/gcc/combine.c index 6a7d16b..68a883b
> > 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -1713,7 +1713,15 @@ set_nonzero_bits_and_sign_copies (rtx x,
> > const_rtx set, void *data)
> >
> >  	  /* Don't call nonzero_bits if it cannot change anything.  */
> >  	  if (rsp->nonzero_bits != ~(unsigned HOST_WIDE_INT) 0)
> > -	    rsp->nonzero_bits |= nonzero_bits (src, nonzero_bits_mode);
> > +	    {
> > +	      rtx reg_equal = insn ? find_reg_note (insn, REG_EQUAL,
> > NULL_RTX)
> > +				     : NULL_RTX;
> > +	      if (reg_equal)
> > +		rsp->nonzero_bits |= nonzero_bits (XEXP (reg_equal, 0),
> > +						   nonzero_bits_mode);
> > +	      else
> > +		rsp->nonzero_bits |= nonzero_bits (src,
> nonzero_bits_mode);
> > +	    }
> >  	  num = num_sign_bit_copies (SET_SRC (set), GET_MODE (x));
> >  	  if (rsp->sign_bit_copies == 0
> >
> >  	      || rsp->sign_bit_copies > num)
> 
> Use find_reg_equal_equiv_note instead.  Are you sure that this won't yield
> inferior results in very peculiar cases?  IOW, why not use both sources?

Thanks for the comments. I will compare the two nonzero_bits from src and
REG_EQUAL. Then select the smaller one.
 
> Note that 'src' is massaged just above if SHORT_IMMEDIATES_SIGN_EXTEND
> is defined so we should probably do it for the note datum too, for example
> by factoring the code into a function and invoking it.
> 
> Why not do the same for num_sign_bit_copies?

Do you know why it use " SET_SRC (set)" other than "src" for
num_sign_bit_copies?

If it is "src", I should do the same for num_sign_bit_copies with REG_EQUAL
info.

Thanks!
-Zhenqiang
Eric Botcazou Nov. 24, 2014, 9:41 a.m. UTC | #3
> Thanks for the comments. I will compare the two nonzero_bits from src and
> REG_EQUAL. Then select the smaller one.

They are masks so you can simply AND them before ORing the result.

> Do you know why it use " SET_SRC (set)" other than "src" for
> num_sign_bit_copies?
> 
> If it is "src", I should do the same for num_sign_bit_copies with REG_EQUAL
> info.

Probably historical reasons, let's not try to change that now.  You can apply 
the same treatment to num_sign_bit_copies (you will need a comparison here) 
while preserving the "src" vs "SET_SRC (set)" discrepancy.
diff mbox

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 6a7d16b..68a883b 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1713,7 +1713,15 @@  set_nonzero_bits_and_sign_copies (rtx x, const_rtx
set, void *data)
 
 	  /* Don't call nonzero_bits if it cannot change anything.  */
 	  if (rsp->nonzero_bits != ~(unsigned HOST_WIDE_INT) 0)
-	    rsp->nonzero_bits |= nonzero_bits (src, nonzero_bits_mode);
+	    {
+	      rtx reg_equal = insn ? find_reg_note (insn, REG_EQUAL,
NULL_RTX)
+				     : NULL_RTX;
+	      if (reg_equal)
+		rsp->nonzero_bits |= nonzero_bits (XEXP (reg_equal, 0),
+						   nonzero_bits_mode);
+	      else
+		rsp->nonzero_bits |= nonzero_bits (src, nonzero_bits_mode);
+	    }
 	  num = num_sign_bit_copies (SET_SRC (set), GET_MODE (x));