Message ID | 20111012223450.GA11516@intel.com |
---|---|
State | New |
Headers | show |
> X86 backend doesn't accept the new expression as valid address while > (zero_extend:DI) works just fine. This patches keeps ZERO_EXTEND > when zero-extending address to Pmode. It reduces number of lea from > 24173 to 21428 in x32 libgfortran.so. Does it make any senses? I'd be inclined to have the x86 backend accept combine's canonicalized form rather than doing a patch such as this.
On Wed, Oct 12, 2011 at 3:40 PM, Richard Kenner <kenner@vlsi1.ultra.nyu.edu> wrote: >> X86 backend doesn't accept the new expression as valid address while >> (zero_extend:DI) works just fine. This patches keeps ZERO_EXTEND >> when zero-extending address to Pmode. It reduces number of lea from >> 24173 to 21428 in x32 libgfortran.so. Does it make any senses? > > I'd be inclined to have the x86 backend accept combine's canonicalized > form rather than doing a patch such as this. > The address format generated by combine is very unusual in 2 aspecst: 1. The placement of subreg in (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) isn't supported by x86 backend. 2. The biggest problem is optimizing mask 0xffffffff to 0xfffffffc by keeping track of non-zero bits in registers. X86 backend doesn't have such information to know ADDR & 0xfffffffc == ADDR & 0xffffffff.
> 1. The placement of subreg in > > (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) > (const_int 4 [0x4])) 0) > (subreg:DI (reg:SI 106) 0)) > > isn't supported by x86 backend. That's easy to fix. > 2. The biggest problem is optimizing mask 0xffffffff to > 0xfffffffc by keeping track of non-zero bits in registers. > X86 backend doesn't have such information to know > ADDR & 0xfffffffc == ADDR & 0xffffffff. But this indeed isn't. I withdraw my comment. I still don't like the patch, but I'm no longer as familiar with the code as I used to be so can't suggest a replacement. Let's see what others think about it.
On 10/13/2011 01:04 AM, Richard Kenner wrote: > > I still don't like the patch, but I'm no longer as familiar with the code > as I used to be so can't suggest a replacement. Let's see what others > think about it. Same here, I don't like it but I hardly see any alternative. The only possibility could be to prevent calling expand_compound_operation completely for addresses. Richard, what do you think? Don't worry, combine hasn't changed much since your days. :) Paolo
> Same here, I don't like it but I hardly see any alternative. The only > possibility could be to prevent calling expand_compound_operation > completely for addresses. Richard, what do you think? Don't worry, > combine hasn't changed much since your days. :) The problem wasn't potential changes to combine, but my memory ... But now I refreshed it and understand the issue. expand_compound_operation and make_compond_operation were meant to be used as a pair. You first do the former, then see if the result can be simplified, then call the latter in case it couldn't. In the SET case, we call the latter in simplify_set. But there's also this code in combine_simplify_rtx: case MEM: /* Ensure that our address has any ASHIFTs converted to MULT in case address-recognizing predicates are called later. */ temp = make_compound_operation (XEXP (x, 0), MEM); SUBST (XEXP (x, 0), temp); break; THAT'S the code that should do the transformation that this patch contains. So I'd suggest doing some debugging and seeing why it isn't. This could just be a bug in make_compound_operation not handling the SUBREG.
On 10/13/2011 02:51 PM, Richard Kenner wrote: > case MEM: > /* Ensure that our address has any ASHIFTs converted to MULT in case > address-recognizing predicates are called later. */ > temp = make_compound_operation (XEXP (x, 0), MEM); > SUBST (XEXP (x, 0), temp); > break; > > THAT'S the code that should do the transformation that this patch contains. > So I'd suggest doing some debugging and seeing why it isn't. This could > just be a bug in make_compound_operation not handling the SUBREG. Or being fooled by the 0xfffffffc masking, perhaps. Paolo
> Or being fooled by the 0xfffffffc masking, perhaps.
No, I'm pretty sure that's NOT the case. The *whole point* of the
routine is to deal with that masking.
On Thu, Oct 13, 2011 at 7:14 AM, Richard Kenner <kenner@vlsi1.ultra.nyu.edu> wrote: >> Or being fooled by the 0xfffffffc masking, perhaps. > > No, I'm pretty sure that's NOT the case. The *whole point* of the > routine is to deal with that masking. > I got (gdb) step make_compound_operation (x=0x7ffff139c4c8, in_code=MEM) at /export/gnu/import/git/gcc/gcc/combine.c:7572 7572 enum rtx_code code = GET_CODE (x); (gdb) call debug_rtx (x) (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffffffc])) and it produces (gdb) call debug_rtx (x) (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffffffc])) at the end. make_compound_operation doesn't know how to restore ZERO_EXTEND. BTW, there is a small testcase at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696 You can reproduce it on Linux/x86-64.
> at the end. make_compound_operation doesn't know how to > restore ZERO_EXTEND. It does in general. See make_extraction, which it calls. The question is why it doesn't in this case. That's the bug.
On Thu, Oct 13, 2011 at 9:11 AM, Richard Kenner <kenner@vlsi1.ultra.nyu.edu> wrote: >> at the end. make_compound_operation doesn't know how to >> restore ZERO_EXTEND. > > It does in general. See make_extraction, which it calls. The question is > why it doesn't in this case. That's the bug. > It never calls make_extraction. There are several cases handled for AND operation. But (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffffffc])) isn't one of them.
> It never calls make_extraction. There are several cases handled > for AND operation. But > > (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) > (const_int 4 [0x4])) 0) > (subreg:DI (reg:SI 106) 0)) > (const_int 4294967292 [0xfffffffc])) > > isn't one of them. Yes, clearly. Otherwise it would work! The correct fix for this problem is to make it to do that. That's where this needs to be fixed: in make_compound_operation.
On 10/13/2011 06:35 PM, Richard Kenner wrote: >> It never calls make_extraction. There are several cases handled >> for AND operation. But >> >> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) >> (const_int 4 [0x4])) 0) >> (subreg:DI (reg:SI 106) 0)) >> (const_int 4294967292 [0xfffffffc])) >> >> isn't one of them. > > Yes, clearly. Otherwise it would work! The correct fix for this problem > is to make it to do that. That's where this needs to be fixed: in > make_compound_operation. An and:DI is cheaper than a zero_extend:DI of an and:SI. So GCC is correct in not doing this transformation. I think adding a case to make_compound_operation that simply undoes the transformation (without calling make_extraction) is fine if you guard it with if (in_code == MEM). Paolo
> An and:DI is cheaper than a zero_extend:DI of an and:SI.
That depends strongly on the constants and whether the machine is 32-bit
or 64-bit.
But that's irrelevant in this case since the and:SI will be removed (it
reflects what already been done).
On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini <bonzini@gnu.org> wrote: > On 10/13/2011 06:35 PM, Richard Kenner wrote: >>> >>> It never calls make_extraction. There are several cases handled >>> for AND operation. But >>> >>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) >>> (const_int 4 [0x4])) 0) >>> (subreg:DI (reg:SI 106) 0)) >>> (const_int 4294967292 [0xfffffffc])) >>> >>> isn't one of them. >> >> Yes, clearly. Otherwise it would work! The correct fix for this problem >> is to make it to do that. That's where this needs to be fixed: in >> make_compound_operation. > > An and:DI is cheaper than a zero_extend:DI of an and:SI. So GCC is correct > in not doing this transformation. I think adding a case to > make_compound_operation that simply undoes the transformation (without > calling make_extraction) is fine if you guard it with if (in_code == MEM). > We first expand zero_extend:DI address to and:DI and then try to restore zero_extend:DI. Why do we do this transformation to begin with?
On Thu, Oct 13, 2011 at 19:19, H.J. Lu <hjl.tools@gmail.com> wrote: > On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini <bonzini@gnu.org> wrote: >> On 10/13/2011 06:35 PM, Richard Kenner wrote: >>>> >>>> It never calls make_extraction. There are several cases handled >>>> for AND operation. But >>>> >>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) >>>> (const_int 4 [0x4])) 0) >>>> (subreg:DI (reg:SI 106) 0)) >>>> (const_int 4294967292 [0xfffffffc])) >>>> >>>> isn't one of them. >>> >>> Yes, clearly. Otherwise it would work! The correct fix for this problem >>> is to make it to do that. That's where this needs to be fixed: in >>> make_compound_operation. >> >> An and:DI is cheaper than a zero_extend:DI of an and:SI. So GCC is correct >> in not doing this transformation. I think adding a case to >> make_compound_operation that simply undoes the transformation (without >> calling make_extraction) is fine if you guard it with if (in_code == MEM). >> > > We first expand zero_extend:DI address to and:DI and then try > to restore zero_extend:DI. Why do we do this transformation > to begin with? Because outside of a MEM it may be beneficial _not_ to restore zero_extend:DI in this case (depending on rtx_costs). Paolo
On Thu, Oct 13, 2011 at 10:21 AM, Paolo Bonzini <bonzini@gnu.org> wrote: > On Thu, Oct 13, 2011 at 19:19, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini <bonzini@gnu.org> wrote: >>> On 10/13/2011 06:35 PM, Richard Kenner wrote: >>>>> >>>>> It never calls make_extraction. There are several cases handled >>>>> for AND operation. But >>>>> >>>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) >>>>> (const_int 4 [0x4])) 0) >>>>> (subreg:DI (reg:SI 106) 0)) >>>>> (const_int 4294967292 [0xfffffffc])) >>>>> >>>>> isn't one of them. >>>> >>>> Yes, clearly. Otherwise it would work! The correct fix for this problem >>>> is to make it to do that. That's where this needs to be fixed: in >>>> make_compound_operation. >>> >>> An and:DI is cheaper than a zero_extend:DI of an and:SI. So GCC is correct >>> in not doing this transformation. I think adding a case to >>> make_compound_operation that simply undoes the transformation (without >>> calling make_extraction) is fine if you guard it with if (in_code == MEM). >>> >> >> We first expand zero_extend:DI address to and:DI and then try >> to restore zero_extend:DI. Why do we do this transformation >> to begin with? > > Because outside of a MEM it may be beneficial _not_ to restore > zero_extend:DI in this case (depending on rtx_costs). > Why do we do it for MEM then?
On Thu, Oct 13, 2011 at 19:06, Richard Kenner <kenner@vlsi1.ultra.nyu.edu> wrote: >> An and:DI is cheaper than a zero_extend:DI of an and:SI. > > That depends strongly on the constants and whether the machine is 32-bit > or 64-bit. Yes, the rtx_costs take care of that. > But that's irrelevant in this case since the and:SI will be removed (it > reflects what already been done). Do you refer to this in make_extraction: /* See if this can be done without an extraction. We never can if the width of the field is not the same as that of some integer mode. For registers, we can only avoid the extraction if the position is at the low-order bit and this is either not in the destination or we have the appropriate STRICT_LOW_PART operation available. */ and this call to force_to_mode in particular: new_rtx = force_to_mode (inner, tmode, len >= HOST_BITS_PER_WIDE_INT ? ~(unsigned HOST_WIDE_INT) 0 : ((unsigned HOST_WIDE_INT) 1 << len) - 1, 0); and from there the call to simplify_and_const_int that does this: if (constop == nonzero) return varop; ? Then indeed it should work if you call make_extraction more greedily than what we do now (which is, just if the constant is one less than a power of two). The answer to H.J.'s "Why do we do it for MEM then?" is simply "because no one ever thought about not doing it" (because there are no other POINTERS_EXTEND_UNSIGNED == 1 machines). In fact it may even be advantageous to do it in general, even if in_code != MEM. Only experimentation can tell. Paolo
> We first expand zero_extend:DI address to and:DI and then try > to restore zero_extend:DI. Why do we do this transformation > to begin with? Suppose there were an outer AND that duplicated what this one did. Then when you combine those two, you merge it to one AND. Then make_compound_operation puts it back. The net result is to eliminate the outer AND. There are lots of similar sorts of things. As I said, the strategy there was to convert extractions and expansions into the corresponding logical and shift operations, see if they can merge with something outside (which is similarly converted), then convert the result (possibly merged) back. This, for example, is the code that will remove nested SIGN_EXTENDs.
> The answer to H.J.'s "Why do we do it for MEM then?" is simply > "because no one ever thought about not doing it" No, that's false. The same expand_compound_operation / make_compound_operation pair is present in the MEM case as in the SET case. It's just that there's some bug here that's noticable in not making proper MEMs that doesn't show up in the SET case because of the way the insns are structured.
diff --git a/gcc/combine.c b/gcc/combine.c index 6c3b17c..45180e5 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -5078,6 +5078,23 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy) } } } +#ifdef POINTERS_EXTEND_UNSIGNED + else if (POINTERS_EXTEND_UNSIGNED > 0 + && code == MEM + && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND + && GET_MODE (XEXP (x, 0)) == Pmode) + { + /* If an address is zero-extended to Pmode, replace FROM with + TO while keeping ZERO_EXTEND. */ + new_rtx = subst (XEXP (XEXP (x, 0), 0), from, to, 0, 0, + unique_copy); + /* Drop ZERO_EXTEND on constant. */ + if (CONST_INT_P (new_rtx)) + SUBST (XEXP (x, 0), new_rtx); + else + SUBST (XEXP (XEXP (x, 0), 0), new_rtx); + } +#endif else { len = GET_RTX_LENGTH (code);