Message ID | 20110214185738.GA13004@intel.com |
---|---|
State | New |
Headers | show |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/14/11 11:57, H.J. Lu wrote: > Hi, > > cant_combine_insn_p won't check ZERO_EXTEND and SIGN_EXTEND. This > patch helps cant_combine_insn_p by copying the arg in the incoming mode > and extending the copy instead of the arg. OK for trunk when stage 1 > is open? Why not fix the problem in combine? IIRC the whole point behind the code you're twiddling in function.c is to avoid useless copies due to extending arguments. jeff -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNWX0DAAoJEBRtltQi2kC7VhsH/0H+h943mRzeY4xmCOf5RSu0 NI2f0zTAewlfcpqR8eaXSy8A1AYipUnS0NcS2oN7NO5v5A65YdSkGK+SWoqdKls1 eSnvr98AIFTy+ih4CKNxh3xkWNiCdG0LAfZed8zVb4cLPMBDbN5U/kojMHvw2865 pbr/+7QkJ6XNCc2R8oTuVsvxrFn998HhSgFFA2fH2Cs+PCyyg4U3V9agmr4LSsKf dDoUvg8cGmtsyU/+kDldJ/dA3nzkBRGBrPUWLwOL8B9FFK6e3kcwfi6CegXrZRC4 qaR3Z+bIMeUjM1OPn5vwsp7ks/whtHMY33flwmLYijklaNEdt9FoXYa4BrMRnUc= =KOsD -----END PGP SIGNATURE-----
On Mon, Feb 14, 2011 at 11:05 AM, Jeff Law <law@redhat.com> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 02/14/11 11:57, H.J. Lu wrote: >> Hi, >> >> cant_combine_insn_p won't check ZERO_EXTEND and SIGN_EXTEND. This >> patch helps cant_combine_insn_p by copying the arg in the incoming mode >> and extending the copy instead of the arg. OK for trunk when stage 1 >> is open? > Why not fix the problem in combine? IIRC the whole point behind the > code you're twiddling in function.c is to avoid useless copies due to > extending arguments. See: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47725#c4
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/14/11 12:07, H.J. Lu wrote: > On Mon, Feb 14, 2011 at 11:05 AM, Jeff Law <law@redhat.com> wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On 02/14/11 11:57, H.J. Lu wrote: >>> Hi, >>> >>> cant_combine_insn_p won't check ZERO_EXTEND and SIGN_EXTEND. This >>> patch helps cant_combine_insn_p by copying the arg in the incoming mode >>> and extending the copy instead of the arg. OK for trunk when stage 1 >>> is open? >> Why not fix the problem in combine? IIRC the whole point behind the >> code you're twiddling in function.c is to avoid useless copies due to >> extending arguments. > > See: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47725#c4 So all you're doing is trading one performance issue for another. I still think you're better off tackling this in combine -- IMHO the impact will be less there. jeff -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNWX5yAAoJEBRtltQi2kC7+nkH/0LmUtL2A/ziaKzCLy70QnU8 GS9uTj3wPE/h9tCzDrM1c3JZQSv1po1+wgTmoxgfmWYLSNUHi4b6OLWHJtTquca2 qKuSdak9r/MgJ2k79lQO8UQ3EKyZiVOd+aLumag4pmLTNHRwxMZ+BeL7X8AX2BRt wcPv8EqLAaiFyW19cWatm9/C3JA2kAf9ayBvMCpLS4Wyqp9qdRt/IaOmO9GHmw3n Uj0uZ+8apTu4TJ/4ZWT0+SZsLQkXHnnV40VUK08RObuU5QQBTLJ2dznQCjyJpvdV olmgmOQxox+grc5pQKYs9Tvq99brniO7/czHEi7v96iRmh2LHCkgXyQp3NmShqM= =1+XC -----END PGP SIGNATURE-----
> So all you're doing is trading one performance issue for another.
If adding a copy to a new pseudo was really a performance issue, we would have
many issues thoughout the compiler. AFAIK it's a classical trick.
On 02/14/2011 08:21 PM, Eric Botcazou wrote: >> So all you're doing is trading one performance issue for another. > > If adding a copy to a new pseudo was really a performance issue, we would have > many issues thoughout the compiler. AFAIK it's a classical trick. The insns we're dealing with here can potentially get REG_EQUIV notes attached to them, so they're a bit special. At least it needs to be verified that the optimization in IRA which moves such insns before their only use still triggers (PR42235). I agree with Jeff that combine would be the correct place to fix this. At least it takes class_likely_spilled_p into account, so it will restrict only those machines where extending the lifetime of hard regs is dangerous. Bernd
> I agree with Jeff that combine would be the correct place to fix this. > At least it takes class_likely_spilled_p into account, so it will > restrict only those machines where extending the lifetime of hard regs > is dangerous. OK, but I don't see how copying to a new pseudo would interfere with that.
On Mon, Feb 14, 2011 at 11:44 AM, Bernd Schmidt <bernds@codesourcery.com> wrote: > On 02/14/2011 08:21 PM, Eric Botcazou wrote: >>> So all you're doing is trading one performance issue for another. >> >> If adding a copy to a new pseudo was really a performance issue, we would have >> many issues thoughout the compiler. AFAIK it's a classical trick. > > The insns we're dealing with here can potentially get REG_EQUIV notes > attached to them, so they're a bit special. At least it needs to be > verified that the optimization in IRA which moves such insns before > their only use still triggers (PR42235). > > I agree with Jeff that combine would be the correct place to fix this. > At least it takes class_likely_spilled_p into account, so it will > restrict only those machines where extending the lifetime of hard regs > is dangerous. > Hi Jeff, Bernd, Does my patch at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47725#c3 do what you suggested?
On 02/14/2011 08:50 PM, H.J. Lu wrote: > > Does my patch at > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47725#c3 > > do what you suggested? Yes. That's how I'd fix it. Concerns about optimization seem to be misplaced as well; I've run this through my collection of input files and did not find a case where code generation changed. So, OK to install. Bernd
On Tue, Feb 15, 2011 at 7:50 AM, Bernd Schmidt <bernds@codesourcery.com> wrote: > On 02/14/2011 08:50 PM, H.J. Lu wrote: >> >> Does my patch at >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47725#c3 >> >> do what you suggested? > > Yes. That's how I'd fix it. Concerns about optimization seem to be > misplaced as well; I've run this through my collection of input files > and did not find a case where code generation changed. So, OK to install. > I checked it into trunk and reverted the function.c change on x32 branch. Thanks.
> Yes. That's how I'd fix it. Concerns about optimization seem to be > misplaced as well; I've run this through my collection of input files > and did not find a case where code generation changed. So, OK to install. I disagree, this isn't a regression so this isn't suitable for stage 4 as per http://gcc.gnu.org/ml/gcc/2011-02/msg00250.html Did you look at the code generated on the x32 branch?
On Tue, Feb 15, 2011 at 9:33 AM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> Yes. That's how I'd fix it. Concerns about optimization seem to be >> misplaced as well; I've run this through my collection of input files >> and did not find a case where code generation changed. So, OK to install. > > I disagree, this isn't a regression so this isn't suitable for stage 4 as per > http://gcc.gnu.org/ml/gcc/2011-02/msg00250.html I can revert the change and recheck it in in stage 1. > Did you look at the code generated on the x32 branch? > I didn't check the detail. But I think combine.c change is better.
> I can revert the change and recheck it in in stage 1. Yes, thanks in advance. > I didn't check the detail. But I think combine.c change is better. I still disagree, but let's discuss this again when stage 1 reopens.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/15/11 11:08, H.J. Lu wrote: > On Tue, Feb 15, 2011 at 9:33 AM, Eric Botcazou <ebotcazou@adacore.com> wrote: >>> Yes. That's how I'd fix it. Concerns about optimization seem to be >>> misplaced as well; I've run this through my collection of input files >>> and did not find a case where code generation changed. So, OK to install. >> >> I disagree, this isn't a regression so this isn't suitable for stage 4 as per >> http://gcc.gnu.org/ml/gcc/2011-02/msg00250.html > > I can revert the change and recheck it in in stage 1. Probably wise since we're in stage4 right now. jeff -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNWutXAAoJEBRtltQi2kC7u14H/10dBjwL3lReIbfH6f9pELBQ 2l/uK9W7djHYBAUUxhzA5UJ/Guy7JHUgJl/CCwVx5+XVGqGY8SfeZO3g4Rmz7Cd9 qjAIul4Vkr/lOiw1xGSq2UHTrVoPxllO8oNJY3m1cwDmz0VF8svLcsglAjbdqGwa 9mQl17gyAeQrlziIDqB7nA7MZvxesXwUEkCQKZJgT0uOQ/SrkpndyemHlYjqeiRd aTqCg/b2Ihr/1BjljzQOQbwuiSMGg/Q17I9Ji0gEshTomjXs9eFTTgYlcCai2pc+ FWLS8yIUT8PVMDz2Lhj9k9pKo8nUpGBipORPEG3wEeWToMsVHkhghQsOpmFPrXM= =QxMB -----END PGP SIGNATURE-----
On 02/15/2011 06:33 PM, Eric Botcazou wrote: >> Yes. That's how I'd fix it. Concerns about optimization seem to be >> misplaced as well; I've run this through my collection of input files >> and did not find a case where code generation changed. So, OK to install. > > I disagree, this isn't a regression so this isn't suitable for stage 4 as per > http://gcc.gnu.org/ml/gcc/2011-02/msg00250.html This type of extension from a hard register is new in 4.6. There may well be other ports that fail in the same way that HJ has discovered. Bernd
On 02/14/2011 08:46 PM, Eric Botcazou wrote: >> I agree with Jeff that combine would be the correct place to fix this. >> At least it takes class_likely_spilled_p into account, so it will >> restrict only those machines where extending the lifetime of hard regs >> is dangerous. > > OK, but I don't see how copying to a new pseudo would interfere with that. For one thing, the set no longer matches the REG_EQUIV note we make here, and that does seem to interfere with the optimization. I've tested both patches on ARM, -march=armv7-a. The combiner patch produced no code changes. The function.c patch produced regressions (increased register pressure). Both results are as expected. To put it another way: the combiner change is conservatively correct, and necessary if we're going to have extends of hard registers in the RTL. The function.c change is demonstrably incorrect as shown by the ARM codegen regressions. Bernd
On Tue, Feb 15, 2011 at 2:55 PM, Bernd Schmidt <bernds@codesourcery.com> wrote: > On 02/14/2011 08:46 PM, Eric Botcazou wrote: >>> I agree with Jeff that combine would be the correct place to fix this. >>> At least it takes class_likely_spilled_p into account, so it will >>> restrict only those machines where extending the lifetime of hard regs >>> is dangerous. >> >> OK, but I don't see how copying to a new pseudo would interfere with that. > > For one thing, the set no longer matches the REG_EQUIV note we make > here, and that does seem to interfere with the optimization. I've tested > both patches on ARM, -march=armv7-a. The combiner patch produced no code > changes. The function.c patch produced regressions (increased register > pressure). Both results are as expected. > > To put it another way: the combiner change is conservatively correct, > and necessary if we're going to have extends of hard registers in the > RTL. The function.c change is demonstrably incorrect as shown by the ARM > codegen regressions. > I checked in my patch into trunk. Thanks.
On Thu, Mar 17, 2011 at 5:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Tue, Feb 15, 2011 at 2:55 PM, Bernd Schmidt <bernds@codesourcery.com> wrote: >> On 02/14/2011 08:46 PM, Eric Botcazou wrote: >>>> I agree with Jeff that combine would be the correct place to fix this. >>>> At least it takes class_likely_spilled_p into account, so it will >>>> restrict only those machines where extending the lifetime of hard regs >>>> is dangerous. >>> >>> OK, but I don't see how copying to a new pseudo would interfere with that. >> >> For one thing, the set no longer matches the REG_EQUIV note we make >> here, and that does seem to interfere with the optimization. I've tested >> both patches on ARM, -march=armv7-a. The combiner patch produced no code >> changes. The function.c patch produced regressions (increased register >> pressure). Both results are as expected. >> >> To put it another way: the combiner change is conservatively correct, >> and necessary if we're going to have extends of hard registers in the >> RTL. The function.c change is demonstrably incorrect as shown by the ARM >> codegen regressions. >> > > I checked in my patch into trunk. > I noticed that for x32, all pointers passed in registers are zero-extended by caller. If we can fix http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48085 by avoiding zero-extension in callee, this issue won't happen for x32. I will revert the combine change for now and try to implement this approach. Thanks.
On Thu, Mar 17, 2011 at 9:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Thu, Mar 17, 2011 at 5:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Tue, Feb 15, 2011 at 2:55 PM, Bernd Schmidt <bernds@codesourcery.com> wrote: >>> On 02/14/2011 08:46 PM, Eric Botcazou wrote: >>>>> I agree with Jeff that combine would be the correct place to fix this. >>>>> At least it takes class_likely_spilled_p into account, so it will >>>>> restrict only those machines where extending the lifetime of hard regs >>>>> is dangerous. >>>> >>>> OK, but I don't see how copying to a new pseudo would interfere with that. >>> >>> For one thing, the set no longer matches the REG_EQUIV note we make >>> here, and that does seem to interfere with the optimization. I've tested >>> both patches on ARM, -march=armv7-a. The combiner patch produced no code >>> changes. The function.c patch produced regressions (increased register >>> pressure). Both results are as expected. >>> >>> To put it another way: the combiner change is conservatively correct, >>> and necessary if we're going to have extends of hard registers in the >>> RTL. The function.c change is demonstrably incorrect as shown by the ARM >>> codegen regressions. >>> >> >> I checked in my patch into trunk. >> > > I noticed that for x32, all pointers passed in registers are zero-extended > by caller. If we can fix > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48085 > > by avoiding zero-extension in callee, this issue won't happen for x32. I will > revert the combine change for now and try to implement this approach. > The issues are 1. When passing a 32bit integer parameter in a register, hardware always zero-extends it to 64bit. I can update x32 psABI to document it, which costs nothing to implement in GCC. 2. assign_parm_setup_reg zero-extends 32bit pointer in register to 64bit, which is redundant. It leads 2 problems: 1. Redundant zero-extension at function entry. 2. combine doesn't check zero-extension on hard register and leads to internal compiler error. Is there a way to avoid redundant zero-extension at function entry to solve both problems? Thanks.
> It leads 2 problems: > > 1. Redundant zero-extension at function entry. > 2. combine doesn't check zero-extension on hard register and leads to > internal compiler error. > > Is there a way to avoid redundant zero-extension at function entry to > solve both problems? Eliminating the redundant extension in the callee seems indeed to be appealing. You need to find out who decides that the incoming parameter needs to be zero- extended. Is that the call to promote_function_mode in assign_parm_setup_reg?
On Tue, Mar 22, 2011 at 1:05 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> It leads 2 problems: >> >> 1. Redundant zero-extension at function entry. >> 2. combine doesn't check zero-extension on hard register and leads to >> internal compiler error. >> >> Is there a way to avoid redundant zero-extension at function entry to >> solve both problems? > > Eliminating the redundant extension in the callee seems indeed to be appealing. > You need to find out who decides that the incoming parameter needs to be zero- > extended. Is that the call to promote_function_mode in assign_parm_setup_reg? > It is: op0 = parmreg; op1 = validated_mem; if (icode != CODE_FOR_nothing && insn_data[icode].operand[0].predicate (op0, promoted_nominal_mode) && insn_data[icode].operand[1].predicate (op1, data->passed_mode)) { enum rtx_code code = unsignedp ? ZERO_EXTEND : SIGN_EXTEND; rtx insn, insns; HARD_REG_SET hardregs; start_sequence (); insn = gen_extend_insn (op0, op1, promoted_nominal_mode, data->passed_mode, unsignedp); emit_insn (insn); insns = get_insns (); in assign_parm_setup_reg.
> It is: > > op0 = parmreg; > op1 = validated_mem; > if (icode != CODE_FOR_nothing > && insn_data[icode].operand[0].predicate (op0, > promoted_nominal_mode) && insn_data[icode].operand[1].predicate (op1, > data->passed_mode)) { > enum rtx_code code = unsignedp ? ZERO_EXTEND : SIGN_EXTEND; > rtx insn, insns; > HARD_REG_SET hardregs; > > start_sequence (); > insn = gen_extend_insn (op0, op1, promoted_nominal_mode, > data->passed_mode, unsignedp); > emit_insn (insn); > insns = get_insns (); Sure, but why is need_conversion set to true?
On Wed, Mar 23, 2011 at 1:22 AM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> It is: >> >> op0 = parmreg; >> op1 = validated_mem; >> if (icode != CODE_FOR_nothing >> && insn_data[icode].operand[0].predicate (op0, >> promoted_nominal_mode) && insn_data[icode].operand[1].predicate (op1, >> data->passed_mode)) { >> enum rtx_code code = unsignedp ? ZERO_EXTEND : SIGN_EXTEND; >> rtx insn, insns; >> HARD_REG_SET hardregs; >> >> start_sequence (); >> insn = gen_extend_insn (op0, op1, promoted_nominal_mode, >> data->passed_mode, unsignedp); >> emit_insn (insn); >> insns = get_insns (); > > Sure, but why is need_conversion set to true? > Pointer is promoted to Pmode from ptr_mode.
> Pointer is promoted to Pmode from ptr_mode.
Indeed. However the problem is the 2 in assign_parm_setup_reg:
/* Store the parm in a pseudoregister during the function, but we may
need to do it in a wider mode. Using 2 here makes the result
consistent with promote_decl_mode and thus expand_expr_real_1. */
promoted_nominal_mode
= promote_function_mode (data->nominal_type, data->nominal_mode, &unsignedp,
TREE_TYPE (current_function_decl), 2);
which is supposed to match the 2 in promote_decl_mode:
if (TREE_CODE (decl) == RESULT_DECL
|| TREE_CODE (decl) == PARM_DECL)
pmode = promote_function_mode (type, mode, &unsignedp,
TREE_TYPE (current_function_decl), 2);
else
pmode = promote_mode (type, mode, &unsignedp);
but doesn't match the 0 in assign_parm_find_data_types:
promoted_mode = promote_function_mode (passed_type, passed_mode, &unsignedp,
TREE_TYPE (current_function_decl), 0);
so you get the redundant extension in the callee. The solution is to define
the promote_function_mode hook for x86 to something like:
static enum machine_mode
ix86_promote_function_mode (const_tree type,
enum machine_mode mode,
int *punsignedp,
const_tree fntype ATTRIBUTE_UNUSED,
int for_return ATTRIBUTE_UNUSED)
{
if (POINTER_TYPE_P (type))
{
*punsignedp = POINTERS_EXTEND_UNSIGNED;
return Pmode;
}
return mode;
}
Index: gcc/testsuite/ChangeLog.x32 =================================================================== --- gcc/testsuite/ChangeLog.x32 (revision 170144) +++ gcc/testsuite/ChangeLog.x32 (working copy) @@ -1,5 +1,10 @@ 2011-02-13 H.J. Lu <hongjiu.lu@intel.com> + PR middle-end/47725 + * gcc.dg/torture/pr47725.c: New. + +2011-02-13 H.J. Lu <hongjiu.lu@intel.com> + PR target/47715 * gcc.target/i386/pr47715-3.c: New. Index: gcc/testsuite/gcc.dg/torture/pr47725.c =================================================================== --- gcc/testsuite/gcc.dg/torture/pr47725.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr47725.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ + +struct _Unwind_Context +{ + void *reg[17]; + void *ra; +}; +extern void bar (struct _Unwind_Context *); +void +__frame_state_for (void *pc_target) +{ + struct _Unwind_Context context; + __builtin_memset (&context, 0, sizeof (struct _Unwind_Context)); + context.ra = pc_target; + bar (&context); +} Index: gcc/function.c =================================================================== --- gcc/function.c (revision 170144) +++ gcc/function.c (working copy) @@ -3004,6 +3004,14 @@ assign_parm_setup_reg (struct assign_par HARD_REG_SET hardregs; start_sequence (); + if (REG_P (op1) && REGNO (op1) < FIRST_PSEUDO_REGISTER) + { + /* We must copy the hard register first before extending + it. Otherwise, combine won't see the hard register. */ + rtx copy = gen_reg_rtx (data->passed_mode); + emit_move_insn (copy, op1); + op1 = copy; + } insn = gen_extend_insn (op0, op1, promoted_nominal_mode, data->passed_mode, unsignedp); emit_insn (insn); Index: gcc/ChangeLog.x32 =================================================================== --- gcc/ChangeLog.x32 (revision 170146) +++ gcc/ChangeLog.x32 (working copy) @@ -1,3 +1,9 @@ +2011-02-14 H.J. Lu <hongjiu.lu@intel.com> + + PR middle-end/47725 + * function.c (assign_parm_setup_reg): Copy the hard register + first before extending it. + 2011-02-13 H.J. Lu <hongjiu.lu@intel.com> PR target/47715