diff mbox

patch for elimination to SP when it is changed in RTL (PR57293)

Message ID 529D1B4E.5070304@redhat.com
State New
Headers show

Commit Message

Vladimir Makarov Dec. 2, 2013, 11:44 p.m. UTC
On 12/1/2013, 7:57 AM, James Greenhalgh wrote:
> On Thu, Nov 28, 2013 at 10:11:26PM +0000, Vladimir Makarov wrote:
>> Committed as rev. 205498.
>>
>>    2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>>
>> 	PR target/57293
>> 	* ira.h (ira_setup_eliminable_regset): Remove parameter.
>> 	* ira.c (ira_setup_eliminable_regset): Ditto.  Add
>> 	SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
>> 	Don't call lra_init_elimination.
>> 	(ira): Call ira_setup_eliminable_regset without arguments.
>> 	* loop-invariant.c (calculate_loop_reg_pressure): Remove argument
>> 	from ira_setup_eliminable_regset call.
>> 	* gcse.c (calculate_bb_reg_pressure): Ditto.
>> 	* haifa-sched.c (sched_init): Ditto.
>> 	* lra.h (lra_init_elimination): Remove the prototype.
>> 	* lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
>> 	used_insn_alternative upper.
>> 	(lra_eliminate_regs_1): Add one more parameter.
>> 	(lra-eliminate): Ditto.
>> 	* lra.c (lra_invalidate_insn_data): Set sp_offset.
>> 	(setup_sp_offset): New.
>> 	(lra_process_new_insns): Call setup_sp_offset.
>> 	(lra): Add argument to lra_eliminate calls.
>> 	* lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
>> 	(get_equiv_with_elimination): New.
>> 	(process_addr_reg): Call get_equiv_with_elimination instead of
>> 	get_equiv_substitution.
>> 	(equiv_address_substitution): Ditto.
>> 	(loc_equivalence_change_p): Ditto.
>> 	(loc_equivalence_callback, lra_constraints): Ditto.
>> 	(curr_insn_transform): Ditto.  Print the sp offset
>> 	(process_alt_operands): Prevent stack pointer reloads.
>> 	(lra_constraints): Remove one argument from lra_eliminate call.
>> 	Move it up.  Mark used hard regs bfore it.  Use
>> 	get_equiv_with_elimination instead of get_equiv_substitution.
>> 	* lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
>> 	assert for param values combination.  Use sp offset.  Add argument
>> 	to lra_eliminate_regs_1 calls.
>> 	(lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
>> 	(curr_sp_change): New static var.
>> 	(mark_not_eliminable): Add parameter.  Update curr_sp_change.
>> 	Don't prevent elimination to sp if we can calculate its change.
>> 	Pass the argument to mark_not_eliminable calls.
>> 	(eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
>> 	argument to lra_eliminate_regs_1 call.
>> 	(update_reg_eliminate): Move calculation of hard regs for spill
>> 	lower.  Switch off lra_in_progress temporarily to generate regs
>> 	involved into elimination.
>> 	(lra_init_elimination): Rename to init_elimination.  Make it
>> 	static.  Set up insn sp offset, check the offsets at the end of
>> 	BBs.
>> 	(process_insn_for_elimination): Add parameter.  Pass its value to
>> 	eliminate_regs_in_insn.
>> 	(lra_eliminate): : Add parameter.  Pass its value to
>> 	process_insn_for_elimination.  Add assert for param values
>> 	combination.  Call init_elimination.  Don't update offsets in
>> 	equivalence substitutions.
>> 	* lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
>> 	for created stack slot.
>> 	(remove_pseudos): Call lra_eliminate_regs_1 before changing memory
>> 	onto stack slot.
>>
>> 2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>>
>> 	PR target/57293
>> 	* gcc.target/i386/pr57293.c: New.
>
> Hi Vlad,
>
> This patch seems to cause some problems for AArch64. I see an assert
> triggering when building libgloss:
>
> /work/gcc-clean/build-aarch64-none-elf/obj/gcc1/gcc/xgcc -B/work/gcc-clean/build-aarch64-none-elf/obj/gcc1/gcc/ -B/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/newlib/ -isystem /work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/newlib/targ-include -isystem /work/gcc-clean/src/binutils/newlib/libc/include -B/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/libgloss/aarch64 -L/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/libgloss/libnosys -L/work/gcc-clean/src/binutils/libgloss/aarch64 -L/work/gcc-clean/build-aarch64-none-elf/obj/binutils/./ld    -O2 -g -O2 -g -I. -I/work/gcc-clean/src/binutils/libgloss/aarch64/.. -DARM_RDI_MONITOR -o rdimon-_exit.o -c /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c
> /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c: In function '_exit':
> /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c:41:1: internal compiler error: in update_reg_eliminate, at lra-eliminations.c:1157
>   }
>   ^
>


   First of all, it is a bad situation for code performance when IRA
decides that it can use frame pointer for allocation, and after that
LRA/reload decides that frame pointer can not be used and spills all
pseudos assigned to FP.  The generated code will be much worse than
one generated if we decided not to use FP from the IRA start.

   Therefore I decided to put an assert for checking the situation.  It
was triggered by your case.  I think the problem in the following
code.

aarch64_can_eliminate (const int from, const int to)
{
   if (frame_pointer_needed)
      ...
   else
     {
       ...
       if (from == FRAME_POINTER_REGNUM && to == STACK_POINTER_REGNUM
	  && df_regs_ever_live_p (LR_REGNUM)
	  && faked_omit_frame_pointer)
	return false;
     }
   return true;
}


and

static bool
aarch64_frame_pointer_required (void)
{
   ...
   if (flag_omit_frame_pointer && !faked_omit_frame_pointer)
     return false;
   else if (flag_omit_leaf_frame_pointer)
     return !crtl->is_leaf;
   return true;
}

   IRA calls hook frame_pointer_required and it returns false.  After
that LRA calls can_eliminate hook and it returns false which means
that fp can not be used for allocation and we should spill all pseudos
assigned to it.

   So the problem can be solved by modifying frame_pointer_required
hook.

   I also don't like

       if (from == FRAME_POINTER_REGNUM && to == STACK_POINTER_REGNUM
           && df_regs_ever_live_p (LR_REGNUM)
           && faked_omit_frame_pointer)
         return false;

   It means we cannot eliminate FP by SP but we still can eliminate AP
by SP, for example.  I think it is wrong.

So the following patch solves the problem and improves generated code.

If somebody with the rights approves, I can commit it tomorrow.

2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>

	* config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
	LR_REGNUM.
	(aarch64_can_eliminate): Don't check elimination source when
	frame_pointer_requred is false.

Comments

Jeff Law Dec. 3, 2013, 5:12 a.m. UTC | #1
On 12/02/13 16:44, Vladimir Makarov wrote:
>
>
>    First of all, it is a bad situation for code performance when IRA
> decides that it can use frame pointer for allocation, and after that
> LRA/reload decides that frame pointer can not be used and spills all
> pseudos assigned to FP.  The generated code will be much worse than
> one generated if we decided not to use FP from the IRA start.
Yup.  Actually, I think we had the same problem with the old 
local/global/reload allocator as well.

I don't recall the specifics, but I think the problem was global thought 
it could eliminate FP, but reload didn't and as a result code generation 
suffered.

I don't recall ever auditing ports to see if they were vulnerable to 
this class of problem.  So there may be others that will might trigger 
the assert.

>
> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>
>      * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
>      LR_REGNUM.
>      (aarch64_can_eliminate): Don't check elimination source when
>      frame_pointer_requred is false.
s/frame_pointer_requred/frame_pointer_required in the ChangeLog entry.

jeff
Marcus Shawcroft Dec. 3, 2013, 11:54 a.m. UTC | #2
On 2 December 2013 23:44, Vladimir Makarov <vmakarov@redhat.com> wrote:

> If somebody with the rights approves, I can commit it tomorrow.
>
> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>
>         * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
>         LR_REGNUM.
>         (aarch64_can_eliminate): Don't check elimination source when
>         frame_pointer_requred is false.
>


This is fine with me, go ahead and commit it.  Thanks /Marcus
Vladimir Makarov Dec. 3, 2013, 3:39 p.m. UTC | #3
On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote:
> On 2 December 2013 23:44, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>> If somebody with the rights approves, I can commit it tomorrow.
>>
>> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>>
>>          * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
>>          LR_REGNUM.
>>          (aarch64_can_eliminate): Don't check elimination source when
>>          frame_pointer_requred is false.
>>
>
>
> This is fine with me, go ahead and commit it.  Thanks /Marcus
>
Committed as rev. 205637 with changelog fix of a typo found by Jeff.
Yvan Roux Dec. 11, 2013, 10:35 a.m. UTC | #4
Hi Vladimir,

I've some regressions on ARM after this SP elimination patch, and they
are execution failures.  Here is the list:

g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
gcc.c-torture/execute/va-arg-22.c  -O2
gcc.dg/atomic/c11-atomic-exec-5.c  -O0
gfortran.dg/direct_io_12.f90  -O[23]
gfortran.dg/elemental_dependency_1.f90  -O2
gfortran.dg/matmul_2.f90  -O2
gfortran.dg/matmul_6.f90  -O2
gfortran.dg/mvbits_7.f90  -O3
gfortran.dg/unlimited_polymorphic_1.f03  -O3

I reduced and looked at var-arg-22.c and the issue is that in
lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
we try to do here is to change the pseudo 195 of the insn 118 below :

(insn 118 114 112 8 (set (reg:DI 195)
        (unspec:DI [
                (mem:DI (plus:SI (reg/f:SI 215)
                        (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
+ 64B]+8 S8 A8])
            ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
     (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
                (const_int 8 [0x8])) [7 a35+8 S8 A32])
        (nil)))

with its equivalent (x arg of lra_eliminate_regs_1):

(mem/c:DI (plus:SI (reg/f:SI 102 sfp)
        (const_int 76 [0x4c])) [7 a35+8 S8 A32])

lra_eliminate_regs_1 is called with full_p = true (it is not really
clear for what it means), but in the PLUS switch case, we have offset
= 0xb (given by ep->offset) and  as lra_get_insn_recog_data
(insn)->sp_offset value is 0, we will indeed add 0xb to the original
0x4c offset.

So, here I don't get if it is the sp_offset value of the
lra_insn_recog_data element which is not well updated or if lra_
eliminate_regs_1 has to be called with update_p and not full_p (which
fixed the value in that particular case).  Is it more obvious for you
?

Thanks
Yvan


On 3 December 2013 16:39, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote:
>>
>> On 2 December 2013 23:44, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>
>>> If somebody with the rights approves, I can commit it tomorrow.
>>>
>>> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>>>
>>>          * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
>>> Check
>>>          LR_REGNUM.
>>>          (aarch64_can_eliminate): Don't check elimination source when
>>>          frame_pointer_requred is false.
>>>
>>
>>
>> This is fine with me, go ahead and commit it.  Thanks /Marcus
>>
> Committed as rev. 205637 with changelog fix of a typo found by Jeff.
>
Ramana Radhakrishnan Dec. 11, 2013, 11:07 a.m. UTC | #5
Yvan,

On Wed, Dec 11, 2013 at 10:35 AM, Yvan Roux <yvan.roux@linaro.org> wrote:
> Hi Vladimir,
>
> I've some regressions on ARM after this SP elimination patch, and they
> are execution failures.  Here is the list:

Pragmatically, I think it's time we turned LRA on by default now that
we are in stage3 and that would help with getting more issues out of
the auto-testers quicker than anything else. Given we are now well
into stage3, we should make sure that the LRA support gets as much
testing as it can get in the run-up to the release.

Can you prepare a patch for this please ?

regards
Ramana



>
> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
> gcc.c-torture/execute/va-arg-22.c  -O2
> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
> gfortran.dg/direct_io_12.f90  -O[23]
> gfortran.dg/elemental_dependency_1.f90  -O2
> gfortran.dg/matmul_2.f90  -O2
> gfortran.dg/matmul_6.f90  -O2
> gfortran.dg/mvbits_7.f90  -O3
> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>
> I reduced and looked at var-arg-22.c and the issue is that in
> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
> we try to do here is to change the pseudo 195 of the insn 118 below :
>
> (insn 118 114 112 8 (set (reg:DI 195)
>         (unspec:DI [
>                 (mem:DI (plus:SI (reg/f:SI 215)
>                         (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
> + 64B]+8 S8 A8])
>             ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>      (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>                 (const_int 8 [0x8])) [7 a35+8 S8 A32])
>         (nil)))
>
> with its equivalent (x arg of lra_eliminate_regs_1):
>
> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>         (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>
> lra_eliminate_regs_1 is called with full_p = true (it is not really
> clear for what it means), but in the PLUS switch case, we have offset
> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
> 0x4c offset.
>
> So, here I don't get if it is the sp_offset value of the
> lra_insn_recog_data element which is not well updated or if lra_
> eliminate_regs_1 has to be called with update_p and not full_p (which
> fixed the value in that particular case).  Is it more obvious for you
> ?
>
> Thanks
> Yvan
>
>
> On 3 December 2013 16:39, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote:
>>>
>>> On 2 December 2013 23:44, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>
>>>> If somebody with the rights approves, I can commit it tomorrow.
>>>>
>>>> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>>>>
>>>>          * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
>>>> Check
>>>>          LR_REGNUM.
>>>>          (aarch64_can_eliminate): Don't check elimination source when
>>>>          frame_pointer_requred is false.
>>>>
>>>
>>>
>>> This is fine with me, go ahead and commit it.  Thanks /Marcus
>>>
>> Committed as rev. 205637 with changelog fix of a typo found by Jeff.
>>
Yvan Roux Dec. 11, 2013, 11:30 a.m. UTC | #6
> Pragmatically, I think it's time we turned LRA on by default now that
> we are in stage3 and that would help with getting more issues out of
> the auto-testers quicker than anything else. Given we are now well
> into stage3, we should make sure that the LRA support gets as much
> testing as it can get in the run-up to the release.

I agree.

> Can you prepare a patch for this please ?

I'll post the patch on the list.

Thanks,
Yvan
Vladimir Makarov Dec. 11, 2013, 6:25 p.m. UTC | #7
On 12/11/2013, 5:35 AM, Yvan Roux wrote:
> Hi Vladimir,
>
> I've some regressions on ARM after this SP elimination patch, and they
> are execution failures.  Here is the list:
>
> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
> gcc.c-torture/execute/va-arg-22.c  -O2
> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
> gfortran.dg/direct_io_12.f90  -O[23]
> gfortran.dg/elemental_dependency_1.f90  -O2
> gfortran.dg/matmul_2.f90  -O2
> gfortran.dg/matmul_6.f90  -O2
> gfortran.dg/mvbits_7.f90  -O3
> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>
> I reduced and looked at var-arg-22.c and the issue is that in
> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
> we try to do here is to change the pseudo 195 of the insn 118 below :
>
> (insn 118 114 112 8 (set (reg:DI 195)
>          (unspec:DI [
>                  (mem:DI (plus:SI (reg/f:SI 215)
>                          (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
> + 64B]+8 S8 A8])
>              ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>       (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>                  (const_int 8 [0x8])) [7 a35+8 S8 A32])
>          (nil)))
>
> with its equivalent (x arg of lra_eliminate_regs_1):
>
> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>          (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>
> lra_eliminate_regs_1 is called with full_p = true (it is not really
> clear for what it means),

It means we use full offset between the regs, otherwise we use change in 
the full offset from the previous iteration (it can be changed as we 
reserve stack memory for spilled pseudos and the reservation can be done 
several times).  As equiv value is stored as it was before any 
elimination, we need always to use full offset to make elimination.

  but in the PLUS switch case, we have offset
> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
> 0x4c offset.
>

0 value is suspicious because it is default.  We might have not set up 
it from neighbor insns.


> So, here I don't get if it is the sp_offset value of the
> lra_insn_recog_data element which is not well updated or if lra_
> eliminate_regs_1 has to be called with update_p and not full_p (which
> fixed the value in that particular case).  Is it more obvious for you
> ?
>

Yvan, could you send me the reduced preprocessed case and the options 
for cc1 to reproduce it.
Yvan Roux Dec. 11, 2013, 6:59 p.m. UTC | #8
On 11 December 2013 19:25, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>
>> Hi Vladimir,
>>
>> I've some regressions on ARM after this SP elimination patch, and they
>> are execution failures.  Here is the list:
>>
>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>> gcc.c-torture/execute/va-arg-22.c  -O2
>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>> gfortran.dg/direct_io_12.f90  -O[23]
>> gfortran.dg/elemental_dependency_1.f90  -O2
>> gfortran.dg/matmul_2.f90  -O2
>> gfortran.dg/matmul_6.f90  -O2
>> gfortran.dg/mvbits_7.f90  -O3
>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>
>> I reduced and looked at var-arg-22.c and the issue is that in
>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
>> we try to do here is to change the pseudo 195 of the insn 118 below :
>>
>> (insn 118 114 112 8 (set (reg:DI 195)
>>          (unspec:DI [
>>                  (mem:DI (plus:SI (reg/f:SI 215)
>>                          (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>> + 64B]+8 S8 A8])
>>              ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>       (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>                  (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>          (nil)))
>>
>> with its equivalent (x arg of lra_eliminate_regs_1):
>>
>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>          (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>
>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>> clear for what it means),
>
>
> It means we use full offset between the regs, otherwise we use change in the
> full offset from the previous iteration (it can be changed as we reserve
> stack memory for spilled pseudos and the reservation can be done several
> times).  As equiv value is stored as it was before any elimination, we need
> always to use full offset to make elimination.

Ok thanks it's clearer.

>  but in the PLUS switch case, we have offset
>>
>> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>> 0x4c offset.
>>
>
> 0 value is suspicious because it is default.  We might have not set up it
> from neighbor insns.
>
>
>
>> So, here I don't get if it is the sp_offset value of the
>> lra_insn_recog_data element which is not well updated or if lra_
>> eliminate_regs_1 has to be called with update_p and not full_p (which
>> fixed the value in that particular case).  Is it more obvious for you
>> ?
>>
>
> Yvan, could you send me the reduced preprocessed case and the options for
> cc1 to reproduce it.


Here is cc1 command line :

cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
-mfpu=neon -mthumb  v2.c  -O2

I use a native build on a chromebook, but it's reproducible with a
cross compiler.

With the attached test case the issue is when processing insn 118.

Thanks,
Yvan
Vladimir Makarov Dec. 12, 2013, 7:18 p.m. UTC | #9
On 12/11/2013, 1:59 PM, Yvan Roux wrote:
> On 11 December 2013 19:25, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>>
>>> Hi Vladimir,
>>>
>>> I've some regressions on ARM after this SP elimination patch, and they
>>> are execution failures.  Here is the list:
>>>
>>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>>> gcc.c-torture/execute/va-arg-22.c  -O2
>>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>>> gfortran.dg/direct_io_12.f90  -O[23]
>>> gfortran.dg/elemental_dependency_1.f90  -O2
>>> gfortran.dg/matmul_2.f90  -O2
>>> gfortran.dg/matmul_6.f90  -O2
>>> gfortran.dg/mvbits_7.f90  -O3
>>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>>
>>> I reduced and looked at var-arg-22.c and the issue is that in
>>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>>> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
>>> we try to do here is to change the pseudo 195 of the insn 118 below :
>>>
>>> (insn 118 114 112 8 (set (reg:DI 195)
>>>           (unspec:DI [
>>>                   (mem:DI (plus:SI (reg/f:SI 215)
>>>                           (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>>> + 64B]+8 S8 A8])
>>>               ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>>        (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>>                   (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>>           (nil)))
>>>
>>> with its equivalent (x arg of lra_eliminate_regs_1):
>>>
>>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>>           (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>>
>>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>>> clear for what it means),
>>
>>
>> It means we use full offset between the regs, otherwise we use change in the
>> full offset from the previous iteration (it can be changed as we reserve
>> stack memory for spilled pseudos and the reservation can be done several
>> times).  As equiv value is stored as it was before any elimination, we need
>> always to use full offset to make elimination.
>
> Ok thanks it's clearer.
>
>>   but in the PLUS switch case, we have offset
>>>
>>> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
>>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>>> 0x4c offset.
>>>
>>
>> 0 value is suspicious because it is default.  We might have not set up it
>> from neighbor insns.
>>
>>
>>
>>> So, here I don't get if it is the sp_offset value of the
>>> lra_insn_recog_data element which is not well updated or if lra_
>>> eliminate_regs_1 has to be called with update_p and not full_p (which
>>> fixed the value in that particular case).  Is it more obvious for you
>>> ?
>>>
>>
>> Yvan, could you send me the reduced preprocessed case and the options for
>> cc1 to reproduce it.
>
>
> Here is cc1 command line :
>
> cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
> -mfpu=neon -mthumb  v2.c  -O2
>
> I use a native build on a chromebook, but it's reproducible with a
> cross compiler.
>
> With the attached test case the issue is when processing insn 118.

The offset is updated two times and that is wrong.  That is because 
memory in init insn is shared by ira_reg_equiv and the test involves 2 
equivalent substitutions.  As I wrote equiv should be stored in original 
form by the current patch design.  Simple copying will not work as the 
first substitution is not done in this case.

I need some time to think how to fix it better still I'll try to fix it 
tomorrow.  I expected that the patch might have some problems.  The 
patch code is quite big although it is just a long standing PR fix. 
Therefore that was my first PR fixed on stage 3.  It is good to have it 
tested earlier and sorry to break some arm tests.
Yvan Roux Dec. 13, 2013, 1:07 p.m. UTC | #10
Thanks for your help Vlad.  Another bad news about this PR fix, is
that it has resurrected the thumb_movhi_clobber bug (PR 58785) but in
a different manner as the original failing testcase still pass.  I
attached a testcase to be compiled with :

cc1 -mthumb -mcpu=cortex-m0 -O2 m.c

And Thumb bootstrap seems to be broken with an ICE in check_rtl, I'm
checking if it is the same issue.

Yvan


On 12 December 2013 20:18, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 12/11/2013, 1:59 PM, Yvan Roux wrote:
>>
>> On 11 December 2013 19:25, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>
>>> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>>>
>>>>
>>>> Hi Vladimir,
>>>>
>>>> I've some regressions on ARM after this SP elimination patch, and they
>>>> are execution failures.  Here is the list:
>>>>
>>>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>>>> gcc.c-torture/execute/va-arg-22.c  -O2
>>>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>>>> gfortran.dg/direct_io_12.f90  -O[23]
>>>> gfortran.dg/elemental_dependency_1.f90  -O2
>>>> gfortran.dg/matmul_2.f90  -O2
>>>> gfortran.dg/matmul_6.f90  -O2
>>>> gfortran.dg/mvbits_7.f90  -O3
>>>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>>>
>>>> I reduced and looked at var-arg-22.c and the issue is that in
>>>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>>>> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
>>>> we try to do here is to change the pseudo 195 of the insn 118 below :
>>>>
>>>> (insn 118 114 112 8 (set (reg:DI 195)
>>>>           (unspec:DI [
>>>>                   (mem:DI (plus:SI (reg/f:SI 215)
>>>>                           (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>>>> + 64B]+8 S8 A8])
>>>>               ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>>>        (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>>>                   (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>>>           (nil)))
>>>>
>>>> with its equivalent (x arg of lra_eliminate_regs_1):
>>>>
>>>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>>>           (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>>>
>>>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>>>> clear for what it means),
>>>
>>>
>>>
>>> It means we use full offset between the regs, otherwise we use change in
>>> the
>>> full offset from the previous iteration (it can be changed as we reserve
>>> stack memory for spilled pseudos and the reservation can be done several
>>> times).  As equiv value is stored as it was before any elimination, we
>>> need
>>> always to use full offset to make elimination.
>>
>>
>> Ok thanks it's clearer.
>>
>>>   but in the PLUS switch case, we have offset
>>>>
>>>>
>>>> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
>>>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>>>> 0x4c offset.
>>>>
>>>
>>> 0 value is suspicious because it is default.  We might have not set up it
>>> from neighbor insns.
>>>
>>>
>>>
>>>> So, here I don't get if it is the sp_offset value of the
>>>> lra_insn_recog_data element which is not well updated or if lra_
>>>> eliminate_regs_1 has to be called with update_p and not full_p (which
>>>> fixed the value in that particular case).  Is it more obvious for you
>>>> ?
>>>>
>>>
>>> Yvan, could you send me the reduced preprocessed case and the options for
>>> cc1 to reproduce it.
>>
>>
>>
>> Here is cc1 command line :
>>
>> cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
>> -mfpu=neon -mthumb  v2.c  -O2
>>
>> I use a native build on a chromebook, but it's reproducible with a
>> cross compiler.
>>
>> With the attached test case the issue is when processing insn 118.
>
>
> The offset is updated two times and that is wrong.  That is because memory
> in init insn is shared by ira_reg_equiv and the test involves 2 equivalent
> substitutions.  As I wrote equiv should be stored in original form by the
> current patch design.  Simple copying will not work as the first
> substitution is not done in this case.
>
> I need some time to think how to fix it better still I'll try to fix it
> tomorrow.  I expected that the patch might have some problems.  The patch
> code is quite big although it is just a long standing PR fix. Therefore that
> was my first PR fixed on stage 3.  It is good to have it tested earlier and
> sorry to break some arm tests.
>
>
diff mbox

Patch

Index: ../../gcc/gcc/config/aarch64/aarch64.c
===================================================================
--- ../../gcc/gcc/config/aarch64/aarch64.c	(revision 205590)
+++ ../../gcc/gcc/config/aarch64/aarch64.c	(working copy)
@@ -1703,7 +1703,7 @@  aarch64_frame_pointer_required (void)
    if (flag_omit_frame_pointer && !faked_omit_frame_pointer)
      return false;
    else if (flag_omit_leaf_frame_pointer)
-    return !crtl->is_leaf;
+    return !crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM);
    return true;
  }

@@ -4126,7 +4126,7 @@  aarch64_can_eliminate (const int from, c
  	 of faked_omit_frame_pointer here (which is true when we always
  	 wish to keep non-leaf frame pointers but only wish to keep leaf frame
  	 pointers when LR is clobbered).  */
-      if (from == FRAME_POINTER_REGNUM && to == STACK_POINTER_REGNUM
+      if (to == STACK_POINTER_REGNUM
  	  && df_regs_ever_live_p (LR_REGNUM)
  	  && faked_omit_frame_pointer)
  	return false;