diff mbox series

[PR84877] Dynamically align the address for local parameter copy on the stack when required alignment is larger than MAX_SUPPORTED_STACK_ALIGNMENT

Message ID a2aabf1f-2663-816b-90f2-897394c10a1e@foss.arm.com
State New
Headers show
Series [PR84877] Dynamically align the address for local parameter copy on the stack when required alignment is larger than MAX_SUPPORTED_STACK_ALIGNMENT | expand

Commit Message

Renlin Li March 22, 2018, 11:56 a.m. UTC
Hi all,

As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
The local copy of parameter on stack is not aligned.

For BLKmode paramters, a local copy on the stack will be saved.
There are three cases:
1) arguments passed partially on the stack, partially via registers.
2) arguments passed fully on the stack.
3) arguments passed via registers.

After the change here, in all three cases, the stack slot for the local parameter copy is aligned by the data type.
The stack slot is the DECL_RTL of the parameter. All the references thereafter in the function will refer to this RTL.

To populate the local copy on the stack,
For case 1) and 2), there are operations to move data from the caller's stack (from incoming rtl) into callee's stack.
For case 3), the registers are directly saved into the stack slot.

In all cases, the destination address is properly aligned.
But for case 1) and case 2), the source address is not aligned by the type. It is defined by the PCS how the arguments are prepared.
The block move operation is fulfilled by emit_block_move (). As far as I can see,
it will use the smaller alignment of source and destination.
This looks fine as long as we don't use instructions which requires a strict larger alignment than the address actually has.

Here, it only changes receiving parameters.
The function assign_stack_local_1 will be called in various places.
Usually, the caller will constraint the ALIGN parameter. For example via STACK_SLOT_ALIGNMENT macro.
assign_parm_setup_block will call assign_stack_local () with alignment from the parameter type which in this case could be
larger than MAX_SUPPORTED_STACK_ALIGNMENT.

The alignment operation for parameter copy on the stack is similar to stack vars.
First, enough space is reserved on the stack. The size is fixed at compile time.
Instructions are emitted to dynamically get an aligned address at runtime within this piece of memory.

This will unavoidably increase the usage of stack. However, it really depends on
how many over-aligned parameters are passed by value.

x86-linux, arm-none-eabi, aarch64-one-elf regression test Okay.
linux-armhf bootstrap Okay.
   	
I assume there are other targets which will be affected by the change.
But I don't have environment to test.

Okay the commit?
   	

Regards,
Renlin

gcc/

2018-03-22  Renlin Li  <renlin.li@arm.com>

	PR middle-end/84877
	* explow.h (get_dynamic_stack_size): Declare it as external.
	* explow.c (record_new_stack_level): Remove function static attribute.
	* function.c (assign_stack_local_1): Dynamically align the stack slot
	addr for parameter copy on the stack.

gcc/testsuite/

2018-03-22  Renlin Li  <renlin.li@arm.com>

	PR middle-end/84877
	* gcc.dg/pr84877.c: New.

Comments

H.J. Lu March 22, 2018, 12:42 p.m. UTC | #1
On Thu, Mar 22, 2018 at 4:56 AM, Renlin Li <renlin.li@foss.arm.com> wrote:
> Hi all,
>
> As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
> The local copy of parameter on stack is not aligned.
>
> For BLKmode paramters, a local copy on the stack will be saved.
> There are three cases:
> 1) arguments passed partially on the stack, partially via registers.
> 2) arguments passed fully on the stack.
> 3) arguments passed via registers.
>
> After the change here, in all three cases, the stack slot for the local
> parameter copy is aligned by the data type.
> The stack slot is the DECL_RTL of the parameter. All the references
> thereafter in the function will refer to this RTL.
>
> To populate the local copy on the stack,
> For case 1) and 2), there are operations to move data from the caller's
> stack (from incoming rtl) into callee's stack.
> For case 3), the registers are directly saved into the stack slot.
>
> In all cases, the destination address is properly aligned.
> But for case 1) and case 2), the source address is not aligned by the type.
> It is defined by the PCS how the arguments are prepared.
> The block move operation is fulfilled by emit_block_move (). As far as I can
> see,
> it will use the smaller alignment of source and destination.
> This looks fine as long as we don't use instructions which requires a strict
> larger alignment than the address actually has.
>
> Here, it only changes receiving parameters.
> The function assign_stack_local_1 will be called in various places.
> Usually, the caller will constraint the ALIGN parameter. For example via
> STACK_SLOT_ALIGNMENT macro.
> assign_parm_setup_block will call assign_stack_local () with alignment from
> the parameter type which in this case could be
> larger than MAX_SUPPORTED_STACK_ALIGNMENT.
>
> The alignment operation for parameter copy on the stack is similar to stack
> vars.
> First, enough space is reserved on the stack. The size is fixed at compile
> time.
> Instructions are emitted to dynamically get an aligned address at runtime
> within this piece of memory.
>
> This will unavoidably increase the usage of stack. However, it really
> depends on
> how many over-aligned parameters are passed by value.
>
> x86-linux, arm-none-eabi, aarch64-one-elf regression test Okay.
> linux-armhf bootstrap Okay.
>
> I assume there are other targets which will be affected by the change.
> But I don't have environment to test.
>
> Okay the commit?
>
>
> Regards,
> Renlin
>
> gcc/
>
> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>
>         PR middle-end/84877
>         * explow.h (get_dynamic_stack_size): Declare it as external.
>         * explow.c (record_new_stack_level): Remove function static
> attribute.
>         * function.c (assign_stack_local_1): Dynamically align the stack
> slot
>         addr for parameter copy on the stack.
>

Will this coded be used on x86?  X86 backend can dynamically align
stack just fine without the above change.
H.J. Lu March 22, 2018, 12:52 p.m. UTC | #2
On Thu, Mar 22, 2018 at 4:56 AM, Renlin Li <renlin.li@foss.arm.com> wrote:
> Hi all,
>
> As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
> The local copy of parameter on stack is not aligned.
>
> For BLKmode paramters, a local copy on the stack will be saved.
> There are three cases:
> 1) arguments passed partially on the stack, partially via registers.
> 2) arguments passed fully on the stack.
> 3) arguments passed via registers.
>
> After the change here, in all three cases, the stack slot for the local
> parameter copy is aligned by the data type.
> The stack slot is the DECL_RTL of the parameter. All the references
> thereafter in the function will refer to this RTL.
>
> To populate the local copy on the stack,
> For case 1) and 2), there are operations to move data from the caller's
> stack (from incoming rtl) into callee's stack.
> For case 3), the registers are directly saved into the stack slot.
>
> In all cases, the destination address is properly aligned.
> But for case 1) and case 2), the source address is not aligned by the type.
> It is defined by the PCS how the arguments are prepared.
> The block move operation is fulfilled by emit_block_move (). As far as I can
> see,
> it will use the smaller alignment of source and destination.
> This looks fine as long as we don't use instructions which requires a strict
> larger alignment than the address actually has.
>
> Here, it only changes receiving parameters.
> The function assign_stack_local_1 will be called in various places.
> Usually, the caller will constraint the ALIGN parameter. For example via
> STACK_SLOT_ALIGNMENT macro.
> assign_parm_setup_block will call assign_stack_local () with alignment from
> the parameter type which in this case could be
> larger than MAX_SUPPORTED_STACK_ALIGNMENT.
>
> The alignment operation for parameter copy on the stack is similar to stack
> vars.
> First, enough space is reserved on the stack. The size is fixed at compile
> time.
> Instructions are emitted to dynamically get an aligned address at runtime
> within this piece of memory.
>
> This will unavoidably increase the usage of stack. However, it really
> depends on
> how many over-aligned parameters are passed by value.
>
> x86-linux, arm-none-eabi, aarch64-one-elf regression test Okay.
> linux-armhf bootstrap Okay.
>
> I assume there are other targets which will be affected by the change.
> But I don't have environment to test.
>
> Okay the commit?
>
>
> Regards,
> Renlin
>
> gcc/
>
> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>
>         PR middle-end/84877
>         * explow.h (get_dynamic_stack_size): Declare it as external.
>         * explow.c (record_new_stack_level): Remove function static
> attribute.
>         * function.c (assign_stack_local_1): Dynamically align the stack
> slot
>         addr for parameter copy on the stack.
>
> gcc/testsuite/
>
> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>
>         PR middle-end/84877
>         * gcc.dg/pr84877.c: New.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

But your patch has

diff --git a/gcc/testsuite/gcc.target/arm/pr84877.c
b/gcc/testsuite/gcc.target/arm/pr84877.c
new file mode 100644
index 0000000000000000000000000000000000000000..b2df8a954f566dea3a4f1ed70572b43de39dda82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr84877.c

Why is this ARM specific?
H.J. Lu March 22, 2018, 12:55 p.m. UTC | #3
On Thu, Mar 22, 2018 at 5:52 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Mar 22, 2018 at 4:56 AM, Renlin Li <renlin.li@foss.arm.com> wrote:
>> Hi all,
>>
>> As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
>> The local copy of parameter on stack is not aligned.
>>
>> For BLKmode paramters, a local copy on the stack will be saved.
>> There are three cases:
>> 1) arguments passed partially on the stack, partially via registers.
>> 2) arguments passed fully on the stack.
>> 3) arguments passed via registers.
>>
>> After the change here, in all three cases, the stack slot for the local
>> parameter copy is aligned by the data type.
>> The stack slot is the DECL_RTL of the parameter. All the references
>> thereafter in the function will refer to this RTL.
>>
>> To populate the local copy on the stack,
>> For case 1) and 2), there are operations to move data from the caller's
>> stack (from incoming rtl) into callee's stack.
>> For case 3), the registers are directly saved into the stack slot.
>>
>> In all cases, the destination address is properly aligned.
>> But for case 1) and case 2), the source address is not aligned by the type.
>> It is defined by the PCS how the arguments are prepared.
>> The block move operation is fulfilled by emit_block_move (). As far as I can
>> see,
>> it will use the smaller alignment of source and destination.
>> This looks fine as long as we don't use instructions which requires a strict
>> larger alignment than the address actually has.
>>
>> Here, it only changes receiving parameters.
>> The function assign_stack_local_1 will be called in various places.
>> Usually, the caller will constraint the ALIGN parameter. For example via
>> STACK_SLOT_ALIGNMENT macro.
>> assign_parm_setup_block will call assign_stack_local () with alignment from
>> the parameter type which in this case could be
>> larger than MAX_SUPPORTED_STACK_ALIGNMENT.
>>
>> The alignment operation for parameter copy on the stack is similar to stack
>> vars.
>> First, enough space is reserved on the stack. The size is fixed at compile
>> time.
>> Instructions are emitted to dynamically get an aligned address at runtime
>> within this piece of memory.
>>
>> This will unavoidably increase the usage of stack. However, it really
>> depends on
>> how many over-aligned parameters are passed by value.
>>
>> x86-linux, arm-none-eabi, aarch64-one-elf regression test Okay.
>> linux-armhf bootstrap Okay.
>>
>> I assume there are other targets which will be affected by the change.
>> But I don't have environment to test.
>>
>> Okay the commit?
>>
>>
>> Regards,
>> Renlin
>>
>> gcc/
>>
>> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>>
>>         PR middle-end/84877
>>         * explow.h (get_dynamic_stack_size): Declare it as external.
>>         * explow.c (record_new_stack_level): Remove function static
>> attribute.
>>         * function.c (assign_stack_local_1): Dynamically align the stack
>> slot
>>         addr for parameter copy on the stack.
>>
>> gcc/testsuite/
>>
>> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>>
>>         PR middle-end/84877
>>         * gcc.dg/pr84877.c: New.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> But your patch has
>
> diff --git a/gcc/testsuite/gcc.target/arm/pr84877.c
> b/gcc/testsuite/gcc.target/arm/pr84877.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b2df8a954f566dea3a4f1ed70572b43de39dda82
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr84877.c
>
> Why is this ARM specific?

+#include <inttypes.h>
+
+struct U {
+    int M0;
+    int M1;
+} __attribute ((aligned(16)));
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Some targets align stack to 16 bytes by default.
You need 32 byte alignment to better test stack alignment.

+
Renlin Li March 22, 2018, 4:34 p.m. UTC | #4
Hi H.J.

On 22/03/18 12:55, H.J. Lu wrote:
> On Thu, Mar 22, 2018 at 5:52 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Mar 22, 2018 at 4:56 AM, Renlin Li <renlin.li@foss.arm.com> wrote:
>>> Hi all,
>>>
>>> As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
>>> The local copy of parameter on stack is not aligned.
>>>
>>> For BLKmode paramters, a local copy on the stack will be saved.
>>> There are three cases:
>>> 1) arguments passed partially on the stack, partially via registers.
>>> 2) arguments passed fully on the stack.
>>> 3) arguments passed via registers.
>>>
>>> After the change here, in all three cases, the stack slot for the local
>>> parameter copy is aligned by the data type.
>>> The stack slot is the DECL_RTL of the parameter. All the references
>>> thereafter in the function will refer to this RTL.
>>>
>>> To populate the local copy on the stack,
>>> For case 1) and 2), there are operations to move data from the caller's
>>> stack (from incoming rtl) into callee's stack.
>>> For case 3), the registers are directly saved into the stack slot.
>>>
>>> In all cases, the destination address is properly aligned.
>>> But for case 1) and case 2), the source address is not aligned by the type.
>>> It is defined by the PCS how the arguments are prepared.
>>> The block move operation is fulfilled by emit_block_move (). As far as I can
>>> see,
>>> it will use the smaller alignment of source and destination.
>>> This looks fine as long as we don't use instructions which requires a strict
>>> larger alignment than the address actually has.
>>>
>>> Here, it only changes receiving parameters.
>>> The function assign_stack_local_1 will be called in various places.
>>> Usually, the caller will constraint the ALIGN parameter. For example via
>>> STACK_SLOT_ALIGNMENT macro.
>>> assign_parm_setup_block will call assign_stack_local () with alignment from
>>> the parameter type which in this case could be
>>> larger than MAX_SUPPORTED_STACK_ALIGNMENT.
>>>
>>> The alignment operation for parameter copy on the stack is similar to stack
>>> vars.
>>> First, enough space is reserved on the stack. The size is fixed at compile
>>> time.
>>> Instructions are emitted to dynamically get an aligned address at runtime
>>> within this piece of memory.
>>>
>>> This will unavoidably increase the usage of stack. However, it really
>>> depends on
>>> how many over-aligned parameters are passed by value.
>>>
>>> x86-linux, arm-none-eabi, aarch64-one-elf regression test Okay.
>>> linux-armhf bootstrap Okay.
>>>
>>> I assume there are other targets which will be affected by the change.
>>> But I don't have environment to test.
>>>
>>> Okay the commit?
>>>
>>>
>>> Regards,
>>> Renlin
>>>
>>> gcc/
>>>
>>> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>>>
>>>          PR middle-end/84877
>>>          * explow.h (get_dynamic_stack_size): Declare it as external.
>>>          * explow.c (record_new_stack_level): Remove function static
>>> attribute.
>>>          * function.c (assign_stack_local_1): Dynamically align the stack
>>> slot
>>>          addr for parameter copy on the stack.
>>>
>>> gcc/testsuite/
>>>
>>> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>>>
>>>          PR middle-end/84877
>>>          * gcc.dg/pr84877.c: New.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> But your patch has
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/pr84877.c
>> b/gcc/testsuite/gcc.target/arm/pr84877.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..b2df8a954f566dea3a4f1ed70572b43de39dda82
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr84877.c
>>
>> Why is this ARM specific?
I attached the wrong patch. The testcase should really be gcc/testsuite/gcc.dg/pr84877.c

> 
> +#include <inttypes.h>
> +
> +struct U {
> +    int M0;
> +    int M1;
> +} __attribute ((aligned(16)));
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Some targets align stack to 16 bytes by default.
> You need 32 byte alignment to better test stack alignment.
> 
yes, you are right here.

IIUIC, as for x86 and aarch64, the STRICT_ALIGNMNET marco is false and the STACK_BOUNDARY is 128.
For this particular test case here, the patch didn't make any difference.

As the code here indicates, for parameters passed on the stack fully or partially, a local copy
will be allocated when the following condition is true. Otherwise, it will use the stack space prepared by the caller.
And for that part, it is ABI specific.

>   /* If we can't trust the parm stack slot to be aligned enough for its
>      ultimate type, don't use that slot after entry.  We'll make another
>      stack slot, if we need one.  */
>   if (stack_parm
>       && ((STRICT_ALIGNMENT
> 	   && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> 	  || (data->nominal_type
> 	      && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
> 	      && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
>     stack_parm = NULL;

So even I made the alignment 32 byte, in this case, the value will be passed on the stack.
the stack slot from caller will be used. No local copy will be created.
It won't shown any code-gen difference in x86 target.

The alignment test will test the stack address prepared by the caller.


Regards,
Renlin



> +
>
Jeff Law June 29, 2018, 7:34 p.m. UTC | #5
On 03/22/2018 05:56 AM, Renlin Li wrote:
> Hi all,
> 
> As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
> The local copy of parameter on stack is not aligned.
> 
> For BLKmode paramters, a local copy on the stack will be saved.
> There are three cases:
> 1) arguments passed partially on the stack, partially via registers.
> 2) arguments passed fully on the stack.
> 3) arguments passed via registers.
> 
> After the change here, in all three cases, the stack slot for the local parameter copy is aligned by the data type.Presumably this is only for named arguments.  If we have to deal with
stdarg/varargs there's a number of additional complications we'd need to
worry about.


> 
> The stack slot is the DECL_RTL of the parameter. All the references thereafter in the function will refer to this RTL.
OK.  This implies we're dealing strictly with named arguments.


> 
> 
> To populate the local copy on the stack,
> For case 1) and 2), there are operations to move data from the caller's stack (from incoming rtl) into callee's stack.
> 
> For case 3), the registers are directly saved into the stack slot.
> 
> In all cases, the destination address is properly aligned.
> But for case 1) and case 2), the source address is not aligned by the type. It is defined by the PCS how the arguments are prepared.
I'm not 100% sure the destination is always aligned.  I vaguely recall
the PA being an oddball on this kind of stuff.



> 
> The block move operation is fulfilled by emit_block_move (). As far as I can see,
Yes.  But we may have had to flush argument registers to memory prior to
using emit_block_move.  And the flushing operation can be odd because of
things like alignment, padding, etc.  The PA in particular was an
oddball here, but I don't remember the precise details.


> 
> it will use the smaller alignment of source and destination.
> This looks fine as long as we don't use instructions which requires a strict larger alignment than the address actually has.
Right.


> 
> 
> Here, it only changes receiving parameters.
> The function assign_stack_local_1 will be called in various places.
> Usually, the caller will constraint the ALIGN parameter. For example via STACK_SLOT_ALIGNMENT macro.
> 
> assign_parm_setup_block will call assign_stack_local () with alignment from the parameter type which in this case could be
> 
> larger than MAX_SUPPORTED_STACK_ALIGNMENT.
> 
> The alignment operation for parameter copy on the stack is similar to stack vars.
> 
> First, enough space is reserved on the stack. The size is fixed at compile time.
> 
> Instructions are emitted to dynamically get an aligned address at runtime within this piece of memory.
At least that's how it's supposed to work.  I have some concerns about
the existing dynamic alignment bits independent of your change.


> 
> 
> This will unavoidably increase the usage of stack. However, it really depends on
> 
> how many over-aligned parameters are passed by value.
It's relatively rare in my experience, so I wouldn't let this get in the
way.


> 
> x86-linux, arm-none-eabi, aarch64-one-elf regression test Okay.
> linux-armhf bootstrap Okay.
>       
> I assume there are other targets which will be affected by the change.
> But I don't have environment to test.
I don't think my tester will help much here as over-aligned parameters
are relatively rare and likely not triggered by bootstraps.

> 
> Okay the commit?
>       
> 
> Regards,
> Renlin
> 
> gcc/
> 
> 2018-03-22  Renlin Li  <renlin.li@arm.com>
> 
>     PR middle-end/84877
>     * explow.h (get_dynamic_stack_size): Declare it as external.
>     * explow.c (record_new_stack_level): Remove function static attribute.
>     * function.c (assign_stack_local_1): Dynamically align the stack slot
>     addr for parameter copy on the stack.
> 
> gcc/testsuite/
> 
> 2018-03-22  Renlin Li  <renlin.li@arm.com>
> 
>     PR middle-end/84877
>     * gcc.dg/pr84877.c: New.
OK.  Certainly keep an eye out for issues on other targets.
Jeff
Renlin Li July 23, 2018, 1:40 p.m. UTC | #6
Hi Jeff,

On 06/29/2018 08:34 PM, Jeff Law wrote:
> On 03/22/2018 05:56 AM, Renlin Li wrote:
>> Hi all,
>>
>> As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
>> The local copy of parameter on stack is not aligned.
>>
>> For BLKmode paramters, a local copy on the stack will be saved.
>> There are three cases:
>> 1) arguments passed partially on the stack, partially via registers.
>> 2) arguments passed fully on the stack.
>> 3) arguments passed via registers.
>>
>> After the change here, in all three cases, the stack slot for the local parameter copy is aligned by the data type.Presumably this is only for named arguments.  If we have to deal with
> stdarg/varargs there's a number of additional complications we'd need to
> worry about.
> 
> 
>>
>> The stack slot is the DECL_RTL of the parameter. All the references thereafter in the function will refer to this RTL.
> OK.  This implies we're dealing strictly with named arguments.

Yes, only for named arguments.

> 
> 
>>
>>
>> To populate the local copy on the stack,
>> For case 1) and 2), there are operations to move data from the caller's stack (from incoming rtl) into callee's stack.
>>
>> For case 3), the registers are directly saved into the stack slot.
>>
>> In all cases, the destination address is properly aligned.
>> But for case 1) and case 2), the source address is not aligned by the type. It is defined by the PCS how the arguments are prepared.
> I'm not 100% sure the destination is always aligned.  I vaguely recall
> the PA being an oddball on this kind of stuff.
> 
> 
> 
>>
>> The block move operation is fulfilled by emit_block_move (). As far as I can see,
> Yes.  But we may have had to flush argument registers to memory prior to
> using emit_block_move.  And the flushing operation can be odd because of
> things like alignment, padding, etc.  The PA in particular was an
> oddball here, but I don't remember the precise details.

I might not paragraph it properly, Block move will be generated for parameters passed on the stack.
If the parameters are of BLKmode but passed via registers (case target by this patch), in general cases, it will
move the reg into its stack slot first.

And the stack slot is dynamically aligned with this patch.

> 
> 
>>
>> it will use the smaller alignment of source and destination.
>> This looks fine as long as we don't use instructions which requires a strict larger alignment than the address actually has.
> Right.
> 
> 
>>
>>
>> Here, it only changes receiving parameters.
>> The function assign_stack_local_1 will be called in various places.
>> Usually, the caller will constraint the ALIGN parameter. For example via STACK_SLOT_ALIGNMENT macro.
>>
>> assign_parm_setup_block will call assign_stack_local () with alignment from the parameter type which in this case could be
>>
>> larger than MAX_SUPPORTED_STACK_ALIGNMENT.
>>
>> The alignment operation for parameter copy on the stack is similar to stack vars.
>>
>> First, enough space is reserved on the stack. The size is fixed at compile time.
>>
>> Instructions are emitted to dynamically get an aligned address at runtime within this piece of memory.
> At least that's how it's supposed to work.  I have some concerns about
> the existing dynamic alignment bits independent of your change.

I will double check.
> 
> 
>>
>>
>> This will unavoidably increase the usage of stack. However, it really depends on
>>
>> how many over-aligned parameters are passed by value.
> It's relatively rare in my experience, so I wouldn't let this get in the
> way.
> 
> 
>>
>> x86-linux, arm-none-eabi, aarch64-one-elf regression test Okay.
>> linux-armhf bootstrap Okay.
>>        
>> I assume there are other targets which will be affected by the change.
>> But I don't have environment to test.
> I don't think my tester will help much here as over-aligned parameters
> are relatively rare and likely not triggered by bootstraps.
> 
>>
>> Okay the commit?
>>        
>>
>> Regards,
>> Renlin
>>
>> gcc/
>>
>> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>>
>>      PR middle-end/84877
>>      * explow.h (get_dynamic_stack_size): Declare it as external.
>>      * explow.c (record_new_stack_level): Remove function static attribute.
>>      * function.c (assign_stack_local_1): Dynamically align the stack slot
>>      addr for parameter copy on the stack.
>>
>> gcc/testsuite/
>>
>> 2018-03-22  Renlin Li  <renlin.li@arm.com>
>>
>>      PR middle-end/84877
>>      * gcc.dg/pr84877.c: New.
> OK.  Certainly keep an eye out for issues on other targets.
> Jeff
> 

Thanks! I will run regression test again and commit it if there is no regression.
Will track and take related issues if there are any.


Thanks,
Renlin
Eric Botcazou Dec. 13, 2018, 11:23 a.m. UTC | #7
> 2018-03-22  Renlin Li  <renlin.li@arm.com>
> 
> 	PR middle-end/84877
> 	* explow.h (get_dynamic_stack_size): Declare it as external.
> 	* explow.c (record_new_stack_level): Remove function static attribute.
> 	* function.c (assign_stack_local_1): Dynamically align the stack slot
> 	addr for parameter copy on the stack.
> 
> gcc/testsuite/
> 
> 2018-03-22  Renlin Li  <renlin.li@arm.com>
> 
> 	PR middle-end/84877
> 	* gcc.dg/pr84877.c: New.

As reported by Rainer, the test doesn't pass on several targets.  And the 
patch breaks 64-bit Windows, see PR bootstrap/88450.

More generally, the patch totally disregards the SUPPORTS_STACK_ALIGNMENT 
machinery of the middle-end, which is quite surprising given that this 
machinery was precisely designed for this purpose.  Why is it so?
diff mbox series

Patch

diff --git a/gcc/explow.h b/gcc/explow.h
index 18c13804b067d64dea159c0deef8e4f011be47ee..b263d353b84b9c10d04b9a8d7257c14c1c7b7ccc 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -104,6 +104,9 @@  extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
 /* Returns the address of the dynamic stack space without allocating it.  */
 extern rtx get_dynamic_stack_base (poly_int64, unsigned);
 
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+extern rtx align_dynamic_address (rtx, unsigned);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/explow.c b/gcc/explow.c
index 042e71904ec897f6f8c6964119d4318dfe51bcc4..e7e20c1ec255be81e42c3d904191fa6c17c8cc77 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1175,9 +1175,10 @@  record_new_stack_level (void)
   if (targetm_common.except_unwind_info (&global_options) == UI_SJLJ)
     update_sjlj_context ();
 }
-
+
 /* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
-static rtx
+
+rtx
 align_dynamic_address (rtx target, unsigned required_align)
 {
   /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
diff --git a/gcc/function.c b/gcc/function.c
index 1a09ff0d31e2c95dec179b5cb2a9883b05fcc3d7..036be3b4912086a153939b97583e9212237b9b64 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -372,6 +372,7 @@  assign_stack_local_1 (machine_mode mode, poly_int64 size,
   poly_int64 bigend_correction = 0;
   poly_int64 slot_offset = 0, old_frame_offset;
   unsigned int alignment, alignment_in_bits;
+  bool dynamic_align_addr = false;
 
   if (align == 0)
     {
@@ -390,14 +391,20 @@  assign_stack_local_1 (machine_mode mode, poly_int64 size,
 
   alignment_in_bits = alignment * BITS_PER_UNIT;
 
-  /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT.  */
   if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
     {
-      alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
-      alignment = alignment_in_bits / BITS_PER_UNIT;
+      /* If the required alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT and
+	 it is not OK to reduce it.  Align the slot dynamically.  */
+      if (mode == BLKmode)
+	dynamic_align_addr = true;
+      else
+	{
+	  alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
+	  alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
+	}
     }
 
-  if (SUPPORTS_STACK_ALIGNMENT)
+  if (SUPPORTS_STACK_ALIGNMENT && !dynamic_align_addr)
     {
       if (crtl->stack_alignment_estimated < alignment_in_bits)
 	{
@@ -427,10 +434,42 @@  assign_stack_local_1 (machine_mode mode, poly_int64 size,
 	}
     }
 
-  if (crtl->stack_alignment_needed < alignment_in_bits)
-    crtl->stack_alignment_needed = alignment_in_bits;
-  if (crtl->max_used_stack_slot_alignment < alignment_in_bits)
-    crtl->max_used_stack_slot_alignment = alignment_in_bits;
+  /* Handle overalignment here for parameter copy on the stack.
+     Reserved enough space for it and dynamically align the address.
+     No free frame_space is added here.  */
+  if (dynamic_align_addr)
+    {
+      rtx allocsize = gen_int_mode (size, Pmode);
+      get_dynamic_stack_size (&allocsize, 0, alignment_in_bits, NULL);
+
+      /* This is the size of space needed to accommodate required size of data
+	 with given alignment.  */
+      poly_int64 len = rtx_to_poly_int64 (allocsize);
+      old_frame_offset = frame_offset;
+
+      if (FRAME_GROWS_DOWNWARD)
+	{
+	  frame_offset -= len;
+	  try_fit_stack_local (frame_offset, len, len,
+			       PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT,
+			       &slot_offset);
+	}
+      else
+	{
+	  frame_offset += len;
+	  try_fit_stack_local (old_frame_offset, len, len,
+			       PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT,
+			       &slot_offset);
+	}
+      goto found_space;
+    }
+  else
+    {
+      if (crtl->stack_alignment_needed < alignment_in_bits)
+	crtl->stack_alignment_needed = alignment_in_bits;
+      if (crtl->max_used_stack_slot_alignment < alignment_in_bits)
+	crtl->max_used_stack_slot_alignment = alignment_in_bits;
+    }
 
   if (mode != BLKmode || maybe_ne (size, 0))
     {
@@ -517,6 +556,12 @@  assign_stack_local_1 (machine_mode mode, poly_int64 size,
 			  (slot_offset + bigend_correction,
 			   Pmode));
 
+  if (dynamic_align_addr)
+    {
+      addr = align_dynamic_address (addr, alignment_in_bits);
+      mark_reg_pointer (addr, alignment_in_bits);
+    }
+
   x = gen_rtx_MEM (mode, addr);
   set_mem_align (x, alignment_in_bits);
   MEM_NOTRAP_P (x) = 1;
diff --git a/gcc/testsuite/gcc.target/arm/pr84877.c b/gcc/testsuite/gcc.target/arm/pr84877.c
new file mode 100644
index 0000000000000000000000000000000000000000..b2df8a954f566dea3a4f1ed70572b43de39dda82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr84877.c
@@ -0,0 +1,38 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include <inttypes.h>
+
+struct U {
+    int M0;
+    int M1;
+} __attribute ((aligned(16)));
+
+volatile struct U p0 = {1, 0};
+
+void __attribute__ ((noinline))
+foo (struct U p)
+{
+
+  volatile intptr_t mask = 0b1111;
+  volatile int dummy[2];
+  struct U p1 = p;
+  dummy[1] = p.M0;
+
+  if ((intptr_t)(&p1) & mask)
+    __builtin_abort ();
+  if ((intptr_t)(&p) & mask)
+    __builtin_abort ();
+
+  if (p1.M0 != dummy[1])
+    __builtin_abort ();
+  if (p1.M1 != p.M1)
+    __builtin_abort ();
+}
+
+int
+main ()
+{
+  foo (p0);
+  return 0;
+}