diff mbox series

Aligning stack offsets for spills

Message ID 98179c8e-bcec-83ed-5b99-6f54791bd7cd@tachyum.com
State New
Headers show
Series Aligning stack offsets for spills | expand

Commit Message

Jeff Law June 7, 2021, 7 p.m. UTC
So, as many of you know I left Red Hat a while ago and joined Tachyum.  
We're building a new processor and we've come across an issue where I 
think we need upstream discussion.

I can't divulge many of the details right now, but one of the quirks of 
our architecture is that reg+d addressing modes for our vector 
loads/stores require the displacement to be aligned.  This is an 
artifact of how these instructions are encoded.

Obviously we can emit a load of the address into a register when the 
displacement isn't aligned.  From a correctness point that works 
perfectly.  Unfortunately, it's a significant performance hit on some 
standard benchmarks (spec) where we have a great number of spills of 
vector objects into the stack at unaligned offsets in the hot parts of 
the code.


We've considered 3 possible approaches to solve this problem.

1. When the displacement isn't properly aligned, allocate more space in 
assign_stack_local so that we can make the offset aligned.  The downside 
is this potentially burns a lot of stack space, but in practice the cost 
was minimal (16 bytes in a 9k frame)  From a performance standpoint this 
works perfectly.

2. Abuse the register elimination code to create a second pointer into 
the stack.  Spills would start as <virtual> + offset, then either get 
eliminated to sp+offset' when the offset is aligned or gpr+offset'' when 
the offset wasn't properly aligned. We started a bit down this path, but 
with #1 working so well, we didn't get this approach to proof-of-concept.

3. Hack up the post-reload optimizers to fix things up as best as we 
can.  This may still be advantageous, but again with #1 working so well, 
we didn't explore this in any significant way.  We may still look at 
this at some point in other contexts.



Here's what we're playing with.  Obviously we'd need a target hook to 
drive this behavior.  I was thinking that we'd pass in any slot offset 
alignment requirements (from the target hook) to assign_stack_local and 
that would bubble down to this point in try_fit_stack_local:

Comments

Richard Biener June 8, 2021, 6:56 a.m. UTC | #1
On Mon, Jun 7, 2021 at 9:00 PM Jeff Law <jlaw@tachyum.com> wrote:
>
>
> So, as many of you know I left Red Hat a while ago and joined Tachyum.
> We're building a new processor and we've come across an issue where I
> think we need upstream discussion.
>
> I can't divulge many of the details right now, but one of the quirks of
> our architecture is that reg+d addressing modes for our vector
> loads/stores require the displacement to be aligned.  This is an
> artifact of how these instructions are encoded.
>
> Obviously we can emit a load of the address into a register when the
> displacement isn't aligned.  From a correctness point that works
> perfectly.  Unfortunately, it's a significant performance hit on some
> standard benchmarks (spec) where we have a great number of spills of
> vector objects into the stack at unaligned offsets in the hot parts of
> the code.
>
>
> We've considered 3 possible approaches to solve this problem.
>
> 1. When the displacement isn't properly aligned, allocate more space in
> assign_stack_local so that we can make the offset aligned.  The downside
> is this potentially burns a lot of stack space, but in practice the cost
> was minimal (16 bytes in a 9k frame)  From a performance standpoint this
> works perfectly.
>
> 2. Abuse the register elimination code to create a second pointer into
> the stack.  Spills would start as <virtual> + offset, then either get
> eliminated to sp+offset' when the offset is aligned or gpr+offset'' when
> the offset wasn't properly aligned. We started a bit down this path, but
> with #1 working so well, we didn't get this approach to proof-of-concept.
>
> 3. Hack up the post-reload optimizers to fix things up as best as we
> can.  This may still be advantageous, but again with #1 working so well,
> we didn't explore this in any significant way.  We may still look at
> this at some point in other contexts.
>

So just as extra info - you're pre-allocating the frame (including for spills)
and not using push/pop?

> Here's what we're playing with.  Obviously we'd need a target hook to
> drive this behavior.  I was thinking that we'd pass in any slot offset
> alignment requirements (from the target hook) to assign_stack_local and
> that would bubble down to this point in try_fit_stack_local:
>
> diff --git a/gcc/function.c b/gcc/function.c
> index d616f5f64f4..7f441b87a63 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -307,6 +307,14 @@ try_fit_stack_local (poly_int64 start, poly_int64
> length,
>     frame_off = targetm.starting_frame_offset () % frame_alignment;
>     frame_phase = frame_off ? frame_alignment - frame_off : 0;
>
> +  if (known_eq (size, 64) && alignment < 64)
> +    alignment = 64;
> +

I'm not familiar with the spill slot allocation code in GCC (I assume the above
is part of it) - do we in any way "sort" the spill slots so the extra required
padding is minimal?  Does the above guarantee that in the end the
offset will be aligned?  I assume IRA/LRA can still choose to eliminate
the respective frame pointer to sth else that ends up misaligning the offset
again?  Thus is it a real fix or a heuristic that ends up working most of
the time?

The actual alignment value should be dependent on the mode and
target preference and thus a target hook I suppose (you mention
this applies to vector loads/stores only).

Don't you have the very same issue with non-stack accesses?

Richard.

>     /* Round the frame offset to the specified alignment.  */
>
>     if (FRAME_GROWS_DOWNWARD)
>
> Thoughts?
>
> Jeff
Michael Matz June 8, 2021, 2:08 p.m. UTC | #2
Hello,

On Mon, 7 Jun 2021, Jeff Law wrote:

> 
> So, as many of you know I left Red Hat a while ago and joined Tachyum.  We're
> building a new processor and we've come across an issue where I think we need
> upstream discussion.
> 
> I can't divulge many of the details right now, but one of the quirks of our
> architecture is that reg+d addressing modes for our vector loads/stores
> require the displacement to be aligned.  This is an artifact of how these
> instructions are encoded.
> 
> Obviously we can emit a load of the address into a register when the
> displacement isn't aligned.  From a correctness point that works perfectly. 
> Unfortunately, it's a significant performance hit on some standard benchmarks
> (spec) where we have a great number of spills of vector objects into the stack
> at unaligned offsets in the hot parts of the code.
> 
> 
> We've considered 3 possible approaches to solve this problem.
> 
> 1. When the displacement isn't properly aligned, allocate more space in
> assign_stack_local so that we can make the offset aligned.  The downside is
> this potentially burns a lot of stack space, but in practice the cost was
> minimal (16 bytes in a 9k frame)  From a performance standpoint this works
> perfectly.
> 
> 2. Abuse the register elimination code to create a second pointer into the
> stack.  Spills would start as <virtual> + offset, then either get eliminated
> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
> wasn't properly aligned. We started a bit down this path, but with #1 working
> so well, we didn't get this approach to proof-of-concept.
> 
> 3. Hack up the post-reload optimizers to fix things up as best as we can. 
> This may still be advantageous, but again with #1 working so well, we didn't
> explore this in any significant way.  We may still look at this at some point
> in other contexts.
> 
> Here's what we're playing with.  Obviously we'd need a target hook to 
> drive this behavior.  I was thinking that we'd pass in any slot offset 
> alignment requirements (from the target hook) to assign_stack_local and 
> that would bubble down to this point in try_fit_stack_local:

Why is the machinery involving STACK_SLOT_ALIGNMENT and 
spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for 
backing stack slots) not working for you?  If everything is setup 
correctly the input alignment to try_fit_stack_local ought to be correct 
already.


Ciao,
Michael.
Jeff Law June 8, 2021, 2:47 p.m. UTC | #3
On 6/8/2021 8:08 AM, Michael Matz wrote:
> Hello,
>
> On Mon, 7 Jun 2021, Jeff Law wrote:
>
>> So, as many of you know I left Red Hat a while ago and joined Tachyum.  We're
>> building a new processor and we've come across an issue where I think we need
>> upstream discussion.
>>
>> I can't divulge many of the details right now, but one of the quirks of our
>> architecture is that reg+d addressing modes for our vector loads/stores
>> require the displacement to be aligned.  This is an artifact of how these
>> instructions are encoded.
>>
>> Obviously we can emit a load of the address into a register when the
>> displacement isn't aligned.  From a correctness point that works perfectly.
>> Unfortunately, it's a significant performance hit on some standard benchmarks
>> (spec) where we have a great number of spills of vector objects into the stack
>> at unaligned offsets in the hot parts of the code.
>>
>>
>> We've considered 3 possible approaches to solve this problem.
>>
>> 1. When the displacement isn't properly aligned, allocate more space in
>> assign_stack_local so that we can make the offset aligned.  The downside is
>> this potentially burns a lot of stack space, but in practice the cost was
>> minimal (16 bytes in a 9k frame)  From a performance standpoint this works
>> perfectly.
>>
>> 2. Abuse the register elimination code to create a second pointer into the
>> stack.  Spills would start as <virtual> + offset, then either get eliminated
>> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
>> wasn't properly aligned. We started a bit down this path, but with #1 working
>> so well, we didn't get this approach to proof-of-concept.
>>
>> 3. Hack up the post-reload optimizers to fix things up as best as we can.
>> This may still be advantageous, but again with #1 working so well, we didn't
>> explore this in any significant way.  We may still look at this at some point
>> in other contexts.
>>
>> Here's what we're playing with.  Obviously we'd need a target hook to
>> drive this behavior.  I was thinking that we'd pass in any slot offset
>> alignment requirements (from the target hook) to assign_stack_local and
>> that would bubble down to this point in try_fit_stack_local:
> Why is the machinery involving STACK_SLOT_ALIGNMENT and
> spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
> backing stack slots) not working for you?  If everything is setup
> correctly the input alignment to try_fit_stack_local ought to be correct
> already.
We don't need the MEM as a whole aligned, just the offset in the address 
calculation due to how we encode those instructions.  If I've read that 
code correctly, it would arrange for a dynamic realignment of the stack  
so that it could then align the slot. None of that is necessary for us 
and we'd like to avoid forcing the dynamic stack realignment.  Or did I 
misread the code?

jeff
Jakub Jelinek June 8, 2021, 2:55 p.m. UTC | #4
On Tue, Jun 08, 2021 at 08:47:26AM -0600, Jeff Law wrote:
> > Why is the machinery involving STACK_SLOT_ALIGNMENT and
> > spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
> > backing stack slots) not working for you?  If everything is setup
> > correctly the input alignment to try_fit_stack_local ought to be correct
> > already.
> We don't need the MEM as a whole aligned, just the offset in the address
> calculation due to how we encode those instructions.  If I've read that code
> correctly, it would arrange for a dynamic realignment of the stack  so that
> it could then align the slot. None of that is necessary for us and we'd like
> to avoid forcing the dynamic stack realignment.  Or did I misread the code?

I think dynamic stack realignment is done only on x86, no other backend has
that support, on all the other arches larger alignments are done
in expand_stack_vars by effectively performing __builtin_alloca_with_align
for the block containing all such variables.
So I'd the the functions Michael mentioned shouldn't be doing dynamic stack
realignment, though perhaps by pretending the vars have higher alignment
might be recorded in MEM_ALIGN and perhaps might result in wrong-code if
something will try to e.g. test if least significant bits of certain MEM
address are 0.

	Jakub
Jeff Law June 8, 2021, 3 p.m. UTC | #5
On 6/8/2021 12:56 AM, Richard Biener wrote:
> On Mon, Jun 7, 2021 at 9:00 PM Jeff Law <jlaw@tachyum.com> wrote:
>>
>> So, as many of you know I left Red Hat a while ago and joined Tachyum.
>> We're building a new processor and we've come across an issue where I
>> think we need upstream discussion.
>>
>> I can't divulge many of the details right now, but one of the quirks of
>> our architecture is that reg+d addressing modes for our vector
>> loads/stores require the displacement to be aligned.  This is an
>> artifact of how these instructions are encoded.
>>
>> Obviously we can emit a load of the address into a register when the
>> displacement isn't aligned.  From a correctness point that works
>> perfectly.  Unfortunately, it's a significant performance hit on some
>> standard benchmarks (spec) where we have a great number of spills of
>> vector objects into the stack at unaligned offsets in the hot parts of
>> the code.
>>
>>
>> We've considered 3 possible approaches to solve this problem.
>>
>> 1. When the displacement isn't properly aligned, allocate more space in
>> assign_stack_local so that we can make the offset aligned.  The downside
>> is this potentially burns a lot of stack space, but in practice the cost
>> was minimal (16 bytes in a 9k frame)  From a performance standpoint this
>> works perfectly.
>>
>> 2. Abuse the register elimination code to create a second pointer into
>> the stack.  Spills would start as <virtual> + offset, then either get
>> eliminated to sp+offset' when the offset is aligned or gpr+offset'' when
>> the offset wasn't properly aligned. We started a bit down this path, but
>> with #1 working so well, we didn't get this approach to proof-of-concept.
>>
>> 3. Hack up the post-reload optimizers to fix things up as best as we
>> can.  This may still be advantageous, but again with #1 working so well,
>> we didn't explore this in any significant way.  We may still look at
>> this at some point in other contexts.
>>
> So just as extra info - you're pre-allocating the frame (including for spills)
> and not using push/pop?
Yes, we're an ACCUMULATE_OUTGOING_ARGS target.


>
>> Here's what we're playing with.  Obviously we'd need a target hook to
>> drive this behavior.  I was thinking that we'd pass in any slot offset
>> alignment requirements (from the target hook) to assign_stack_local and
>> that would bubble down to this point in try_fit_stack_local:
>>
>> diff --git a/gcc/function.c b/gcc/function.c
>> index d616f5f64f4..7f441b87a63 100644
>> --- a/gcc/function.c
>> +++ b/gcc/function.c
>> @@ -307,6 +307,14 @@ try_fit_stack_local (poly_int64 start, poly_int64
>> length,
>>      frame_off = targetm.starting_frame_offset () % frame_alignment;
>>      frame_phase = frame_off ? frame_alignment - frame_off : 0;
>>
>> +  if (known_eq (size, 64) && alignment < 64)
>> +    alignment = 64;
>> +
> I'm not familiar with the spill slot allocation code in GCC (I assume the above
> is part of it) - do we in any way "sort" the spill slots so the extra required
> padding is minimal?  Does the above guarantee that in the end the
> offset will be aligned?  I assume IRA/LRA can still choose to eliminate
> the respective frame pointer to sth else that ends up misaligning the offset
> again?  Thus is it a real fix or a heuristic that ends up working most of
> the time?
LRA does sort the spill slots, but I haven't looked into its sorting 
algorithm to see if it's anything other than a priority sort.  LRA does 
allow sharing spill slots for non-conflicting pseudos which is what I've 
assumed has kept the extra padding to a minimum.

It's a real fix, not a heuristic.

>
> The actual alignment value should be dependent on the mode and
> target preference and thus a target hook I suppose (you mention
> this applies to vector loads/stores only).
Absolutely.  What I posted was just the initial proof-of-concept. It 
needs to be a target hook and we need to pass in the data from LRA since 
by the time we get into assign_stack_local, we don't have a useful mode 
-- LRA passes in the size and BLKmode.

I probably trimmed out too many comments in my attempt to avoid 
disclosing anything I shouldn't.  It's worth noting that adjusting 
things at that particular point results in getting the offsets aligned 
without forcing the stack as a whole into a higher alignment or even 
forcing slots to a higher alignment.



> Don't you have the very same issue with non-stack accesses?
We do and will continue to handle those by reloading the reg+d address 
when the displacement isn't suitably aligned.  In practice those cases 
aren't common and aren't on critical paths.

Jeff
H.J. Lu June 8, 2021, 3:06 p.m. UTC | #6
On Tue, Jun 8, 2021 at 7:56 AM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, Jun 08, 2021 at 08:47:26AM -0600, Jeff Law wrote:
> > > Why is the machinery involving STACK_SLOT_ALIGNMENT and
> > > spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
> > > backing stack slots) not working for you?  If everything is setup
> > > correctly the input alignment to try_fit_stack_local ought to be correct
> > > already.
> > We don't need the MEM as a whole aligned, just the offset in the address
> > calculation due to how we encode those instructions.  If I've read that code
> > correctly, it would arrange for a dynamic realignment of the stack  so that
> > it could then align the slot. None of that is necessary for us and we'd like
> > to avoid forcing the dynamic stack realignment.  Or did I misread the code?
>
> I think dynamic stack realignment is done only on x86, no other backend has

I believe that all pieces of infrastructure to realign the stack are
in place.  You
just need to properly align the stack in the backend.

> that support, on all the other arches larger alignments are done
> in expand_stack_vars by effectively performing __builtin_alloca_with_align
> for the block containing all such variables.
> So I'd the the functions Michael mentioned shouldn't be doing dynamic stack
> realignment, though perhaps by pretending the vars have higher alignment
> might be recorded in MEM_ALIGN and perhaps might result in wrong-code if
> something will try to e.g. test if least significant bits of certain MEM
> address are 0.
>
>         Jakub
>
Jeff Law June 8, 2021, 3:18 p.m. UTC | #7
On 6/8/2021 9:06 AM, H.J. Lu wrote:
> On Tue, Jun 8, 2021 at 7:56 AM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> On Tue, Jun 08, 2021 at 08:47:26AM -0600, Jeff Law wrote:
>>>> Why is the machinery involving STACK_SLOT_ALIGNMENT and
>>>> spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
>>>> backing stack slots) not working for you?  If everything is setup
>>>> correctly the input alignment to try_fit_stack_local ought to be correct
>>>> already.
>>> We don't need the MEM as a whole aligned, just the offset in the address
>>> calculation due to how we encode those instructions.  If I've read that code
>>> correctly, it would arrange for a dynamic realignment of the stack  so that
>>> it could then align the slot. None of that is necessary for us and we'd like
>>> to avoid forcing the dynamic stack realignment.  Or did I misread the code?
>> I think dynamic stack realignment is done only on x86, no other backend has
> I believe that all pieces of infrastructure to realign the stack are
> in place.  You
> just need to properly align the stack in the backend.
As I've stated, we don't need the stack aligned to these higher 
boundaries.  Nor do we need the slot as a whole aligned.   That's 
ultimately just wasteful since we don't need them.  We just want to get 
an aligned offset.

Jeff
Michael Matz June 8, 2021, 3:56 p.m. UTC | #8
Hello,

On Tue, 8 Jun 2021, Jeff Law wrote:

> On 6/8/2021 9:06 AM, H.J. Lu wrote:
> > On Tue, Jun 8, 2021 at 7:56 AM Jakub Jelinek via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >> On Tue, Jun 08, 2021 at 08:47:26AM -0600, Jeff Law wrote:
> >>>> Why is the machinery involving STACK_SLOT_ALIGNMENT and
> >>>> spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
> >>>> backing stack slots) not working for you?  If everything is setup
> >>>> correctly the input alignment to try_fit_stack_local ought to be correct
> >>>> already.
> >>> We don't need the MEM as a whole aligned, just the offset in the address
> >>> calculation due to how we encode those instructions.  If I've read that
> >>> code
> >>> correctly, it would arrange for a dynamic realignment of the stack  so
> >>> that
> >>> it could then align the slot. None of that is necessary for us and we'd
> >>> like
> >>> to avoid forcing the dynamic stack realignment.  Or did I misread the
> >>> code?
> >> I think dynamic stack realignment is done only on x86, no other backend has
> > I believe that all pieces of infrastructure to realign the stack are
> > in place.  You
> > just need to properly align the stack in the backend.
> 
> As I've stated, we don't need the stack aligned to these higher boundaries. 
> Nor do we need the slot as a whole aligned.   That's ultimately just wasteful
> since we don't need them.  We just want to get an aligned offset.

Well, but isn't that creating a difference when there is none?  You need 
an aligned offset; when given an aligned stack pointer that then is 
equivalent to an aligned stack address.  You are saying that you don't 
need the aligned stack pointer, sure, but would it be a problem for you?

Apart from that: dynamic stack realignment can be disabled (or probably 
isn't enabled for your target to start with), then the stack offset 
alignment machinery should still work in isolation.  (Well it might 
generate alignment claims in MEM RTL which then isn't in fact true, 
depends on the architecture if that's a problem for you).

Either way, I think whatever you need should probably be somehow 
integrated with the existing stack slot alignment knobs.  Or rather the 
two orthogonal pieces (stack pointer alignment and stack offset alignment) 
be separated and you then using only the latter.

(Btw: are you also trying to improve non-stack addresses?  Because 
ultimately your constraints aren't about stack at all but about all 
address forms.  In a way this all is more like a job for addressing 
mode selection and massaging, but of course our support for such in GCC is 
limited beyond avoiding invalid modes by reloading into registers, which 
is exactly what you don't want :) )


Ciao,
Michael.
Peter Bergner June 10, 2021, 7:28 p.m. UTC | #9
On 6/7/21 2:00 PM, Jeff Law wrote:
> I can't divulge many of the details right now, but one of the quirks of our
> architecture is that reg+d addressing modes for our vector loads/stores require
> the displacement to be aligned.  This is an artifact of how these instructions
> are encoded.

Given what you're describing, it sounds like POWER has something similar.
Our reg+displacement addressing uses 16-bit displacements using D, DS or DQ
operand fields.  The D field encodes the entire 16-bits, but the DS and DQ
fields only encode 14-bits and 12-bits respectively.  The DS and DQ operands
have the same maximum displacement as D operands, we just force that their
bottom 2-bits/4-bits must be zero, so we don't need to include them in the
insn encoding.  I believe this is all just handled in our legitimate address
routines, but maybe Segher and/or Mike can correct me if I'm wrong?

Peter
Segher Boessenkool June 10, 2021, 9:34 p.m. UTC | #10
On Thu, Jun 10, 2021 at 02:28:34PM -0500, Peter Bergner via Gcc-patches wrote:
> On 6/7/21 2:00 PM, Jeff Law wrote:
> > I can't divulge many of the details right now, but one of the quirks of our
> > architecture is that reg+d addressing modes for our vector loads/stores require
> > the displacement to be aligned.  This is an artifact of how these instructions
> > are encoded.
> 
> Given what you're describing, it sounds like POWER has something similar.
> Our reg+displacement addressing uses 16-bit displacements using D, DS or DQ
> operand fields.  The D field encodes the entire 16-bits, but the DS and DQ
> fields only encode 14-bits and 12-bits respectively.

Yes, the low 2 resp. 4 bits of the offset are used for other things in
the instruction encoding for DS and DQ.

> The DS and DQ operands
> have the same maximum displacement as D operands, we just force that their
> bottom 2-bits/4-bits must be zero, so we don't need to include them in the
> insn encoding.  I believe this is all just handled in our legitimate address
> routines, but maybe Segher and/or Mike can correct me if I'm wrong?

This is all correct.  But probably the main thing is that our ABIs
require most things to be naturally aligned (up to 16 bytes).  The
machines do not require that (anymore), but requiring this in the ABI
still is a performance win afaics (and a lot easier for (compiler)
implementers -- most issues you describe just disappear ;-) )

Something tells me this won't work for you though?


Segher
Jeff Law June 10, 2021, 10:49 p.m. UTC | #11
On 6/8/2021 8:55 AM, Jakub Jelinek wrote:
> On Tue, Jun 08, 2021 at 08:47:26AM -0600, Jeff Law wrote:
>>> Why is the machinery involving STACK_SLOT_ALIGNMENT and
>>> spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
>>> backing stack slots) not working for you?  If everything is setup
>>> correctly the input alignment to try_fit_stack_local ought to be correct
>>> already.
>> We don't need the MEM as a whole aligned, just the offset in the address
>> calculation due to how we encode those instructions.  If I've read that code
>> correctly, it would arrange for a dynamic realignment of the stack  so that
>> it could then align the slot. None of that is necessary for us and we'd like
>> to avoid forcing the dynamic stack realignment.  Or did I misread the code?
> I think dynamic stack realignment is done only on x86, no other backend has
> that support, on all the other arches larger alignments are done
> in expand_stack_vars by effectively performing __builtin_alloca_with_align
> for the block containing all such variables.
> So I'd the the functions Michael mentioned shouldn't be doing dynamic stack
> realignment, though perhaps by pretending the vars have higher alignment
> might be recorded in MEM_ALIGN and perhaps might result in wrong-code if
> something will try to e.g. test if least significant bits of certain MEM
> address are 0.
Hmm, I thought we'd do a dynamic realignment, if that's not the case 
then we've got another approach to consider.

Thanks everyone...

jeff
diff mbox series

Patch

diff --git a/gcc/function.c b/gcc/function.c
index d616f5f64f4..7f441b87a63 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -307,6 +307,14 @@  try_fit_stack_local (poly_int64 start, poly_int64 
length,
    frame_off = targetm.starting_frame_offset () % frame_alignment;
    frame_phase = frame_off ? frame_alignment - frame_off : 0;

+  if (known_eq (size, 64) && alignment < 64)
+    alignment = 64;
+
    /* Round the frame offset to the specified alignment.  */

    if (FRAME_GROWS_DOWNWARD)

Thoughts?

Jeff