Message ID | AM6PR07MB4037775DF79E0229DCCA425AE44F0@AM6PR07MB4037.eurprd07.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) | expand |
Hi, I'd like to ping for this patch: https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00438.html Thanks Bernd. On 3/10/19 10:42 AM, Bernd Edlinger wrote: > Hi, > > This patch is an update to the previous patch, which takes into account that > the middle-end is not supposed to use the unaligned DI value directly which > was passed in an unaligned stack slot due to the AAPCS parameter passing rules. > > The patch works by changing use_register_for_decl to return false if the > incoming RTL is not sufficiently aligned on a STRICT_ALIGNMENT target, > as if the address of the parameter was taken (which is TREE_ADDRESSABLE). > So not taking the address of the parameter is a necessary condition > for the wrong-code in PR 89544. > > It works together with this check in assign_parm_adjust_stack_rtl: > /* If we can't trust the parm stack slot to be aligned enough for its > ultimate type, don't use that slot after entry. We'll make another > stack slot, if we need one. */ > if (stack_parm > && ((STRICT_ALIGNMENT > && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)) > ... > stack_param = NULL > > This makes assign_parms use assign_parm_setup_stack instead of > assign_parm_setup_reg, and the latter does assign a suitably aligned > stack slot, because stack_param == NULL by now, and uses emit_block_move > which avoids the issue with the back-end. > > Additionally, to prevent unnecessary performance regressions, > assign_parm_find_stack_rtl is enhanced to make use of a possible larger > alignment if the parameter was passed in an aligned stack slot without > the ABI requiring it. > > > Bootstrapped and reg-tested with all languages on arm-linux-gnueabihf. > Is it OK for trunk? > > > Thanks > Bernd. >
On Sun, 10 Mar 2019, Bernd Edlinger wrote: > Hi, > > This patch is an update to the previous patch, which takes into account that > the middle-end is not supposed to use the unaligned DI value directly which > was passed in an unaligned stack slot due to the AAPCS parameter passing rules. > > The patch works by changing use_register_for_decl to return false if the > incoming RTL is not sufficiently aligned on a STRICT_ALIGNMENT target, > as if the address of the parameter was taken (which is TREE_ADDRESSABLE). > So not taking the address of the parameter is a necessary condition > for the wrong-code in PR 89544. > > It works together with this check in assign_parm_adjust_stack_rtl: > /* If we can't trust the parm stack slot to be aligned enough for its > ultimate type, don't use that slot after entry. We'll make another > stack slot, if we need one. */ > if (stack_parm > && ((STRICT_ALIGNMENT > && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)) > ... > stack_param = NULL > > This makes assign_parms use assign_parm_setup_stack instead of > assign_parm_setup_reg, and the latter does assign a suitably aligned > stack slot, because stack_param == NULL by now, and uses emit_block_move > which avoids the issue with the back-end. > > Additionally, to prevent unnecessary performance regressions, > assign_parm_find_stack_rtl is enhanced to make use of a possible larger > alignment if the parameter was passed in an aligned stack slot without > the ABI requiring it. > > > Bootstrapped and reg-tested with all languages on arm-linux-gnueabihf. > Is it OK for trunk? I think the assign_parm_find_stack_rtl is not appropriate at this stage, I am also missing an update to the comment of the block you change. It also changes code I'm not familar enough with to review... Finally... Index: gcc/function.c =================================================================== --- gcc/function.c (revision 269264) +++ gcc/function.c (working copy) @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl) if (DECL_MODE (decl) == BLKmode) return false; + if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL + && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl)) + && GET_MODE_ALIGNMENT (DECL_MODE (decl)) + > MEM_ALIGN (DECL_INCOMING_RTL (decl))) + return false; + /* If -ffloat-store specified, don't put explicit float variables into registers. */ /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa I wonder if it is necessary to look at DECL_INCOMING_RTL here and why such RTL may not exist? That is, iff DECL_INCOMING_RTL doesn't exist then shouldn't we return false for safety reasons? Similarly the very same issue should exist on x86_64 which is !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate alignment on the caller side. So the STRICT_ALIGNMENT check is a wrong one. Which makes me think that a proper fix is not here, but in target(hook) code. Changing use_register_for_decl sounds odd anyways since if we return true we for the testcase still end up in memory, no? The hunk obviously misses a comment since the effect that this will cause a copy to be emitted isn't obvious (and relying on this probably fragile). Richard.
On 3/21/19 12:15 PM, Richard Biener wrote: > On Sun, 10 Mar 2019, Bernd Edlinger wrote: > Finally... > > Index: gcc/function.c > =================================================================== > --- gcc/function.c (revision 269264) > +++ gcc/function.c (working copy) > @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl) > if (DECL_MODE (decl) == BLKmode) > return false; > > + if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL > + && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl)) > + && GET_MODE_ALIGNMENT (DECL_MODE (decl)) > + > MEM_ALIGN (DECL_INCOMING_RTL (decl))) > + return false; > + > /* If -ffloat-store specified, don't put explicit float variables > into registers. */ > /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa > > I wonder if it is necessary to look at DECL_INCOMING_RTL here > and why such RTL may not exist? That is, iff DECL_INCOMING_RTL > doesn't exist then shouldn't we return false for safety reasons? > I think that happens a few times already before the INCOMING_RTL is assigned. I thought that might be too pessimistic. > Similarly the very same issue should exist on x86_64 which is > !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate > alignment on the caller side. So the STRICT_ALIGNMENT check is > a wrong one. > I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets just use MEM_ALIGN to select the right instructions. MEM_ALIGN is always 32-bit align on the DImode memory. The x86_64 vector instructions would look at MEM_ALIGN and do the right thing, yes? It seems to be the definition of STRICT_ALIGNMENT targets that all RTL instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target does not even have to look at MEM_ALIGN except in the mov_misalign_optab, right? The other hunk, where I admit I did not fully understand the comment, tries only to increase the MEM_ALIGN to 64-bit if the stack slot is 64-bit aligned although the target said it only needs 32-bit alignment. So that it is no longer necessary to copy the incoming value. > Which makes me think that a proper fix is not here, but in > target(hook) code. > > Changing use_register_for_decl sounds odd anyways since if we return true > we for the testcase still end up in memory, no? > It seems to make us use the incoming register _or_ stack slot if this function returns true here. If it returns false here, a new stack slot is allocated, but only if the original stack slot was not aligned. This works together with the other STRICT_ALIGNMENT check in assign_parm_adjust_stack_rtl. Where also for !STRICT_ALIGNMENT target TYPE_ALIGN and MEM_ALIGN are checked, but this seems to have only an effect if an address is taken, in that case I see use_register_for_decl return false due to TREE_ADDRESSABLE (decl), and whoops, we have an aligned copy of the unaligned stack slot. So I believe that there was already a fix for unaligned stack positions, that relied on the addressability of the parameter, while the target relied on the 8-byte alignment of the DImode access. > The hunk obviously misses a comment since the effect that this > will cause a copy to be emitted isn't obvious (and relying on > this probably fragile). > Yes, also that the copy is done using movmisalign optab is important. Thanks Bernd.
On Fri, 22 Mar 2019, Bernd Edlinger wrote: > On 3/21/19 12:15 PM, Richard Biener wrote: > > On Sun, 10 Mar 2019, Bernd Edlinger wrote: > > Finally... > > > > Index: gcc/function.c > > =================================================================== > > --- gcc/function.c (revision 269264) > > +++ gcc/function.c (working copy) > > @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl) > > if (DECL_MODE (decl) == BLKmode) > > return false; > > > > + if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL > > + && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl)) > > + && GET_MODE_ALIGNMENT (DECL_MODE (decl)) > > + > MEM_ALIGN (DECL_INCOMING_RTL (decl))) > > + return false; > > + > > /* If -ffloat-store specified, don't put explicit float variables > > into registers. */ > > /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa > > > > I wonder if it is necessary to look at DECL_INCOMING_RTL here > > and why such RTL may not exist? That is, iff DECL_INCOMING_RTL > > doesn't exist then shouldn't we return false for safety reasons? > > > > I think that happens a few times already before the INCOMING_RTL > is assigned. I thought that might be too pessimistic. > > > Similarly the very same issue should exist on x86_64 which is > > !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate > > alignment on the caller side. So the STRICT_ALIGNMENT check is > > a wrong one. > > > > I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets > just use MEM_ALIGN to select the right instructions. MEM_ALIGN > is always 32-bit align on the DImode memory. The x86_64 vector instructions > would look at MEM_ALIGN and do the right thing, yes? No, they need to use the movmisalign optab and end up with UNSPECs for example. > It seems to be the definition of STRICT_ALIGNMENT targets that all RTL > instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target > does not even have to look at MEM_ALIGN except in the mov_misalign_optab, > right? Yes, I think we never losened that. Note that RTL expansion has to fix this up for them. Note that strictly speaking SLOW_UNALIGNED_ACCESS specifies that x86 is strict-align wrt vector modes. > The other hunk, where I admit I did not fully understand the comment, tries > only to increase the MEM_ALIGN to 64-bit if the stack slot is > 64-bit aligned although the target said it only needs 32-bit alignment. > So that it is no longer necessary to copy the incoming value. > > > > Which makes me think that a proper fix is not here, but in > > target(hook) code. > > > > Changing use_register_for_decl sounds odd anyways since if we return true > > we for the testcase still end up in memory, no? > > > > It seems to make us use the incoming register _or_ stack slot if this function > returns true here. > > If it returns false here, a new stack slot is allocated, but only if the > original stack slot was not aligned. This works together with the > other STRICT_ALIGNMENT check in assign_parm_adjust_stack_rtl. Yes, I understood this - but then the check should be in that code deciding whether to copy, not in use_register_for_decl? > Where also for !STRICT_ALIGNMENT target TYPE_ALIGN and MEM_ALIGN > are checked, but this seems to have only an effect if an address > is taken, in that case I see use_register_for_decl return false > due to TREE_ADDRESSABLE (decl), and whoops, we have an aligned copy > of the unaligned stack slot. > > So I believe that there was already a fix for unaligned stack positions, > that relied on the addressability of the parameter, while the target > relied on the 8-byte alignment of the DImode access. > > > The hunk obviously misses a comment since the effect that this > > will cause a copy to be emitted isn't obvious (and relying on > > this probably fragile). > > > > Yes, also that the copy is done using movmisalign optab is important. > > > Thanks > Bernd. >
2019-03-05 Bernd Edlinger <bernd.edlinger@hotmail.de> PR middle-end/89544 * function.c (use_register_for_decl): Avoid using unaligned stack values on strict alignment targets. (assign_parm_find_stack_rtl): Use larger alignment when possible. testsuite: 2019-03-05 Bernd Edlinger <bernd.edlinger@hotmail.de> PR middle-end/89544 * gcc.target/arm/unaligned-argument-1.c: New test. * gcc.target/arm/unaligned-argument-2.c: New test. Index: gcc/function.c =================================================================== --- gcc/function.c (revision 269264) +++ gcc/function.c (working copy) @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl) if (DECL_MODE (decl) == BLKmode) return false; + if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL + && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl)) + && GET_MODE_ALIGNMENT (DECL_MODE (decl)) + > MEM_ALIGN (DECL_INCOMING_RTL (decl))) + return false; + /* If -ffloat-store specified, don't put explicit float variables into registers. */ /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa @@ -2698,8 +2704,20 @@ assign_parm_find_stack_rtl (tree parm, struct assi intentionally forcing upward padding. Otherwise we have to come up with a guess at the alignment based on OFFSET_RTX. */ poly_int64 offset; - if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm) + if (data->locate.where_pad == PAD_NONE || data->entry_parm) align = boundary; + else if (data->locate.where_pad == PAD_UPWARD) + { + align = boundary; + if (poly_int_rtx_p (offset_rtx, &offset) + && STACK_POINTER_OFFSET == 0) + { + unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT; + if (offset_align == 0 || offset_align > STACK_BOUNDARY) + offset_align = STACK_BOUNDARY; + align = MAX (align, offset_align); + } + } else if (poly_int_rtx_p (offset_rtx, &offset)) { align = least_bit_hwi (boundary); Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c =================================================================== --- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c (revision 0) +++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c (working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */ + +struct s { + int a, b; +} __attribute__((aligned(8))); + +struct s f0; + +void f(int a, int b, int c, int d, struct s f) +{ + f0 = f; +} + +/* { dg-final { scan-assembler-times "stmdb" 0 } } */ +/* { dg-final { scan-assembler-times "ldrd\[^\\n\]*\\\[sp\\\]" 1 } } */ +/* { dg-final { scan-assembler-times "ldrd" 1 } } */ +/* { dg-final { scan-assembler-times "strd" 1 } } */ Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c =================================================================== --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c (revision 0) +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c (working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */ + +struct s { + int a, b; +} __attribute__((aligned(8))); + +struct s f0; + +void f(int a, int b, int c, int d, int e, struct s f) +{ + f0 = f; +} + +/* { dg-final { scan-assembler-times "stmdb" 1 } } */ +/* { dg-final { scan-assembler-times "ldrd\[^\\n\]*\\\[sp\\\]" 1 } } */ +/* { dg-final { scan-assembler-times "ldrd" 1 } } */ +/* { dg-final { scan-assembler-times "strd" 1 } } */