Message ID | DB8PR10MB2569255E3191B5C7FA70D338E4D70@DB8PR10MB2569.EURPRD10.PROD.OUTLOOK.COM |
---|---|
State | New |
Headers | show |
Series | [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) | expand |
Hi! I'd like to ping for this patch: https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00546.html Thanks Bernd.
On Thu, 8 Aug 2019, Bernd Edlinger wrote: > On 8/2/19 9:01 PM, Bernd Edlinger wrote: > > On 8/2/19 3:11 PM, Richard Biener wrote: > >> On Tue, 30 Jul 2019, Bernd Edlinger wrote: > >> > >>> > >>> I have no test coverage for the movmisalign optab though, so I > >>> rely on your code review for that part. > >> > >> It looks OK. I tried to make it trigger on the following on > >> i?86 with -msse2: > >> > >> typedef int v4si __attribute__((vector_size (16))); > >> > >> struct S { v4si v; } __attribute__((packed)); > >> > >> v4si foo (struct S s) > >> { > >> return s.v; > >> } > >> > > > > Hmm, the entry_parm need to be a MEM_P and an unaligned one. > > So the test case could be made to trigger it this way: > > > > typedef int v4si __attribute__((vector_size (16))); > > > > struct S { v4si v; } __attribute__((packed)); > > > > int t; > > v4si foo (struct S a, struct S b, struct S c, struct S d, > > struct S e, struct S f, struct S g, struct S h, > > int i, int j, int k, int l, int m, int n, > > int o, struct S s) > > { > > t = o; > > return s.v; > > } > > > > Ah, I realized that there are already a couple of very similar > test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c, > gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c, > which also manage to execute the movmisalign code with the latest patch > version. So I thought that it is not necessary to add another one. > > > However the code path is still not reached, since targetm.slow_ualigned_access > > is always FALSE, which is probably a flaw in my patch. > > > > So I think, > > > > + else if (MEM_P (data->entry_parm) > > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > > + > MEM_ALIGN (data->entry_parm) > > + && targetm.slow_unaligned_access (promoted_nominal_mode, > > + MEM_ALIGN (data->entry_parm))) > > > > should probably better be > > > > + else if (MEM_P (data->entry_parm) > > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > > + > MEM_ALIGN (data->entry_parm) > > + && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode)) > > + != CODE_FOR_nothing) > > + || targetm.slow_unaligned_access (promoted_nominal_mode, > > + MEM_ALIGN (data->entry_parm)))) > > > > Right? > > > > Then the modified test case would use the movmisalign optab. > > However nothing changes in the end, since the i386 back-end is used to work > > around the middle end not using movmisalign optab when it should do so. > > > > I prefer the second form of the check, as it offers more test coverage, > and is probably more correct than the former. > > Note there are more variations of this misalign check in expr.c, > some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR: > > && mode != BLKmode > && align < GET_MODE_ALIGNMENT (mode)) > { > if ((icode = optab_handler (movmisalign_optab, mode)) > != CODE_FOR_nothing) > [...] > else if (targetm.slow_unaligned_access (mode, align)) > temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode), > 0, TYPE_UNSIGNED (TREE_TYPE (exp)), > (modifier == EXPAND_STACK_PARM > ? NULL_RTX : target), > mode, mode, false, alt_rtl); > > I wonder if they are correct this way, why shouldn't we use the movmisalign > optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESSS ? Doesn't the code do exactly this? Prefer movmisalign over extrct_bit_field? > > > I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand > > patterns that the memory is properly aligned ? > > > > Wow, that was a really exciting bug-hunt with those assertions around... :) > >> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all > >> > >> did_conversion = true; > >> } > >> + else if (MEM_P (data->entry_parm) > >> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > >> + > MEM_ALIGN (data->entry_parm) > >> > >> we arrive here by-passing > >> > >> else if (need_conversion) > >> { > >> /* We did not have an insn to convert directly, or the sequence > >> generated appeared unsafe. We must first copy the parm to a > >> pseudo reg, and save the conversion until after all > >> parameters have been moved. */ > >> > >> int save_tree_used; > >> rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm)); > >> > >> emit_move_insn (tempreg, validated_mem); > >> > >> but this move instruction is invalid in the same way as the case > >> you fix, no? So wouldn't it be better to do > >> > > > > We could do that, but I supposed that there must be a reason why > > assign_parm_setup_stack gets away with that same: > > > > if (data->promoted_mode != data->nominal_mode) > > { > > /* Conversion is required. */ > > rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm)); > > > > emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm))); > > > > > > So either some back-ends are too permissive with us, > > or there is a reason why promoted_mode != nominal_mode > > does not happen together with unaligned entry_parm. > > In a way that would be a rather unusual ABI. > > > > To find out if that ever happens I added a couple of checking > assertions in the arm mov<mode> expand patterns. > > So far the assertions did (almost) always hold, so it is likely not > necessary to fiddle with all those naive move instructions here. > > So my gut feeling is, leave those places alone until there is a reason > for changing them. Works for me - I wonder if we should add those asserts to generic code (guarded with flag_checking) though. > However the assertion in movsi triggered a couple times in the > ada testsuite due to expand_builtin_init_descriptor using a > BLKmode MEM rtx, which is only 8-bit aligned. So, I set the > ptr_mode alignment there explicitly. Looks good given we emit ptr_mode moves into it. Thus OK independently. > Several struct-layout-1.dg testcase tripped over misaligned > complex_cst constants, fixed by varasm.c (align_variable). > This is likely a wrong code bug, because misaligned complex > constants, are expanded to misaligned MEM_REF, but the > expansion cannot handle misaligned constants, only packed > structure fields. Hmm. So your patch overrides user-alignment here. Woudln't it be better to do that more conciously by if (! DECL_USER_ALIGN (decl) || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) && targetm.slow_unaligned_access (DECL_MODE (decl), align))) ? And why is the movmisalign optab support missing here? IMHO whatever code later fails to properly use unaligned loads should be fixed instead rather than ignoring user requested alignment. Can you quote a short testcase that explains what exactly goes wrong? The struct-layout ones are awkward to look at... > Furthermore gcc.dg/Warray-bounds-33.c was fixed by the > change in expr.c (expand_expr_real_1). Certainly is it invalid > to read memory at a function address, but it should not ICE. > The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks > like A32, so the misaligned code execution is not taken, but it is > set to A8 below, but then we hit an ICE if the result is used: So the user accessed it as A32. > /* Don't set memory attributes if the base expression is > SSA_NAME that got expanded as a MEM. In that case, we should > just honor its original memory attributes. */ > if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0)) > set_mem_attributes (op0, exp, 0); Huh, I don't understand this. 'tem' should never be SSA_NAME. But set_mem_attributes_minus_bitpos uses get_object_alignment_1 and that has special treatment for FUNCTION_DECLs that is not covered by /* When EXP is an actual memory reference then we can use TYPE_ALIGN of a pointer indirection to derive alignment. Do so only if get_pointer_alignment_1 did not reveal absolute alignment knowledge and if using that alignment would improve the situation. */ unsigned int talign; if (!addr_p && !known_alignment && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT) && talign > align) align = talign; which could be moved out of the if-cascade. That said, setting A8 should eventually result into appropriate unaligned expansion, so it seems odd this triggers the assert... > > Finally gcc.dg/torture/pr48493.c required the change > in assign_parm_setup_stack. This is just not using the > correct MEM_ALIGN attribute value, while the memory is > actually aligned. But doesn't int align = STACK_SLOT_ALIGNMENT (data->passed_type, GET_MODE (data->entry_parm), TYPE_ALIGN (data->passed_type)); + if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)) + && targetm.slow_unaligned_access (GET_MODE (data->entry_parm), + align)) + align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target? That is, the target says, for natural alignment 64 the stack slot alignment can only be guaranteed 32. You can't then simply up it but have to use unaligned accesses (or the target/middle-end needs to do dynamic stack alignment). > Note that set_mem_attributes does not > always preserve the MEM_ALIGN of the ref, since: set_mem_attributes sets _all_ attributes from an expression or type. > /* Default values from pre-existing memory attributes if present. */ > refattrs = MEM_ATTRS (ref); > if (refattrs) > { > /* ??? Can this ever happen? Calling this routine on a MEM that > already carries memory attributes should probably be invalid. */ > attrs.expr = refattrs->expr; > attrs.offset_known_p = refattrs->offset_known_p; > attrs.offset = refattrs->offset; > attrs.size_known_p = refattrs->size_known_p; > attrs.size = refattrs->size; > attrs.align = refattrs->align; > } > > but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT > the MEM_ATTRS are zero, and a smaller alignment may result. Not sure what you are saying here. That set_mem_align (MEM:SI A32, 32) produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting the A32 but eventually computing sth lower? Yeah, that's probably an interesting "hole" here. I'm quite sure that if we'd do refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)]; we run into issues exactly on strict-align targets ... > Well with those checks in place it should now be a lot harder to generate > invalid code on STRICT_ALIGNMENT targets, without running into an ICE. > > Attached is the latest version of my arm alignment patch. > > > Boot-strapped and reg-tested on x64_64-pc-linux-gnu and arm-linux-gnueabihf. > Is it OK for trunk? @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all did_conversion = true; } + else if (MEM_P (data->entry_parm) + && GET_MODE_ALIGNMENT (promoted_nominal_mode) + > MEM_ALIGN (data->entry_parm) + && (((icode = optab_handler (movmisalign_optab, + promoted_nominal_mode)) + != CODE_FOR_nothing) + || targetm.slow_unaligned_access (promoted_nominal_mode, + MEM_ALIGN (data->entry_parm)))) + { + if (icode != CODE_FOR_nothing) + emit_insn (GEN_FCN (icode) (parmreg, validated_mem)); + else + rtl = parmreg = extract_bit_field (validated_mem, + GET_MODE_BITSIZE (promoted_nominal_mode), 0, + unsignedp, parmreg, + promoted_nominal_mode, VOIDmode, false, NULL); + } else emit_move_insn (parmreg, validated_mem); This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) / GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm) and promoted_nominal_mode. Not sure if it helps on its own. I'm nervous about the alignment one since I'm not at all familiar with this code... Thanks, Richard.
On 8/14/19 2:00 PM, Richard Biener wrote: > On Thu, 8 Aug 2019, Bernd Edlinger wrote: > >> On 8/2/19 9:01 PM, Bernd Edlinger wrote: >>> On 8/2/19 3:11 PM, Richard Biener wrote: >>>> On Tue, 30 Jul 2019, Bernd Edlinger wrote: >>>> >>>>> >>>>> I have no test coverage for the movmisalign optab though, so I >>>>> rely on your code review for that part. >>>> >>>> It looks OK. I tried to make it trigger on the following on >>>> i?86 with -msse2: >>>> >>>> typedef int v4si __attribute__((vector_size (16))); >>>> >>>> struct S { v4si v; } __attribute__((packed)); >>>> >>>> v4si foo (struct S s) >>>> { >>>> return s.v; >>>> } >>>> >>> >>> Hmm, the entry_parm need to be a MEM_P and an unaligned one. >>> So the test case could be made to trigger it this way: >>> >>> typedef int v4si __attribute__((vector_size (16))); >>> >>> struct S { v4si v; } __attribute__((packed)); >>> >>> int t; >>> v4si foo (struct S a, struct S b, struct S c, struct S d, >>> struct S e, struct S f, struct S g, struct S h, >>> int i, int j, int k, int l, int m, int n, >>> int o, struct S s) >>> { >>> t = o; >>> return s.v; >>> } >>> >> >> Ah, I realized that there are already a couple of very similar >> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c, >> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c, >> which also manage to execute the movmisalign code with the latest patch >> version. So I thought that it is not necessary to add another one. >> >>> However the code path is still not reached, since targetm.slow_ualigned_access >>> is always FALSE, which is probably a flaw in my patch. >>> >>> So I think, >>> >>> + else if (MEM_P (data->entry_parm) >>> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) >>> + > MEM_ALIGN (data->entry_parm) >>> + && targetm.slow_unaligned_access (promoted_nominal_mode, >>> + MEM_ALIGN (data->entry_parm))) >>> >>> should probably better be >>> >>> + else if (MEM_P (data->entry_parm) >>> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) >>> + > MEM_ALIGN (data->entry_parm) >>> + && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode)) >>> + != CODE_FOR_nothing) >>> + || targetm.slow_unaligned_access (promoted_nominal_mode, >>> + MEM_ALIGN (data->entry_parm)))) >>> >>> Right? >>> >>> Then the modified test case would use the movmisalign optab. >>> However nothing changes in the end, since the i386 back-end is used to work >>> around the middle end not using movmisalign optab when it should do so. >>> >> >> I prefer the second form of the check, as it offers more test coverage, >> and is probably more correct than the former. >> >> Note there are more variations of this misalign check in expr.c, >> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR: >> >> && mode != BLKmode >> && align < GET_MODE_ALIGNMENT (mode)) >> { >> if ((icode = optab_handler (movmisalign_optab, mode)) >> != CODE_FOR_nothing) >> [...] >> else if (targetm.slow_unaligned_access (mode, align)) >> temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode), >> 0, TYPE_UNSIGNED (TREE_TYPE (exp)), >> (modifier == EXPAND_STACK_PARM >> ? NULL_RTX : target), >> mode, mode, false, alt_rtl); >> >> I wonder if they are correct this way, why shouldn't we use the movmisalign >> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESSS ? > > Doesn't the code do exactly this? Prefer movmisalign over > extrct_bit_field? > Ah, yes. How could I miss that. >> >>> I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand >>> patterns that the memory is properly aligned ? >>> >> >> Wow, that was a really exciting bug-hunt with those assertions around... > > :) > >>>> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all >>>> >>>> did_conversion = true; >>>> } >>>> + else if (MEM_P (data->entry_parm) >>>> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) >>>> + > MEM_ALIGN (data->entry_parm) >>>> >>>> we arrive here by-passing >>>> >>>> else if (need_conversion) >>>> { >>>> /* We did not have an insn to convert directly, or the sequence >>>> generated appeared unsafe. We must first copy the parm to a >>>> pseudo reg, and save the conversion until after all >>>> parameters have been moved. */ >>>> >>>> int save_tree_used; >>>> rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm)); >>>> >>>> emit_move_insn (tempreg, validated_mem); >>>> >>>> but this move instruction is invalid in the same way as the case >>>> you fix, no? So wouldn't it be better to do >>>> >>> >>> We could do that, but I supposed that there must be a reason why >>> assign_parm_setup_stack gets away with that same: >>> >>> if (data->promoted_mode != data->nominal_mode) >>> { >>> /* Conversion is required. */ >>> rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm)); >>> >>> emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm))); >>> >>> >>> So either some back-ends are too permissive with us, >>> or there is a reason why promoted_mode != nominal_mode >>> does not happen together with unaligned entry_parm. >>> In a way that would be a rather unusual ABI. >>> >> >> To find out if that ever happens I added a couple of checking >> assertions in the arm mov<mode> expand patterns. >> >> So far the assertions did (almost) always hold, so it is likely not >> necessary to fiddle with all those naive move instructions here. >> >> So my gut feeling is, leave those places alone until there is a reason >> for changing them. > > Works for me - I wonder if we should add those asserts to generic > code (guarded with flag_checking) though. > Well, yes, but I was scared away by the complexity of emit_move_insn_1. It could be done, but in the moment I would be happy to have these checks of one major strict alignment target, ARM is a good candidate since most instructions work even if they are accidentally using unaligned arguments. So middle-end errors do not always visible by ordinary tests. Nevertheless it is a blatant violation of the contract between middle-end and back-end, which should be avoided. >> However the assertion in movsi triggered a couple times in the >> ada testsuite due to expand_builtin_init_descriptor using a >> BLKmode MEM rtx, which is only 8-bit aligned. So, I set the >> ptr_mode alignment there explicitly. > > Looks good given we emit ptr_mode moves into it. Thus OK independently. > Thanks, committed as r274487. >> Several struct-layout-1.dg testcase tripped over misaligned >> complex_cst constants, fixed by varasm.c (align_variable). >> This is likely a wrong code bug, because misaligned complex >> constants, are expanded to misaligned MEM_REF, but the >> expansion cannot handle misaligned constants, only packed >> structure fields. > > Hmm. So your patch overrides user-alignment here. Woudln't it > be better to do that more conciously by > > if (! DECL_USER_ALIGN (decl) > || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) > && targetm.slow_unaligned_access (DECL_MODE (decl), align))) > > ? And why is the movmisalign optab support missing here? > Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl: /* If we can't trust the parm stack slot to be aligned enough for its ultimate type, don't use that slot after entry. We'll make another stack slot, if we need one. */ if (stack_parm && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm) && targetm.slow_unaligned_access (data->nominal_mode, MEM_ALIGN (stack_parm))) which also makes a variable more aligned than it is declared. But maybe both should also check the movmisalign optab in addition to slow_unaligned_access ? > IMHO whatever code later fails to properly use unaligned loads > should be fixed instead rather than ignoring user requested alignment. > > Can you quote a short testcase that explains what exactly goes wrong? > The struct-layout ones are awkward to look at... > Sure, $ cat test.c _Complex float __attribute__((aligned(1))) cf; void foo (void) { cf = 1.0i; } $ arm-linux-gnueabihf-gcc -S test.c during RTL pass: expand test.c: In function 'foo': test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003 5 | cf = 1.0i; | ~~~^~~~~~ 0x7ba475 gen_movsf(rtx_def*, rtx_def*) ../../gcc-trunk/gcc/config/arm/arm.md:7003 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const ../../gcc-trunk/gcc/recog.h:318 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*) ../../gcc-trunk/gcc/expr.c:3695 0xa49914 emit_move_insn(rtx_def*, rtx_def*) ../../gcc-trunk/gcc/expr.c:3791 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*) ../../gcc-trunk/gcc/expr.c:3490 0xa49914 emit_move_insn(rtx_def*, rtx_def*) ../../gcc-trunk/gcc/expr.c:3791 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool) ../../gcc-trunk/gcc/expr.c:5855 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) ../../gcc-trunk/gcc/expr.c:5441 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) ../../gcc-trunk/gcc/expr.c:4983 0x93396f expand_gimple_stmt_1 ../../gcc-trunk/gcc/cfgexpand.c:3777 0x93396f expand_gimple_stmt ../../gcc-trunk/gcc/cfgexpand.c:3875 0x9392e1 expand_gimple_basic_block ../../gcc-trunk/gcc/cfgexpand.c:5915 0x93b046 execute ../../gcc-trunk/gcc/cfgexpand.c:6538 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. Without the hunk in varasm.c of course. What happens is that expand_expr_real_2 returns a unaligned mem_ref here: case COMPLEX_CST: /* Handle evaluating a complex constant in a CONCAT target. */ if (original_target && GET_CODE (original_target) == CONCAT) { [... this path not taken ...] } /* fall through */ case STRING_CST: temp = expand_expr_constant (exp, 1, modifier); /* temp contains a constant address. On RISC machines where a constant address isn't valid, make some insns to get that address into a register. */ if (modifier != EXPAND_CONST_ADDRESS && modifier != EXPAND_INITIALIZER && modifier != EXPAND_SUM && ! memory_address_addr_space_p (mode, XEXP (temp, 0), MEM_ADDR_SPACE (temp))) return replace_equiv_address (temp, copy_rtx (XEXP (temp, 0))); return temp; The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable by emit_move_insn, that is expected just *everywhere* and can't be changed. This could probably be fixed in an ugly way in the COMPLEX_CST, handler but OTOH, I don't see any reason why this constant has to be misaligned when it can be easily aligned, which avoids the need for a misaligned access. >> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the >> change in expr.c (expand_expr_real_1). Certainly is it invalid >> to read memory at a function address, but it should not ICE. >> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks >> like A32, so the misaligned code execution is not taken, but it is >> set to A8 below, but then we hit an ICE if the result is used: > > So the user accessed it as A32. > >> /* Don't set memory attributes if the base expression is >> SSA_NAME that got expanded as a MEM. In that case, we should >> just honor its original memory attributes. */ >> if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0)) >> set_mem_attributes (op0, exp, 0); > > Huh, I don't understand this. 'tem' should never be SSA_NAME. tem is the result of get_inner_reference, why can't that be a SSA_NAME ? > But set_mem_attributes_minus_bitpos uses get_object_alignment_1 > and that has special treatment for FUNCTION_DECLs that is not > covered by > > /* When EXP is an actual memory reference then we can use > TYPE_ALIGN of a pointer indirection to derive alignment. > Do so only if get_pointer_alignment_1 did not reveal absolute > alignment knowledge and if using that alignment would > improve the situation. */ > unsigned int talign; > if (!addr_p && !known_alignment > && (talign = min_align_of_type (TREE_TYPE (exp)) * > BITS_PER_UNIT) > && talign > align) > align = talign; > > which could be moved out of the if-cascade. > > That said, setting A8 should eventually result into appropriate > unaligned expansion, so it seems odd this triggers the assert... > The function pointer is really 32-byte aligned in ARM mode to start with... The problem is that the code that handles this misaligned access is skipped because the mem_rtx has initially no MEM_ATTRS and therefore MEM_ALIGN == 32, and therefore the code that handles the unaligned access is not taken. BUT before the mem_rtx is returned it is set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion, because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be usable with emit_move_insn. >> >> Finally gcc.dg/torture/pr48493.c required the change >> in assign_parm_setup_stack. This is just not using the >> correct MEM_ALIGN attribute value, while the memory is >> actually aligned. > > But doesn't > > int align = STACK_SLOT_ALIGNMENT (data->passed_type, > GET_MODE (data->entry_parm), > TYPE_ALIGN > (data->passed_type)); > + if (align < (int)GET_MODE_ALIGNMENT (GET_MODE > (data->entry_parm)) > + && targetm.slow_unaligned_access (GET_MODE > (data->entry_parm), > + align)) > + align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); > > hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target? > That is, the target says, for natural alignment 64 the stack slot > alignment can only be guaranteed 32. You can't then simply up it > but have to use unaligned accesses (or the target/middle-end needs > to do dynamic stack alignment). > Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as well, and none of them have a problem, probably because they use expand_expr, but here we use emit_move_insn: if (MEM_P (src)) { [...] } else { if (!REG_P (src)) src = force_reg (GET_MODE (src), src); emit_move_insn (dest, src); } So I could restrict that to if (!MEM_P (data->entry_parm) && align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)) && ((optab_handler (movmisalign_optab, GET_MODE (data->entry_parm)) != CODE_FOR_nothing) || targetm.slow_unaligned_access (GET_MODE (data->entry_parm), align))) align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); But OTOH even for arguments arriving in unaligned stack slots where emit_block_move could handle it, that would just work against the intention of assign_parm_adjust_stack_rtl. Of course there are limits how much alignment assign_stack_local can handle, and that would result in an assertion in the emit_move_insn. But in the end if that happens it is just an impossible target configuration. > >> Note that set_mem_attributes does not >> always preserve the MEM_ALIGN of the ref, since: > > set_mem_attributes sets _all_ attributes from an expression or type. > Not really: refattrs = MEM_ATTRS (ref); if (refattrs) { /* ??? Can this ever happen? Calling this routine on a MEM that already carries memory attributes should probably be invalid. */ [...] attrs.align = refattrs->align; } else [...] if (objectp || TREE_CODE (t) == INDIRECT_REF) attrs.align = MAX (attrs.align, TYPE_ALIGN (type)); >> /* Default values from pre-existing memory attributes if present. */ >> refattrs = MEM_ATTRS (ref); >> if (refattrs) >> { >> /* ??? Can this ever happen? Calling this routine on a MEM that >> already carries memory attributes should probably be invalid. */ >> attrs.expr = refattrs->expr; >> attrs.offset_known_p = refattrs->offset_known_p; >> attrs.offset = refattrs->offset; >> attrs.size_known_p = refattrs->size_known_p; >> attrs.size = refattrs->size; >> attrs.align = refattrs->align; >> } >> >> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT >> the MEM_ATTRS are zero, and a smaller alignment may result. > > Not sure what you are saying here. That > > set_mem_align (MEM:SI A32, 32) > > produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting > the A32 but eventually computing sth lower? Yeah, that's probably > an interesting "hole" here. I'm quite sure that if we'd do > > refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)]; > > we run into issues exactly on strict-align targets ... > Yeah, that's scary... >> Well with those checks in place it should now be a lot harder to generate >> invalid code on STRICT_ALIGNMENT targets, without running into an ICE. >> >> Attached is the latest version of my arm alignment patch. >> >> >> Boot-strapped and reg-tested on x64_64-pc-linux-gnu and arm-linux-gnueabihf. >> Is it OK for trunk? > > @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all > > did_conversion = true; > } > + else if (MEM_P (data->entry_parm) > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > + > MEM_ALIGN (data->entry_parm) > + && (((icode = optab_handler (movmisalign_optab, > + promoted_nominal_mode)) > + != CODE_FOR_nothing) > + || targetm.slow_unaligned_access (promoted_nominal_mode, > + MEM_ALIGN > (data->entry_parm)))) > + { > + if (icode != CODE_FOR_nothing) > + emit_insn (GEN_FCN (icode) (parmreg, validated_mem)); > + else > + rtl = parmreg = extract_bit_field (validated_mem, > + GET_MODE_BITSIZE (promoted_nominal_mode), 0, > + unsignedp, parmreg, > + promoted_nominal_mode, VOIDmode, false, NULL); > + } > else > emit_move_insn (parmreg, validated_mem); > > This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) / > GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm) > and promoted_nominal_mode. > Yes, the idea is just to save some cycles, since parmreg = gen_reg_rtx (promoted_nominal_mode); we know that parmreg will also have that mode, plus emit_move_insn (parmreg, validated_mem) which would be called here asserts that: gcc_assert (mode != BLKmode && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode)); so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode I still like the current version with promoted_nominal_mode slighhtly better both because of performance, and the 80-column restriction. :) > Not sure if it helps on its own. > > I'm nervous about the alignment one since I'm not at all familiar > with this code... > > Thanks, > Richard. > I will send a new version of the patch shortly. Thanks Bernd.
On Wed, 14 Aug 2019, Bernd Edlinger wrote: > On 8/14/19 2:00 PM, Richard Biener wrote: > > On Thu, 8 Aug 2019, Bernd Edlinger wrote: > > > >> On 8/2/19 9:01 PM, Bernd Edlinger wrote: > >>> On 8/2/19 3:11 PM, Richard Biener wrote: > >>>> On Tue, 30 Jul 2019, Bernd Edlinger wrote: > >>>> > >>>>> > >>>>> I have no test coverage for the movmisalign optab though, so I > >>>>> rely on your code review for that part. > >>>> > >>>> It looks OK. I tried to make it trigger on the following on > >>>> i?86 with -msse2: > >>>> > >>>> typedef int v4si __attribute__((vector_size (16))); > >>>> > >>>> struct S { v4si v; } __attribute__((packed)); > >>>> > >>>> v4si foo (struct S s) > >>>> { > >>>> return s.v; > >>>> } > >>>> > >>> > >>> Hmm, the entry_parm need to be a MEM_P and an unaligned one. > >>> So the test case could be made to trigger it this way: > >>> > >>> typedef int v4si __attribute__((vector_size (16))); > >>> > >>> struct S { v4si v; } __attribute__((packed)); > >>> > >>> int t; > >>> v4si foo (struct S a, struct S b, struct S c, struct S d, > >>> struct S e, struct S f, struct S g, struct S h, > >>> int i, int j, int k, int l, int m, int n, > >>> int o, struct S s) > >>> { > >>> t = o; > >>> return s.v; > >>> } > >>> > >> > >> Ah, I realized that there are already a couple of very similar > >> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c, > >> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c, > >> which also manage to execute the movmisalign code with the latest patch > >> version. So I thought that it is not necessary to add another one. > >> > >>> However the code path is still not reached, since targetm.slow_ualigned_access > >>> is always FALSE, which is probably a flaw in my patch. > >>> > >>> So I think, > >>> > >>> + else if (MEM_P (data->entry_parm) > >>> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > >>> + > MEM_ALIGN (data->entry_parm) > >>> + && targetm.slow_unaligned_access (promoted_nominal_mode, > >>> + MEM_ALIGN (data->entry_parm))) > >>> > >>> should probably better be > >>> > >>> + else if (MEM_P (data->entry_parm) > >>> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > >>> + > MEM_ALIGN (data->entry_parm) > >>> + && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode)) > >>> + != CODE_FOR_nothing) > >>> + || targetm.slow_unaligned_access (promoted_nominal_mode, > >>> + MEM_ALIGN (data->entry_parm)))) > >>> > >>> Right? > >>> > >>> Then the modified test case would use the movmisalign optab. > >>> However nothing changes in the end, since the i386 back-end is used to work > >>> around the middle end not using movmisalign optab when it should do so. > >>> > >> > >> I prefer the second form of the check, as it offers more test coverage, > >> and is probably more correct than the former. > >> > >> Note there are more variations of this misalign check in expr.c, > >> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR: > >> > >> && mode != BLKmode > >> && align < GET_MODE_ALIGNMENT (mode)) > >> { > >> if ((icode = optab_handler (movmisalign_optab, mode)) > >> != CODE_FOR_nothing) > >> [...] > >> else if (targetm.slow_unaligned_access (mode, align)) > >> temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode), > >> 0, TYPE_UNSIGNED (TREE_TYPE (exp)), > >> (modifier == EXPAND_STACK_PARM > >> ? NULL_RTX : target), > >> mode, mode, false, alt_rtl); > >> > >> I wonder if they are correct this way, why shouldn't we use the movmisalign > >> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESSS ? > > > > Doesn't the code do exactly this? Prefer movmisalign over > > extrct_bit_field? > > > > Ah, yes. How could I miss that. > > >> > >>> I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand > >>> patterns that the memory is properly aligned ? > >>> > >> > >> Wow, that was a really exciting bug-hunt with those assertions around... > > > > :) > > > >>>> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all > >>>> > >>>> did_conversion = true; > >>>> } > >>>> + else if (MEM_P (data->entry_parm) > >>>> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > >>>> + > MEM_ALIGN (data->entry_parm) > >>>> > >>>> we arrive here by-passing > >>>> > >>>> else if (need_conversion) > >>>> { > >>>> /* We did not have an insn to convert directly, or the sequence > >>>> generated appeared unsafe. We must first copy the parm to a > >>>> pseudo reg, and save the conversion until after all > >>>> parameters have been moved. */ > >>>> > >>>> int save_tree_used; > >>>> rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm)); > >>>> > >>>> emit_move_insn (tempreg, validated_mem); > >>>> > >>>> but this move instruction is invalid in the same way as the case > >>>> you fix, no? So wouldn't it be better to do > >>>> > >>> > >>> We could do that, but I supposed that there must be a reason why > >>> assign_parm_setup_stack gets away with that same: > >>> > >>> if (data->promoted_mode != data->nominal_mode) > >>> { > >>> /* Conversion is required. */ > >>> rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm)); > >>> > >>> emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm))); > >>> > >>> > >>> So either some back-ends are too permissive with us, > >>> or there is a reason why promoted_mode != nominal_mode > >>> does not happen together with unaligned entry_parm. > >>> In a way that would be a rather unusual ABI. > >>> > >> > >> To find out if that ever happens I added a couple of checking > >> assertions in the arm mov<mode> expand patterns. > >> > >> So far the assertions did (almost) always hold, so it is likely not > >> necessary to fiddle with all those naive move instructions here. > >> > >> So my gut feeling is, leave those places alone until there is a reason > >> for changing them. > > > > Works for me - I wonder if we should add those asserts to generic > > code (guarded with flag_checking) though. > > > > Well, yes, but I was scared away by the complexity of emit_move_insn_1. > > It could be done, but in the moment I would be happy to have these > checks of one major strict alignment target, ARM is a good candidate > since most instructions work even if they are accidentally > using unaligned arguments. So middle-end errors do not always > visible by ordinary tests. Nevertheless it is a blatant violation of the > contract between middle-end and back-end, which should be avoided. Fair enough. > >> However the assertion in movsi triggered a couple times in the > >> ada testsuite due to expand_builtin_init_descriptor using a > >> BLKmode MEM rtx, which is only 8-bit aligned. So, I set the > >> ptr_mode alignment there explicitly. > > > > Looks good given we emit ptr_mode moves into it. Thus OK independently. > > > > Thanks, committed as r274487. > > >> Several struct-layout-1.dg testcase tripped over misaligned > >> complex_cst constants, fixed by varasm.c (align_variable). > >> This is likely a wrong code bug, because misaligned complex > >> constants, are expanded to misaligned MEM_REF, but the > >> expansion cannot handle misaligned constants, only packed > >> structure fields. > > > > Hmm. So your patch overrides user-alignment here. Woudln't it > > be better to do that more conciously by > > > > if (! DECL_USER_ALIGN (decl) > > || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) > > && targetm.slow_unaligned_access (DECL_MODE (decl), align))) > > > > ? And why is the movmisalign optab support missing here? > > > > Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl: > > /* If we can't trust the parm stack slot to be aligned enough for its > ultimate type, don't use that slot after entry. We'll make another > stack slot, if we need one. */ > if (stack_parm > && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm) > && targetm.slow_unaligned_access (data->nominal_mode, > MEM_ALIGN (stack_parm))) > > which also makes a variable more aligned than it is declared. > But maybe both should also check the movmisalign optab in > addition to slow_unaligned_access ? Quite possible. > > IMHO whatever code later fails to properly use unaligned loads > > should be fixed instead rather than ignoring user requested alignment. > > > > Can you quote a short testcase that explains what exactly goes wrong? > > The struct-layout ones are awkward to look at... > > > > Sure, > > $ cat test.c > _Complex float __attribute__((aligned(1))) cf; > > void foo (void) > { > cf = 1.0i; > } > > $ arm-linux-gnueabihf-gcc -S test.c > during RTL pass: expand > test.c: In function 'foo': > test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003 > 5 | cf = 1.0i; > | ~~~^~~~~~ > 0x7ba475 gen_movsf(rtx_def*, rtx_def*) > ../../gcc-trunk/gcc/config/arm/arm.md:7003 > 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const > ../../gcc-trunk/gcc/recog.h:318 > 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*) > ../../gcc-trunk/gcc/expr.c:3695 > 0xa49914 emit_move_insn(rtx_def*, rtx_def*) > ../../gcc-trunk/gcc/expr.c:3791 > 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*) > ../../gcc-trunk/gcc/expr.c:3490 > 0xa49914 emit_move_insn(rtx_def*, rtx_def*) > ../../gcc-trunk/gcc/expr.c:3791 > 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool) > ../../gcc-trunk/gcc/expr.c:5855 > 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) > ../../gcc-trunk/gcc/expr.c:5441 Huh, so why didn't it trigger /* Handle misaligned stores. */ mode = TYPE_MODE (TREE_TYPE (to)); if ((TREE_CODE (to) == MEM_REF || TREE_CODE (to) == TARGET_MEM_REF) && mode != BLKmode && !mem_ref_refers_to_non_mem_p (to) && ((align = get_object_alignment (to)) < GET_MODE_ALIGNMENT (mode)) && (((icode = optab_handler (movmisalign_optab, mode)) != CODE_FOR_nothing) || targetm.slow_unaligned_access (mode, align))) { ? (_Complex float is 32bit aligned it seems, the DECL_RTL for the var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned. Ah, 'to' is a plain DECL here so the above handling is incomplete. IIRC component refs like __real cf = 0.f should be handled fine again(?). So, does adding || DECL_P (to) fix the case as well? > 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) > ../../gcc-trunk/gcc/expr.c:4983 > 0x93396f expand_gimple_stmt_1 > ../../gcc-trunk/gcc/cfgexpand.c:3777 > 0x93396f expand_gimple_stmt > ../../gcc-trunk/gcc/cfgexpand.c:3875 > 0x9392e1 expand_gimple_basic_block > ../../gcc-trunk/gcc/cfgexpand.c:5915 > 0x93b046 execute > ../../gcc-trunk/gcc/cfgexpand.c:6538 > Please submit a full bug report, > with preprocessed source if appropriate. > Please include the complete backtrace with any bug report. > See <https://gcc.gnu.org/bugs/> for instructions. > > Without the hunk in varasm.c of course. > > What happens is that expand_expr_real_2 returns a unaligned mem_ref here: > > case COMPLEX_CST: > /* Handle evaluating a complex constant in a CONCAT target. */ > if (original_target && GET_CODE (original_target) == CONCAT) > { > [... this path not taken ...] > } > > /* fall through */ > > case STRING_CST: > temp = expand_expr_constant (exp, 1, modifier); > > /* temp contains a constant address. > On RISC machines where a constant address isn't valid, > make some insns to get that address into a register. */ > if (modifier != EXPAND_CONST_ADDRESS > && modifier != EXPAND_INITIALIZER > && modifier != EXPAND_SUM > && ! memory_address_addr_space_p (mode, XEXP (temp, 0), > MEM_ADDR_SPACE (temp))) > return replace_equiv_address (temp, > copy_rtx (XEXP (temp, 0))); > return temp; > > The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable > by emit_move_insn, that is expected just *everywhere* and can't be changed. > > This could probably be fixed in an ugly way in the COMPLEX_CST, handler > but OTOH, I don't see any reason why this constant has to be misaligned > when it can be easily aligned, which avoids the need for a misaligned access. If the COMPLEX_CST happends to end up in unaligned memory then that's of course a bug (unless the target requests that for all COMPLEX_CSTs). That is, if the unalignment is triggered because the store is to an unaligned decl. But I think the issue is the above one? > >> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the > >> change in expr.c (expand_expr_real_1). Certainly is it invalid > >> to read memory at a function address, but it should not ICE. > >> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks > >> like A32, so the misaligned code execution is not taken, but it is > >> set to A8 below, but then we hit an ICE if the result is used: > > > > So the user accessed it as A32. > > > >> /* Don't set memory attributes if the base expression is > >> SSA_NAME that got expanded as a MEM. In that case, we should > >> just honor its original memory attributes. */ > >> if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0)) > >> set_mem_attributes (op0, exp, 0); > > > > Huh, I don't understand this. 'tem' should never be SSA_NAME. > > tem is the result of get_inner_reference, why can't that be a SSA_NAME ? We can't subset an SSA_NAME. I have really no idea what this intended to do... > > But set_mem_attributes_minus_bitpos uses get_object_alignment_1 > > and that has special treatment for FUNCTION_DECLs that is not > > covered by > > > > /* When EXP is an actual memory reference then we can use > > TYPE_ALIGN of a pointer indirection to derive alignment. > > Do so only if get_pointer_alignment_1 did not reveal absolute > > alignment knowledge and if using that alignment would > > improve the situation. */ > > unsigned int talign; > > if (!addr_p && !known_alignment > > && (talign = min_align_of_type (TREE_TYPE (exp)) * > > BITS_PER_UNIT) > > && talign > align) > > align = talign; > > > > which could be moved out of the if-cascade. > > > > That said, setting A8 should eventually result into appropriate > > unaligned expansion, so it seems odd this triggers the assert... > > > > The function pointer is really 32-byte aligned in ARM mode to start > with... > > The problem is that the code that handles this misaligned access > is skipped because the mem_rtx has initially no MEM_ATTRS and therefore > MEM_ALIGN == 32, and therefore the code that handles the unaligned > access is not taken. BUT before the mem_rtx is returned it is > set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion, > because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be > usable with emit_move_insn. yes, as said the _access_ determines the address should be aligned so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according to the access type/mode. But we can't trust DECL_ALIGN of FUNCTION_DECLs but we _can_ trust users writing *(int *)fn (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just we may not compute nonzero bits for the actual address because of function pointer mangling) (for accessing function code I'd say this would be premature optimization, but ...) > >> > >> Finally gcc.dg/torture/pr48493.c required the change > >> in assign_parm_setup_stack. This is just not using the > >> correct MEM_ALIGN attribute value, while the memory is > >> actually aligned. > > > > But doesn't > > > > int align = STACK_SLOT_ALIGNMENT (data->passed_type, > > GET_MODE (data->entry_parm), > > TYPE_ALIGN > > (data->passed_type)); > > + if (align < (int)GET_MODE_ALIGNMENT (GET_MODE > > (data->entry_parm)) > > + && targetm.slow_unaligned_access (GET_MODE > > (data->entry_parm), > > + align)) > > + align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); > > > > hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target? > > That is, the target says, for natural alignment 64 the stack slot > > alignment can only be guaranteed 32. You can't then simply up it > > but have to use unaligned accesses (or the target/middle-end needs > > to do dynamic stack alignment). > > > Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as well, > and none of them have a problem, probably because they use expand_expr, > but here we use emit_move_insn: > > if (MEM_P (src)) > { > [...] > } > else > { > if (!REG_P (src)) > src = force_reg (GET_MODE (src), src); > emit_move_insn (dest, src); > } > > So I could restrict that to > > if (!MEM_P (data->entry_parm) > && align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)) > && ((optab_handler (movmisalign_optab, > GET_MODE (data->entry_parm)) > != CODE_FOR_nothing) > || targetm.slow_unaligned_access (GET_MODE (data->entry_parm), > align))) > align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); > > But OTOH even for arguments arriving in unaligned stack slots where > emit_block_move could handle it, that would just work against the > intention of assign_parm_adjust_stack_rtl. > > Of course there are limits how much alignment assign_stack_local > can handle, and that would result in an assertion in the emit_move_insn. > But in the end if that happens it is just an impossible target > configuration. Still I think you can't simply override STACK_SLOT_ALIGNMENT just because of the mode of an entry param, can you? If you can assume a bigger alignment then STACK_SLOT_ALIGNMENT should return it. > > > >> Note that set_mem_attributes does not > >> always preserve the MEM_ALIGN of the ref, since: > > > > set_mem_attributes sets _all_ attributes from an expression or type. > > > > Not really: > > refattrs = MEM_ATTRS (ref); > if (refattrs) > { > /* ??? Can this ever happen? Calling this routine on a MEM that > already carries memory attributes should probably be invalid. */ > [...] > attrs.align = refattrs->align; > } > else > [...] > > if (objectp || TREE_CODE (t) == INDIRECT_REF) > attrs.align = MAX (attrs.align, TYPE_ALIGN (type)); > > >> /* Default values from pre-existing memory attributes if present. */ > >> refattrs = MEM_ATTRS (ref); > >> if (refattrs) > >> { > >> /* ??? Can this ever happen? Calling this routine on a MEM that > >> already carries memory attributes should probably be invalid. */ > >> attrs.expr = refattrs->expr; > >> attrs.offset_known_p = refattrs->offset_known_p; > >> attrs.offset = refattrs->offset; > >> attrs.size_known_p = refattrs->size_known_p; > >> attrs.size = refattrs->size; > >> attrs.align = refattrs->align; > >> } > >> > >> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT > >> the MEM_ATTRS are zero, and a smaller alignment may result. > > > > Not sure what you are saying here. That > > > > set_mem_align (MEM:SI A32, 32) > > > > produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting > > the A32 but eventually computing sth lower? Yeah, that's probably > > an interesting "hole" here. I'm quite sure that if we'd do > > > > refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)]; > > > > we run into issues exactly on strict-align targets ... > > > > Yeah, that's scary... > > >> Well with those checks in place it should now be a lot harder to generate > >> invalid code on STRICT_ALIGNMENT targets, without running into an ICE. > >> > >> Attached is the latest version of my arm alignment patch. > >> > >> > >> Boot-strapped and reg-tested on x64_64-pc-linux-gnu and arm-linux-gnueabihf. > >> Is it OK for trunk? > > > > @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all > > > > did_conversion = true; > > } > > + else if (MEM_P (data->entry_parm) > > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > > + > MEM_ALIGN (data->entry_parm) > > + && (((icode = optab_handler (movmisalign_optab, > > + promoted_nominal_mode)) > > + != CODE_FOR_nothing) > > + || targetm.slow_unaligned_access (promoted_nominal_mode, > > + MEM_ALIGN > > (data->entry_parm)))) > > + { > > + if (icode != CODE_FOR_nothing) > > + emit_insn (GEN_FCN (icode) (parmreg, validated_mem)); > > + else > > + rtl = parmreg = extract_bit_field (validated_mem, > > + GET_MODE_BITSIZE (promoted_nominal_mode), 0, > > + unsignedp, parmreg, > > + promoted_nominal_mode, VOIDmode, false, NULL); > > + } > > else > > emit_move_insn (parmreg, validated_mem); > > > > This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) / > > GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm) > > and promoted_nominal_mode. > > > > Yes, the idea is just to save some cycles, since > > parmreg = gen_reg_rtx (promoted_nominal_mode); > we know that parmreg will also have that mode, plus > emit_move_insn (parmreg, validated_mem) which would be called here > asserts that: > > gcc_assert (mode != BLKmode > && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode)); > > so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode > > I still like the current version with promoted_nominal_mode slighhtly > better both because of performance, and the 80-column restriction. :) So if you say they are 1:1 equivalent then go for it (for this hunk, approved as "obvious"). Richard.
On 8/15/19 10:55 AM, Richard Biener wrote: > On Wed, 14 Aug 2019, Bernd Edlinger wrote: > >> On 8/14/19 2:00 PM, Richard Biener wrote: >> >> Well, yes, but I was scared away by the complexity of emit_move_insn_1. >> >> It could be done, but in the moment I would be happy to have these >> checks of one major strict alignment target, ARM is a good candidate >> since most instructions work even if they are accidentally >> using unaligned arguments. So middle-end errors do not always >> visible by ordinary tests. Nevertheless it is a blatant violation of the >> contract between middle-end and back-end, which should be avoided. > > Fair enough. > >>>> Several struct-layout-1.dg testcase tripped over misaligned >>>> complex_cst constants, fixed by varasm.c (align_variable). >>>> This is likely a wrong code bug, because misaligned complex >>>> constants, are expanded to misaligned MEM_REF, but the >>>> expansion cannot handle misaligned constants, only packed >>>> structure fields. >>> >>> Hmm. So your patch overrides user-alignment here. Woudln't it >>> be better to do that more conciously by >>> >>> if (! DECL_USER_ALIGN (decl) >>> || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) >>> && targetm.slow_unaligned_access (DECL_MODE (decl), align))) >>> ? I don't know why that would be better? If the value is underaligned no matter why, pretend it was declared as naturally aligned if that causes wrong code otherwise. That was the idea here. >>> ? And why is the movmisalign optab support missing here? >>> >> >> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl: >> >> /* If we can't trust the parm stack slot to be aligned enough for its >> ultimate type, don't use that slot after entry. We'll make another >> stack slot, if we need one. */ >> if (stack_parm >> && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm) >> && targetm.slow_unaligned_access (data->nominal_mode, >> MEM_ALIGN (stack_parm))) >> >> which also makes a variable more aligned than it is declared. >> But maybe both should also check the movmisalign optab in >> addition to slow_unaligned_access ? > > Quite possible. > Will do, see attached new version of the patch. >>> IMHO whatever code later fails to properly use unaligned loads >>> should be fixed instead rather than ignoring user requested alignment. >>> >>> Can you quote a short testcase that explains what exactly goes wrong? >>> The struct-layout ones are awkward to look at... >>> >> >> Sure, >> >> $ cat test.c >> _Complex float __attribute__((aligned(1))) cf; >> >> void foo (void) >> { >> cf = 1.0i; >> } >> >> $ arm-linux-gnueabihf-gcc -S test.c >> during RTL pass: expand >> test.c: In function 'foo': >> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003 >> 5 | cf = 1.0i; >> | ~~~^~~~~~ >> 0x7ba475 gen_movsf(rtx_def*, rtx_def*) >> ../../gcc-trunk/gcc/config/arm/arm.md:7003 >> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const >> ../../gcc-trunk/gcc/recog.h:318 >> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*) >> ../../gcc-trunk/gcc/expr.c:3695 >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) >> ../../gcc-trunk/gcc/expr.c:3791 >> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*) >> ../../gcc-trunk/gcc/expr.c:3490 >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) >> ../../gcc-trunk/gcc/expr.c:3791 >> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool) >> ../../gcc-trunk/gcc/expr.c:5855 >> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) >> ../../gcc-trunk/gcc/expr.c:5441 > > Huh, so why didn't it trigger > > /* Handle misaligned stores. */ > mode = TYPE_MODE (TREE_TYPE (to)); > if ((TREE_CODE (to) == MEM_REF > || TREE_CODE (to) == TARGET_MEM_REF) > && mode != BLKmode > && !mem_ref_refers_to_non_mem_p (to) > && ((align = get_object_alignment (to)) > < GET_MODE_ALIGNMENT (mode)) > && (((icode = optab_handler (movmisalign_optab, mode)) > != CODE_FOR_nothing) > || targetm.slow_unaligned_access (mode, align))) > { > > ? (_Complex float is 32bit aligned it seems, the DECL_RTL for the > var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl > 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned. > > Ah, 'to' is a plain DECL here so the above handling is incomplete. > IIRC component refs like __real cf = 0.f should be handled fine > again(?). So, does adding || DECL_P (to) fix the case as well? > So I tried this instead of the varasm.c change: Index: expr.c =================================================================== --- expr.c (revision 274487) +++ expr.c (working copy) @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem /* Handle misaligned stores. */ mode = TYPE_MODE (TREE_TYPE (to)); if ((TREE_CODE (to) == MEM_REF - || TREE_CODE (to) == TARGET_MEM_REF) + || TREE_CODE (to) == TARGET_MEM_REF + || DECL_P (to)) && mode != BLKmode - && !mem_ref_refers_to_non_mem_p (to) + && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to)) && ((align = get_object_alignment (to)) < GET_MODE_ALIGNMENT (mode)) && (((icode = optab_handler (movmisalign_optab, mode)) Result, yes, it fixes this test case but then I run all struct-layout-1.exp there are sill cases. where we have problems: In file included from /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: In function 'test2112':^M /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: note: in definition of macro 'TX'^M /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: note: in expansion of macro 'TCI'^M /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: note: in expansion of macro 'F'^M 0x7ba377 gen_movdf(rtx_def*, rtx_def*)^M ../../gcc-trunk/gcc/config/arm/arm.md:7107^M 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const^M ../../gcc-trunk/gcc/recog.h:318^M 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)^M ../../gcc-trunk/gcc/expr.c:3695^M 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M ../../gcc-trunk/gcc/expr.c:3791^M 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)^M ../../gcc-trunk/gcc/expr.c:3490^M 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M ../../gcc-trunk/gcc/expr.c:3791^M 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)^M ../../gcc-trunk/gcc/expr.c:5856^M 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M ../../gcc-trunk/gcc/expr.c:5302^M 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M ../../gcc-trunk/gcc/expr.c:4983^M 0x9338af expand_gimple_stmt_1^M ../../gcc-trunk/gcc/cfgexpand.c:3777^M 0x9338af expand_gimple_stmt^M ../../gcc-trunk/gcc/cfgexpand.c:3875^M 0x939221 expand_gimple_basic_block^M ../../gcc-trunk/gcc/cfgexpand.c:5915^M 0x93af86 execute^M ../../gcc-trunk/gcc/cfgexpand.c:6538^M Please submit a full bug report,^M My personal gut feeling this will be more fragile than over-aligning the constants. >> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) >> ../../gcc-trunk/gcc/expr.c:4983 >> 0x93396f expand_gimple_stmt_1 >> ../../gcc-trunk/gcc/cfgexpand.c:3777 >> 0x93396f expand_gimple_stmt >> ../../gcc-trunk/gcc/cfgexpand.c:3875 >> 0x9392e1 expand_gimple_basic_block >> ../../gcc-trunk/gcc/cfgexpand.c:5915 >> 0x93b046 execute >> ../../gcc-trunk/gcc/cfgexpand.c:6538 >> Please submit a full bug report, >> with preprocessed source if appropriate. >> Please include the complete backtrace with any bug report. >> See <https://gcc.gnu.org/bugs/> for instructions. >> >> Without the hunk in varasm.c of course. >> >> What happens is that expand_expr_real_2 returns a unaligned mem_ref here: >> >> case COMPLEX_CST: >> /* Handle evaluating a complex constant in a CONCAT target. */ >> if (original_target && GET_CODE (original_target) == CONCAT) >> { >> [... this path not taken ...] BTW: this code block executes when the other ICE happens. >> } >> >> /* fall through */ >> >> case STRING_CST: >> temp = expand_expr_constant (exp, 1, modifier); >> >> /* temp contains a constant address. >> On RISC machines where a constant address isn't valid, >> make some insns to get that address into a register. */ >> if (modifier != EXPAND_CONST_ADDRESS >> && modifier != EXPAND_INITIALIZER >> && modifier != EXPAND_SUM >> && ! memory_address_addr_space_p (mode, XEXP (temp, 0), >> MEM_ADDR_SPACE (temp))) >> return replace_equiv_address (temp, >> copy_rtx (XEXP (temp, 0))); >> return temp; >> >> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable >> by emit_move_insn, that is expected just *everywhere* and can't be changed. >> >> This could probably be fixed in an ugly way in the COMPLEX_CST, handler >> but OTOH, I don't see any reason why this constant has to be misaligned >> when it can be easily aligned, which avoids the need for a misaligned access. > > If the COMPLEX_CST happends to end up in unaligned memory then that's > of course a bug (unless the target requests that for all COMPLEX_CSTs). > That is, if the unalignment is triggered because the store is to an > unaligned decl. > > But I think the issue is the above one? > yes initially the constant seems to be unaligned. then it is expanded, and there is no special handling for unaligned constants in expand_expr_real, and then probably expand_assignment or store_expr seem not fully prepared for this either. >>>> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the >>>> change in expr.c (expand_expr_real_1). Certainly is it invalid >>>> to read memory at a function address, but it should not ICE. >>>> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks >>>> like A32, so the misaligned code execution is not taken, but it is >>>> set to A8 below, but then we hit an ICE if the result is used: >>> >>> So the user accessed it as A32. >>> >>>> /* Don't set memory attributes if the base expression is >>>> SSA_NAME that got expanded as a MEM. In that case, we should >>>> just honor its original memory attributes. */ >>>> if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0)) >>>> set_mem_attributes (op0, exp, 0); >>> >>> Huh, I don't understand this. 'tem' should never be SSA_NAME. >> >> tem is the result of get_inner_reference, why can't that be a SSA_NAME ? > > We can't subset an SSA_NAME. I have really no idea what this intended > to do... > Nice, so would you do a patch to change that to a gcc_checking_assert (TREE_CODE (tem) != SSA_NAME) ? maybe with a small explanation? >>> But set_mem_attributes_minus_bitpos uses get_object_alignment_1 >>> and that has special treatment for FUNCTION_DECLs that is not >>> covered by >>> >>> /* When EXP is an actual memory reference then we can use >>> TYPE_ALIGN of a pointer indirection to derive alignment. >>> Do so only if get_pointer_alignment_1 did not reveal absolute >>> alignment knowledge and if using that alignment would >>> improve the situation. */ >>> unsigned int talign; >>> if (!addr_p && !known_alignment >>> && (talign = min_align_of_type (TREE_TYPE (exp)) * >>> BITS_PER_UNIT) >>> && talign > align) >>> align = talign; >>> >>> which could be moved out of the if-cascade. >>> >>> That said, setting A8 should eventually result into appropriate >>> unaligned expansion, so it seems odd this triggers the assert... >>> >> >> The function pointer is really 32-byte aligned in ARM mode to start >> with... >> >> The problem is that the code that handles this misaligned access >> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore >> MEM_ALIGN == 32, and therefore the code that handles the unaligned >> access is not taken. BUT before the mem_rtx is returned it is >> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion, >> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be >> usable with emit_move_insn. > > yes, as said the _access_ determines the address should be aligned > so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according > to the access type/mode. But we can't trust DECL_ALIGN of > FUNCTION_DECLs but we _can_ trust users writing *(int *)fn > (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just > we may not compute nonzero bits for the actual address because > of function pointer mangling) > (for accessing function code I'd say this would be premature > optimization, but ...) > Not a very nice solution, but it is not worth to spend much effort in optimizing undefined behavior, I just want to avoid the ICE at this time and would not trust the DECL_ALIGN either. >>>> >>>> Finally gcc.dg/torture/pr48493.c required the change >>>> in assign_parm_setup_stack. This is just not using the >>>> correct MEM_ALIGN attribute value, while the memory is >>>> actually aligned. >>> >>> But doesn't >>> >>> int align = STACK_SLOT_ALIGNMENT (data->passed_type, >>> GET_MODE (data->entry_parm), >>> TYPE_ALIGN >>> (data->passed_type)); >>> + if (align < (int)GET_MODE_ALIGNMENT (GET_MODE >>> (data->entry_parm)) >>> + && targetm.slow_unaligned_access (GET_MODE >>> (data->entry_parm), >>> + align)) >>> + align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); >>> >>> hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target? >>> That is, the target says, for natural alignment 64 the stack slot >>> alignment can only be guaranteed 32. You can't then simply up it >>> but have to use unaligned accesses (or the target/middle-end needs >>> to do dynamic stack alignment). >>> >> Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as well, >> and none of them have a problem, probably because they use expand_expr, >> but here we use emit_move_insn: >> >> if (MEM_P (src)) >> { >> [...] >> } >> else >> { >> if (!REG_P (src)) >> src = force_reg (GET_MODE (src), src); >> emit_move_insn (dest, src); >> } >> >> So I could restrict that to >> >> if (!MEM_P (data->entry_parm) >> && align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)) >> && ((optab_handler (movmisalign_optab, >> GET_MODE (data->entry_parm)) >> != CODE_FOR_nothing) >> || targetm.slow_unaligned_access (GET_MODE (data->entry_parm), >> align))) >> align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); >> >> But OTOH even for arguments arriving in unaligned stack slots where >> emit_block_move could handle it, that would just work against the >> intention of assign_parm_adjust_stack_rtl. >> >> Of course there are limits how much alignment assign_stack_local >> can handle, and that would result in an assertion in the emit_move_insn. >> But in the end if that happens it is just an impossible target >> configuration. > > Still I think you can't simply override STACK_SLOT_ALIGNMENT just because > of the mode of an entry param, can you? If you can assume a bigger > alignment then STACK_SLOT_ALIGNMENT should return it. > I don't see a real problem here. All target except i386 and gcn (whatever that is) use the default for STACK_SLOT_ALIGNMENT which simply allows any (large) align value to rule the effective STACK_SLOT_ALIGNMENT. The user could have simply declared the local variable with the alignment that results in better code FWIW. If the stack alignment is too high that is capped in assign_stack_local: /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT. */ if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT) { alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT; alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT; } I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should be sufficient for all modes that need movmisalign_optab and friends. If it is not, an ICE would be just fine. >>> >>>> Note that set_mem_attributes does not >>>> always preserve the MEM_ALIGN of the ref, since: >>> >>> set_mem_attributes sets _all_ attributes from an expression or type. >>> >> >> Not really: >> >> refattrs = MEM_ATTRS (ref); >> if (refattrs) >> { >> /* ??? Can this ever happen? Calling this routine on a MEM that >> already carries memory attributes should probably be invalid. */ >> [...] >> attrs.align = refattrs->align; >> } >> else >> [...] >> >> if (objectp || TREE_CODE (t) == INDIRECT_REF) >> attrs.align = MAX (attrs.align, TYPE_ALIGN (type)); >> >>>> /* Default values from pre-existing memory attributes if present. */ >>>> refattrs = MEM_ATTRS (ref); >>>> if (refattrs) >>>> { >>>> /* ??? Can this ever happen? Calling this routine on a MEM that >>>> already carries memory attributes should probably be invalid. */ >>>> attrs.expr = refattrs->expr; >>>> attrs.offset_known_p = refattrs->offset_known_p; >>>> attrs.offset = refattrs->offset; >>>> attrs.size_known_p = refattrs->size_known_p; >>>> attrs.size = refattrs->size; >>>> attrs.align = refattrs->align; >>>> } >>>> >>>> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT >>>> the MEM_ATTRS are zero, and a smaller alignment may result. >>> >>> Not sure what you are saying here. That >>> >>> set_mem_align (MEM:SI A32, 32) >>> >>> produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting >>> the A32 but eventually computing sth lower? Yeah, that's probably >>> an interesting "hole" here. I'm quite sure that if we'd do >>> >>> refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)]; >>> >>> we run into issues exactly on strict-align targets ... >>> >> >> Yeah, that's scary... >> >>> >>> @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all >>> >>> did_conversion = true; >>> } >>> + else if (MEM_P (data->entry_parm) >>> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) >>> + > MEM_ALIGN (data->entry_parm) >>> + && (((icode = optab_handler (movmisalign_optab, >>> + promoted_nominal_mode)) >>> + != CODE_FOR_nothing) >>> + || targetm.slow_unaligned_access (promoted_nominal_mode, >>> + MEM_ALIGN >>> (data->entry_parm)))) >>> + { >>> + if (icode != CODE_FOR_nothing) >>> + emit_insn (GEN_FCN (icode) (parmreg, validated_mem)); >>> + else >>> + rtl = parmreg = extract_bit_field (validated_mem, >>> + GET_MODE_BITSIZE (promoted_nominal_mode), 0, >>> + unsignedp, parmreg, >>> + promoted_nominal_mode, VOIDmode, false, NULL); >>> + } >>> else >>> emit_move_insn (parmreg, validated_mem); >>> >>> This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) / >>> GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm) >>> and promoted_nominal_mode. >>> >> >> Yes, the idea is just to save some cycles, since >> >> parmreg = gen_reg_rtx (promoted_nominal_mode); >> we know that parmreg will also have that mode, plus >> emit_move_insn (parmreg, validated_mem) which would be called here >> asserts that: >> >> gcc_assert (mode != BLKmode >> && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode)); >> >> so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode >> >> I still like the current version with promoted_nominal_mode slighhtly >> better both because of performance, and the 80-column restriction. :) > > So if you say they are 1:1 equivalent then go for it (for this hunk, > approved as "obvious"). > Okay. Thanks, so I committed that hunk as r274531. Here is what I have right now, boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf (still running, but looks good so far). Is it OK for trunk? Thanks Bernd.
On Thu, 15 Aug 2019, Bernd Edlinger wrote: > On 8/15/19 10:55 AM, Richard Biener wrote: > > On Wed, 14 Aug 2019, Bernd Edlinger wrote: > > > >> On 8/14/19 2:00 PM, Richard Biener wrote: > >> > >> Well, yes, but I was scared away by the complexity of emit_move_insn_1. > >> > >> It could be done, but in the moment I would be happy to have these > >> checks of one major strict alignment target, ARM is a good candidate > >> since most instructions work even if they are accidentally > >> using unaligned arguments. So middle-end errors do not always > >> visible by ordinary tests. Nevertheless it is a blatant violation of the > >> contract between middle-end and back-end, which should be avoided. > > > > Fair enough. > > > >>>> Several struct-layout-1.dg testcase tripped over misaligned > >>>> complex_cst constants, fixed by varasm.c (align_variable). > >>>> This is likely a wrong code bug, because misaligned complex > >>>> constants, are expanded to misaligned MEM_REF, but the > >>>> expansion cannot handle misaligned constants, only packed > >>>> structure fields. > >>> > >>> Hmm. So your patch overrides user-alignment here. Woudln't it > >>> be better to do that more conciously by > >>> > >>> if (! DECL_USER_ALIGN (decl) > >>> || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) > >>> && targetm.slow_unaligned_access (DECL_MODE (decl), align))) > >>> > > ? I don't know why that would be better? > If the value is underaligned no matter why, pretend it was declared as > naturally aligned if that causes wrong code otherwise. > That was the idea here. It would be better because then we ignore it and use what we'd use by default rather than inventing sth new. And your patch suggests it might be needed to up align even w/o DECL_USER_ALIGN. > >>> ? And why is the movmisalign optab support missing here? > >>> > >> > >> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl: > >> > >> /* If we can't trust the parm stack slot to be aligned enough for its > >> ultimate type, don't use that slot after entry. We'll make another > >> stack slot, if we need one. */ > >> if (stack_parm > >> && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm) > >> && targetm.slow_unaligned_access (data->nominal_mode, > >> MEM_ALIGN (stack_parm))) > >> > >> which also makes a variable more aligned than it is declared. > >> But maybe both should also check the movmisalign optab in > >> addition to slow_unaligned_access ? > > > > Quite possible. > > > > Will do, see attached new version of the patch. > > >>> IMHO whatever code later fails to properly use unaligned loads > >>> should be fixed instead rather than ignoring user requested alignment. > >>> > >>> Can you quote a short testcase that explains what exactly goes wrong? > >>> The struct-layout ones are awkward to look at... > >>> > >> > >> Sure, > >> > >> $ cat test.c > >> _Complex float __attribute__((aligned(1))) cf; > >> > >> void foo (void) > >> { > >> cf = 1.0i; > >> } > >> > >> $ arm-linux-gnueabihf-gcc -S test.c > >> during RTL pass: expand > >> test.c: In function 'foo': > >> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003 > >> 5 | cf = 1.0i; > >> | ~~~^~~~~~ > >> 0x7ba475 gen_movsf(rtx_def*, rtx_def*) > >> ../../gcc-trunk/gcc/config/arm/arm.md:7003 > >> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const > >> ../../gcc-trunk/gcc/recog.h:318 > >> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*) > >> ../../gcc-trunk/gcc/expr.c:3695 > >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) > >> ../../gcc-trunk/gcc/expr.c:3791 > >> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*) > >> ../../gcc-trunk/gcc/expr.c:3490 > >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) > >> ../../gcc-trunk/gcc/expr.c:3791 > >> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool) > >> ../../gcc-trunk/gcc/expr.c:5855 > >> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) > >> ../../gcc-trunk/gcc/expr.c:5441 > > > > Huh, so why didn't it trigger > > > > /* Handle misaligned stores. */ > > mode = TYPE_MODE (TREE_TYPE (to)); > > if ((TREE_CODE (to) == MEM_REF > > || TREE_CODE (to) == TARGET_MEM_REF) > > && mode != BLKmode > > && !mem_ref_refers_to_non_mem_p (to) > > && ((align = get_object_alignment (to)) > > < GET_MODE_ALIGNMENT (mode)) > > && (((icode = optab_handler (movmisalign_optab, mode)) > > != CODE_FOR_nothing) > > || targetm.slow_unaligned_access (mode, align))) > > { > > > > ? (_Complex float is 32bit aligned it seems, the DECL_RTL for the > > var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl > > 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned. > > > > Ah, 'to' is a plain DECL here so the above handling is incomplete. > > IIRC component refs like __real cf = 0.f should be handled fine > > again(?). So, does adding || DECL_P (to) fix the case as well? > > > > So I tried this instead of the varasm.c change: > > Index: expr.c > =================================================================== > --- expr.c (revision 274487) > +++ expr.c (working copy) > @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem > /* Handle misaligned stores. */ > mode = TYPE_MODE (TREE_TYPE (to)); > if ((TREE_CODE (to) == MEM_REF > - || TREE_CODE (to) == TARGET_MEM_REF) > + || TREE_CODE (to) == TARGET_MEM_REF > + || DECL_P (to)) > && mode != BLKmode > - && !mem_ref_refers_to_non_mem_p (to) > + && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to)) > && ((align = get_object_alignment (to)) > < GET_MODE_ALIGNMENT (mode)) > && (((icode = optab_handler (movmisalign_optab, mode)) > > Result, yes, it fixes this test case > but then I run all struct-layout-1.exp there are sill cases. where we have problems: > > In file included from /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M > /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: In function 'test2112':^M > /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M > /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: note: in definition of macro 'TX'^M > /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: note: in expansion of macro 'TCI'^M > /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: note: in expansion of macro 'F'^M > 0x7ba377 gen_movdf(rtx_def*, rtx_def*)^M > ../../gcc-trunk/gcc/config/arm/arm.md:7107^M > 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const^M > ../../gcc-trunk/gcc/recog.h:318^M > 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)^M > ../../gcc-trunk/gcc/expr.c:3695^M > 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M > ../../gcc-trunk/gcc/expr.c:3791^M > 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)^M > ../../gcc-trunk/gcc/expr.c:3490^M > 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M > ../../gcc-trunk/gcc/expr.c:3791^M > 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)^M > ../../gcc-trunk/gcc/expr.c:5856^M > 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M > ../../gcc-trunk/gcc/expr.c:5302^M > 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M > ../../gcc-trunk/gcc/expr.c:4983^M > 0x9338af expand_gimple_stmt_1^M > ../../gcc-trunk/gcc/cfgexpand.c:3777^M > 0x9338af expand_gimple_stmt^M > ../../gcc-trunk/gcc/cfgexpand.c:3875^M > 0x939221 expand_gimple_basic_block^M > ../../gcc-trunk/gcc/cfgexpand.c:5915^M > 0x93af86 execute^M > ../../gcc-trunk/gcc/cfgexpand.c:6538^M > Please submit a full bug report,^M > > My personal gut feeling this will be more fragile than over-aligning the > constants. As said the constant shouldn't end up under-aligned, the user cannot specify alignment of literal constants. Not sure what you mean with "over"-aligning. > > > >> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) > >> ../../gcc-trunk/gcc/expr.c:4983 > >> 0x93396f expand_gimple_stmt_1 > >> ../../gcc-trunk/gcc/cfgexpand.c:3777 > >> 0x93396f expand_gimple_stmt > >> ../../gcc-trunk/gcc/cfgexpand.c:3875 > >> 0x9392e1 expand_gimple_basic_block > >> ../../gcc-trunk/gcc/cfgexpand.c:5915 > >> 0x93b046 execute > >> ../../gcc-trunk/gcc/cfgexpand.c:6538 > >> Please submit a full bug report, > >> with preprocessed source if appropriate. > >> Please include the complete backtrace with any bug report. > >> See <https://gcc.gnu.org/bugs/> for instructions. > >> > >> Without the hunk in varasm.c of course. > >> > >> What happens is that expand_expr_real_2 returns a unaligned mem_ref here: > >> > >> case COMPLEX_CST: > >> /* Handle evaluating a complex constant in a CONCAT target. */ > >> if (original_target && GET_CODE (original_target) == CONCAT) > >> { > >> [... this path not taken ...] > > BTW: this code block executes when the other ICE happens. > > >> } > >> > >> /* fall through */ > >> > >> case STRING_CST: > >> temp = expand_expr_constant (exp, 1, modifier); > >> > >> /* temp contains a constant address. > >> On RISC machines where a constant address isn't valid, > >> make some insns to get that address into a register. */ > >> if (modifier != EXPAND_CONST_ADDRESS > >> && modifier != EXPAND_INITIALIZER > >> && modifier != EXPAND_SUM > >> && ! memory_address_addr_space_p (mode, XEXP (temp, 0), > >> MEM_ADDR_SPACE (temp))) > >> return replace_equiv_address (temp, > >> copy_rtx (XEXP (temp, 0))); > >> return temp; > >> > >> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable > >> by emit_move_insn, that is expected just *everywhere* and can't be changed. > >> > >> This could probably be fixed in an ugly way in the COMPLEX_CST, handler > >> but OTOH, I don't see any reason why this constant has to be misaligned > >> when it can be easily aligned, which avoids the need for a misaligned access. > > > > If the COMPLEX_CST happends to end up in unaligned memory then that's > > of course a bug (unless the target requests that for all COMPLEX_CSTs). > > That is, if the unalignment is triggered because the store is to an > > unaligned decl. > > > > But I think the issue is the above one? > > > > yes initially the constant seems to be unaligned. then it is expanded, > and there is no special handling for unaligned constants in expand_expr_real, > and then probably expand_assignment or store_expr seem not fully prepared for > this either. With a cross I see the constant has regular aligned _Complex type so not sure how it can end up unaligned. > >>>> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the > >>>> change in expr.c (expand_expr_real_1). Certainly is it invalid > >>>> to read memory at a function address, but it should not ICE. > >>>> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks > >>>> like A32, so the misaligned code execution is not taken, but it is > >>>> set to A8 below, but then we hit an ICE if the result is used: > >>> > >>> So the user accessed it as A32. > >>> > >>>> /* Don't set memory attributes if the base expression is > >>>> SSA_NAME that got expanded as a MEM. In that case, we should > >>>> just honor its original memory attributes. */ > >>>> if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0)) > >>>> set_mem_attributes (op0, exp, 0); > >>> > >>> Huh, I don't understand this. 'tem' should never be SSA_NAME. > >> > >> tem is the result of get_inner_reference, why can't that be a SSA_NAME ? > > > > We can't subset an SSA_NAME. I have really no idea what this intended > > to do... > > > > Nice, so would you do a patch to change that to a > gcc_checking_assert (TREE_CODE (tem) != SSA_NAME) ? > maybe with a small explanation? I'll try. > >>> But set_mem_attributes_minus_bitpos uses get_object_alignment_1 > >>> and that has special treatment for FUNCTION_DECLs that is not > >>> covered by > >>> > >>> /* When EXP is an actual memory reference then we can use > >>> TYPE_ALIGN of a pointer indirection to derive alignment. > >>> Do so only if get_pointer_alignment_1 did not reveal absolute > >>> alignment knowledge and if using that alignment would > >>> improve the situation. */ > >>> unsigned int talign; > >>> if (!addr_p && !known_alignment > >>> && (talign = min_align_of_type (TREE_TYPE (exp)) * > >>> BITS_PER_UNIT) > >>> && talign > align) > >>> align = talign; > >>> > >>> which could be moved out of the if-cascade. > >>> > >>> That said, setting A8 should eventually result into appropriate > >>> unaligned expansion, so it seems odd this triggers the assert... > >>> > >> > >> The function pointer is really 32-byte aligned in ARM mode to start > >> with... > >> > >> The problem is that the code that handles this misaligned access > >> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore > >> MEM_ALIGN == 32, and therefore the code that handles the unaligned > >> access is not taken. BUT before the mem_rtx is returned it is > >> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion, > >> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be > >> usable with emit_move_insn. > > > > yes, as said the _access_ determines the address should be aligned > > so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according > > to the access type/mode. But we can't trust DECL_ALIGN of > > FUNCTION_DECLs but we _can_ trust users writing *(int *)fn > > (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just > > we may not compute nonzero bits for the actual address because > > of function pointer mangling) > > (for accessing function code I'd say this would be premature > > optimization, but ...) > > > > Not a very nice solution, but it is not worth to spend much effort > in optimizing undefined behavior, I just want to avoid the ICE > at this time and would not trust the DECL_ALIGN either. So I meant Index: gcc/builtins.c =================================================================== --- gcc/builtins.c (revision 274534) +++ gcc/builtins.c (working copy) @@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign /* Extract alignment information from the innermost object and possibly adjust bitpos and offset. */ - if (TREE_CODE (exp) == FUNCTION_DECL) + if (TREE_CODE (exp) == FUNCTION_DECL + && addr_p) { /* Function addresses can encode extra information besides their alignment. However, if TARGET_PTRMEMFUNC_VBIT_LOCATION so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we can trust it). > >>>> > >>>> Finally gcc.dg/torture/pr48493.c required the change > >>>> in assign_parm_setup_stack. This is just not using the > >>>> correct MEM_ALIGN attribute value, while the memory is > >>>> actually aligned. > >>> > >>> But doesn't > >>> > >>> int align = STACK_SLOT_ALIGNMENT (data->passed_type, > >>> GET_MODE (data->entry_parm), > >>> TYPE_ALIGN > >>> (data->passed_type)); > >>> + if (align < (int)GET_MODE_ALIGNMENT (GET_MODE > >>> (data->entry_parm)) > >>> + && targetm.slow_unaligned_access (GET_MODE > >>> (data->entry_parm), > >>> + align)) > >>> + align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); > >>> > >>> hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target? > >>> That is, the target says, for natural alignment 64 the stack slot > >>> alignment can only be guaranteed 32. You can't then simply up it > >>> but have to use unaligned accesses (or the target/middle-end needs > >>> to do dynamic stack alignment). > >>> > >> Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as well, > >> and none of them have a problem, probably because they use expand_expr, > >> but here we use emit_move_insn: > >> > >> if (MEM_P (src)) > >> { > >> [...] > >> } > >> else > >> { > >> if (!REG_P (src)) > >> src = force_reg (GET_MODE (src), src); > >> emit_move_insn (dest, src); > >> } > >> > >> So I could restrict that to > >> > >> if (!MEM_P (data->entry_parm) > >> && align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)) > >> && ((optab_handler (movmisalign_optab, > >> GET_MODE (data->entry_parm)) > >> != CODE_FOR_nothing) > >> || targetm.slow_unaligned_access (GET_MODE (data->entry_parm), > >> align))) > >> align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); > >> > >> But OTOH even for arguments arriving in unaligned stack slots where > >> emit_block_move could handle it, that would just work against the > >> intention of assign_parm_adjust_stack_rtl. > >> > >> Of course there are limits how much alignment assign_stack_local > >> can handle, and that would result in an assertion in the emit_move_insn. > >> But in the end if that happens it is just an impossible target > >> configuration. > > > > Still I think you can't simply override STACK_SLOT_ALIGNMENT just because > > of the mode of an entry param, can you? If you can assume a bigger > > alignment then STACK_SLOT_ALIGNMENT should return it. > > > > I don't see a real problem here. All target except i386 and gcn (whatever that is) > use the default for STACK_SLOT_ALIGNMENT which simply allows any (large) align value > to rule the effective STACK_SLOT_ALIGNMENT. The user could have simply declared > the local variable with the alignment that results in better code FWIW. > > If the stack alignment is too high that is capped in assign_stack_local: > > /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT. */ > if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT) > { > alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT; > alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT; > } > > I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should > be sufficient for all modes that need movmisalign_optab and friends. > If it is not, an ICE would be just fine. Hmm. In some way we could better communicate with the user then and do not allow under-aligning automatic vars? But the you still have packed structs with BLKmode where the actual field accesses will carry SImode even when not aligned(?) > >>> > >>>> Note that set_mem_attributes does not > >>>> always preserve the MEM_ALIGN of the ref, since: > >>> > >>> set_mem_attributes sets _all_ attributes from an expression or type. > >>> > >> > >> Not really: > >> > >> refattrs = MEM_ATTRS (ref); > >> if (refattrs) > >> { > >> /* ??? Can this ever happen? Calling this routine on a MEM that > >> already carries memory attributes should probably be invalid. */ > >> [...] > >> attrs.align = refattrs->align; > >> } > >> else > >> [...] > >> > >> if (objectp || TREE_CODE (t) == INDIRECT_REF) > >> attrs.align = MAX (attrs.align, TYPE_ALIGN (type)); > >> > >>>> /* Default values from pre-existing memory attributes if present. */ > >>>> refattrs = MEM_ATTRS (ref); > >>>> if (refattrs) > >>>> { > >>>> /* ??? Can this ever happen? Calling this routine on a MEM that > >>>> already carries memory attributes should probably be invalid. */ > >>>> attrs.expr = refattrs->expr; > >>>> attrs.offset_known_p = refattrs->offset_known_p; > >>>> attrs.offset = refattrs->offset; > >>>> attrs.size_known_p = refattrs->size_known_p; > >>>> attrs.size = refattrs->size; > >>>> attrs.align = refattrs->align; > >>>> } > >>>> > >>>> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT > >>>> the MEM_ATTRS are zero, and a smaller alignment may result. > >>> > >>> Not sure what you are saying here. That > >>> > >>> set_mem_align (MEM:SI A32, 32) > >>> > >>> produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting > >>> the A32 but eventually computing sth lower? Yeah, that's probably > >>> an interesting "hole" here. I'm quite sure that if we'd do > >>> > >>> refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)]; > >>> > >>> we run into issues exactly on strict-align targets ... > >>> > >> > >> Yeah, that's scary... > >> > >>> > >>> @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all > >>> > >>> did_conversion = true; > >>> } > >>> + else if (MEM_P (data->entry_parm) > >>> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > >>> + > MEM_ALIGN (data->entry_parm) > >>> + && (((icode = optab_handler (movmisalign_optab, > >>> + promoted_nominal_mode)) > >>> + != CODE_FOR_nothing) > >>> + || targetm.slow_unaligned_access (promoted_nominal_mode, > >>> + MEM_ALIGN > >>> (data->entry_parm)))) > >>> + { > >>> + if (icode != CODE_FOR_nothing) > >>> + emit_insn (GEN_FCN (icode) (parmreg, validated_mem)); > >>> + else > >>> + rtl = parmreg = extract_bit_field (validated_mem, > >>> + GET_MODE_BITSIZE (promoted_nominal_mode), 0, > >>> + unsignedp, parmreg, > >>> + promoted_nominal_mode, VOIDmode, false, NULL); > >>> + } > >>> else > >>> emit_move_insn (parmreg, validated_mem); > >>> > >>> This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) / > >>> GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm) > >>> and promoted_nominal_mode. > >>> > >> > >> Yes, the idea is just to save some cycles, since > >> > >> parmreg = gen_reg_rtx (promoted_nominal_mode); > >> we know that parmreg will also have that mode, plus > >> emit_move_insn (parmreg, validated_mem) which would be called here > >> asserts that: > >> > >> gcc_assert (mode != BLKmode > >> && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode)); > >> > >> so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode > >> > >> I still like the current version with promoted_nominal_mode slighhtly > >> better both because of performance, and the 80-column restriction. :) > > > > So if you say they are 1:1 equivalent then go for it (for this hunk, > > approved as "obvious"). > > > > Okay. Thanks, so I committed that hunk as r274531. > > Here is what I have right now, boot-strapped and reg-tested on x86_64-pc-linux-gnu > and arm-linux-gnueabihf (still running, but looks good so far). > > Is it OK for trunk? Please split it into the parts for the PR and parts making the asserts not trigger. The PR is already fixed, right? The assign_parm_find_stack_rtl hunk is merely an optimization? Richard.
On Thu, 15 Aug 2019, Richard Biener wrote: > On Thu, 15 Aug 2019, Bernd Edlinger wrote: > > > > > > We can't subset an SSA_NAME. I have really no idea what this intended > > > to do... > > > > > > > Nice, so would you do a patch to change that to a > > gcc_checking_assert (TREE_CODE (tem) != SSA_NAME) ? > > maybe with a small explanation? > > I'll try. So actually we can via BIT_FIELD_REF<_1, ...> and that _1 can end up being expanded in memory. See r233656 which brought this in. Richard.
On 8/15/19 2:54 PM, Richard Biener wrote: > On Thu, 15 Aug 2019, Bernd Edlinger wrote: > >>>>> >>>>> Hmm. So your patch overrides user-alignment here. Woudln't it >>>>> be better to do that more conciously by >>>>> >>>>> if (! DECL_USER_ALIGN (decl) >>>>> || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) >>>>> && targetm.slow_unaligned_access (DECL_MODE (decl), align))) >>>>> >> >> ? I don't know why that would be better? >> If the value is underaligned no matter why, pretend it was declared as >> naturally aligned if that causes wrong code otherwise. >> That was the idea here. > > It would be better because then we ignore it and use what we'd use > by default rather than inventing sth new. And your patch suggests > it might be needed to up align even w/o DECL_USER_ALIGN. > Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set? But it inherits the alignment from the destination variable, apparently. did you mean if (! DECL_USER_ALIGN (decl) && align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) && ... ? I can give it a try. >>>>> IMHO whatever code later fails to properly use unaligned loads >>>>> should be fixed instead rather than ignoring user requested alignment. >>>>> >>>>> Can you quote a short testcase that explains what exactly goes wrong? >>>>> The struct-layout ones are awkward to look at... >>>>> >>>> >>>> Sure, >>>> >>>> $ cat test.c >>>> _Complex float __attribute__((aligned(1))) cf; >>>> >>>> void foo (void) >>>> { >>>> cf = 1.0i; >>>> } >>>> >>>> $ arm-linux-gnueabihf-gcc -S test.c >>>> during RTL pass: expand >>>> test.c: In function 'foo': >>>> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003 >>>> 5 | cf = 1.0i; >>>> | ~~~^~~~~~ >>>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/config/arm/arm.md:7003 >>>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const >>>> ../../gcc-trunk/gcc/recog.h:318 >>>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/expr.c:3695 >>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/expr.c:3791 >>>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/expr.c:3490 >>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/expr.c:3791 >>>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool) >>>> ../../gcc-trunk/gcc/expr.c:5855 >>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) >>>> ../../gcc-trunk/gcc/expr.c:5441 >>> >>> Huh, so why didn't it trigger >>> >>> /* Handle misaligned stores. */ >>> mode = TYPE_MODE (TREE_TYPE (to)); >>> if ((TREE_CODE (to) == MEM_REF >>> || TREE_CODE (to) == TARGET_MEM_REF) >>> && mode != BLKmode >>> && !mem_ref_refers_to_non_mem_p (to) >>> && ((align = get_object_alignment (to)) >>> < GET_MODE_ALIGNMENT (mode)) >>> && (((icode = optab_handler (movmisalign_optab, mode)) >>> != CODE_FOR_nothing) >>> || targetm.slow_unaligned_access (mode, align))) >>> { >>> >>> ? (_Complex float is 32bit aligned it seems, the DECL_RTL for the >>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl >>> 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned. >>> >>> Ah, 'to' is a plain DECL here so the above handling is incomplete. >>> IIRC component refs like __real cf = 0.f should be handled fine >>> again(?). So, does adding || DECL_P (to) fix the case as well? >>> >> >> So I tried this instead of the varasm.c change: >> >> Index: expr.c >> =================================================================== >> --- expr.c (revision 274487) >> +++ expr.c (working copy) >> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem >> /* Handle misaligned stores. */ >> mode = TYPE_MODE (TREE_TYPE (to)); >> if ((TREE_CODE (to) == MEM_REF >> - || TREE_CODE (to) == TARGET_MEM_REF) >> + || TREE_CODE (to) == TARGET_MEM_REF >> + || DECL_P (to)) >> && mode != BLKmode >> - && !mem_ref_refers_to_non_mem_p (to) >> + && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to)) >> && ((align = get_object_alignment (to)) >> < GET_MODE_ALIGNMENT (mode)) >> && (((icode = optab_handler (movmisalign_optab, mode)) >> >> Result, yes, it fixes this test case >> but then I run all struct-layout-1.exp there are sill cases. where we have problems: >> >> In file included from /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M >> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: In function 'test2112':^M >> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M >> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: note: in definition of macro 'TX'^M >> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: note: in expansion of macro 'TCI'^M >> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: note: in expansion of macro 'F'^M >> 0x7ba377 gen_movdf(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/config/arm/arm.md:7107^M >> 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const^M >> ../../gcc-trunk/gcc/recog.h:318^M >> 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/expr.c:3695^M >> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/expr.c:3791^M >> 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/expr.c:3490^M >> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/expr.c:3791^M >> 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)^M >> ../../gcc-trunk/gcc/expr.c:5856^M >> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M >> ../../gcc-trunk/gcc/expr.c:5302^M >> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M >> ../../gcc-trunk/gcc/expr.c:4983^M >> 0x9338af expand_gimple_stmt_1^M >> ../../gcc-trunk/gcc/cfgexpand.c:3777^M >> 0x9338af expand_gimple_stmt^M >> ../../gcc-trunk/gcc/cfgexpand.c:3875^M >> 0x939221 expand_gimple_basic_block^M >> ../../gcc-trunk/gcc/cfgexpand.c:5915^M >> 0x93af86 execute^M >> ../../gcc-trunk/gcc/cfgexpand.c:6538^M >> Please submit a full bug report,^M >> >> My personal gut feeling this will be more fragile than over-aligning the >> constants. > > As said the constant shouldn't end up under-aligned, the user cannot > specify alignment of literal constants. Not sure what you mean > with "over"-aligning. > Hmm wait a moment, I actually wanted _only_ to change the DECL_ARTIFICIAL that is built by build_constant_desc. It uses align_variable of course, but I totally missed that this also controls the alignment of normal variables, sorry about the confusion here. I mean we should align the constant for the unaligned complex with the natural alignment of the type-mode. That wrong fix made the variables ignore the alignment, which was of course not intended, and instead I would need: Index: expr.c =================================================================== --- expr.c (revision 274531) +++ expr.c (working copy) @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem /* Handle misaligned stores. */ mode = TYPE_MODE (TREE_TYPE (to)); if ((TREE_CODE (to) == MEM_REF - || TREE_CODE (to) == TARGET_MEM_REF) + || TREE_CODE (to) == TARGET_MEM_REF + || DECL_P (to)) && mode != BLKmode - && !mem_ref_refers_to_non_mem_p (to) + && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to)) && ((align = get_object_alignment (to)) < GET_MODE_ALIGNMENT (mode)) && (((icode = optab_handler (movmisalign_optab, mode)) Index: varasm.c =================================================================== --- varasm.c (revision 274531) +++ varasm.c (working copy) @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. If not see #include "stmt.h" #include "expr.h" #include "expmed.h" +#include "optabs.h" #include "output.h" #include "langhooks.h" #include "debug.h" @@ -3386,7 +3387,15 @@ build_constant_desc (tree exp) if (TREE_CODE (exp) == STRING_CST) SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN (decl))); else - align_variable (decl, 0); + { + align_variable (decl, 0); + if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl)) + && ((optab_handler (movmisalign_optab, DECL_MODE (decl)) + != CODE_FOR_nothing) + || targetm.slow_unaligned_access (DECL_MODE (decl), + DECL_ALIGN (decl)))) + SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl))); + } /* Now construct the SYMBOL_REF and the MEM. */ if (use_object_blocks_p ()) >> >> >>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) >>>> ../../gcc-trunk/gcc/expr.c:4983 >>>> 0x93396f expand_gimple_stmt_1 >>>> ../../gcc-trunk/gcc/cfgexpand.c:3777 >>>> 0x93396f expand_gimple_stmt >>>> ../../gcc-trunk/gcc/cfgexpand.c:3875 >>>> 0x9392e1 expand_gimple_basic_block >>>> ../../gcc-trunk/gcc/cfgexpand.c:5915 >>>> 0x93b046 execute >>>> ../../gcc-trunk/gcc/cfgexpand.c:6538 >>>> Please submit a full bug report, >>>> with preprocessed source if appropriate. >>>> Please include the complete backtrace with any bug report. >>>> See <https://gcc.gnu.org/bugs/> for instructions. >>>> >>>> Without the hunk in varasm.c of course. >>>> >>>> What happens is that expand_expr_real_2 returns a unaligned mem_ref here: >>>> >>>> case COMPLEX_CST: >>>> /* Handle evaluating a complex constant in a CONCAT target. */ >>>> if (original_target && GET_CODE (original_target) == CONCAT) >>>> { >>>> [... this path not taken ...] >> >> BTW: this code block executes when the other ICE happens. >> >>>> } >>>> >>>> /* fall through */ >>>> >>>> case STRING_CST: >>>> temp = expand_expr_constant (exp, 1, modifier); >>>> >>>> /* temp contains a constant address. >>>> On RISC machines where a constant address isn't valid, >>>> make some insns to get that address into a register. */ >>>> if (modifier != EXPAND_CONST_ADDRESS >>>> && modifier != EXPAND_INITIALIZER >>>> && modifier != EXPAND_SUM >>>> && ! memory_address_addr_space_p (mode, XEXP (temp, 0), >>>> MEM_ADDR_SPACE (temp))) >>>> return replace_equiv_address (temp, >>>> copy_rtx (XEXP (temp, 0))); >>>> return temp; >>>> >>>> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable >>>> by emit_move_insn, that is expected just *everywhere* and can't be changed. >>>> >>>> This could probably be fixed in an ugly way in the COMPLEX_CST, handler >>>> but OTOH, I don't see any reason why this constant has to be misaligned >>>> when it can be easily aligned, which avoids the need for a misaligned access. >>> >>> If the COMPLEX_CST happends to end up in unaligned memory then that's >>> of course a bug (unless the target requests that for all COMPLEX_CSTs). >>> That is, if the unalignment is triggered because the store is to an >>> unaligned decl. >>> >>> But I think the issue is the above one? >>> >> >> yes initially the constant seems to be unaligned. then it is expanded, >> and there is no special handling for unaligned constants in expand_expr_real, >> and then probably expand_assignment or store_expr seem not fully prepared for >> this either. > > With a cross I see the constant has regular aligned _Complex type > so not sure how it can end up unaligned. > Maybe a target configuration issue. Not sure, I have configured mine this way: ../gcc-trunk/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 --target=arm-linux-gnueabihf --enable-languages=all --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard However it appears now there are two different errors, one is in expand_assignment which you found (I start to wonder if I should add you to the authors section of this patch), and a different one, which I have not yet simplified, but you can easily try that for yourself: make check-gcc-c RUNTESTFLAGS="struct-layout-1.exp=*" it is okay when the test fails to execute but there should no internal compiler errors. >>>> >>>> The problem is that the code that handles this misaligned access >>>> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore >>>> MEM_ALIGN == 32, and therefore the code that handles the unaligned >>>> access is not taken. BUT before the mem_rtx is returned it is >>>> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion, >>>> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be >>>> usable with emit_move_insn. >>> >>> yes, as said the _access_ determines the address should be aligned >>> so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according >>> to the access type/mode. But we can't trust DECL_ALIGN of >>> FUNCTION_DECLs but we _can_ trust users writing *(int *)fn >>> (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just >>> we may not compute nonzero bits for the actual address because >>> of function pointer mangling) >>> (for accessing function code I'd say this would be premature >>> optimization, but ...) >>> >> >> Not a very nice solution, but it is not worth to spend much effort >> in optimizing undefined behavior, I just want to avoid the ICE >> at this time and would not trust the DECL_ALIGN either. > > So I meant > > Index: gcc/builtins.c > =================================================================== > --- gcc/builtins.c (revision 274534) > +++ gcc/builtins.c (working copy) > @@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign > > /* Extract alignment information from the innermost object and > possibly adjust bitpos and offset. */ > - if (TREE_CODE (exp) == FUNCTION_DECL) > + if (TREE_CODE (exp) == FUNCTION_DECL > + && addr_p) > { > /* Function addresses can encode extra information besides their > alignment. However, if TARGET_PTRMEMFUNC_VBIT_LOCATION > > so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we > can trust it). > >>> >>> Still I think you can't simply override STACK_SLOT_ALIGNMENT just because >>> of the mode of an entry param, can you? If you can assume a bigger >>> alignment then STACK_SLOT_ALIGNMENT should return it. >>> >> >> I don't see a real problem here. All target except i386 and gcn (whatever that is) >> use the default for STACK_SLOT_ALIGNMENT which simply allows any (large) align value >> to rule the effective STACK_SLOT_ALIGNMENT. The user could have simply declared >> the local variable with the alignment that results in better code FWIW. >> >> If the stack alignment is too high that is capped in assign_stack_local: >> >> /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT. */ >> if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT) >> { >> alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT; >> alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT; >> } >> >> I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should >> be sufficient for all modes that need movmisalign_optab and friends. >> If it is not, an ICE would be just fine. > > Hmm. In some way we could better communicate with the user then > and do not allow under-aligning automatic vars? But the you > still have packed structs with BLKmode where the actual field > accesses will carry SImode even when not aligned(?) > Yes, that works also when unaligned. > > Please split it into the parts for the PR and parts making the > asserts not trigger. > Yes, will do. > The PR is already fixed, right? The assign_parm_find_stack_rtl hunk > is merely an optimization? > Hmmmm... You are right, I should have added that to the commit message... Of course the test cases try to verify the optimization. Thanks Bernd.
On August 15, 2019 4:52:24 PM GMT+02:00, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: >On 8/15/19 2:54 PM, Richard Biener wrote: >> On Thu, 15 Aug 2019, Bernd Edlinger wrote: >> >>>>>> >>>>>> Hmm. So your patch overrides user-alignment here. Woudln't it >>>>>> be better to do that more conciously by >>>>>> >>>>>> if (! DECL_USER_ALIGN (decl) >>>>>> || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) >>>>>> && targetm.slow_unaligned_access (DECL_MODE (decl), >align))) >>>>>> >>> >>> ? I don't know why that would be better? >>> If the value is underaligned no matter why, pretend it was declared >as >>> naturally aligned if that causes wrong code otherwise. >>> That was the idea here. >> >> It would be better because then we ignore it and use what we'd use >> by default rather than inventing sth new. And your patch suggests >> it might be needed to up align even w/o DECL_USER_ALIGN. >> > >Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set? >But it inherits the alignment from the destination variable, >apparently. Yes. I think it shouldn't inherit the alignment unless we are assembling a static initializer. > >did you mean >if (! DECL_USER_ALIGN (decl) > && align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) > && ... >? > >I can give it a try. No, I meant || thus ignore DECL_USER_ALIGN if it is sth we have to satisfy with unaligned loads. > >>>>>> IMHO whatever code later fails to properly use unaligned loads >>>>>> should be fixed instead rather than ignoring user requested >alignment. >>>>>> >>>>>> Can you quote a short testcase that explains what exactly goes >wrong? >>>>>> The struct-layout ones are awkward to look at... >>>>>> >>>>> >>>>> Sure, >>>>> >>>>> $ cat test.c >>>>> _Complex float __attribute__((aligned(1))) cf; >>>>> >>>>> void foo (void) >>>>> { >>>>> cf = 1.0i; >>>>> } >>>>> >>>>> $ arm-linux-gnueabihf-gcc -S test.c >>>>> during RTL pass: expand >>>>> test.c: In function 'foo': >>>>> test.c:5:6: internal compiler error: in gen_movsf, at >config/arm/arm.md:7003 >>>>> 5 | cf = 1.0i; >>>>> | ~~~^~~~~~ >>>>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*) >>>>> ../../gcc-trunk/gcc/config/arm/arm.md:7003 >>>>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const >>>>> ../../gcc-trunk/gcc/recog.h:318 >>>>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*) >>>>> ../../gcc-trunk/gcc/expr.c:3695 >>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) >>>>> ../../gcc-trunk/gcc/expr.c:3791 >>>>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*) >>>>> ../../gcc-trunk/gcc/expr.c:3490 >>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) >>>>> ../../gcc-trunk/gcc/expr.c:3791 >>>>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool) >>>>> ../../gcc-trunk/gcc/expr.c:5855 >>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) >>>>> ../../gcc-trunk/gcc/expr.c:5441 >>>> >>>> Huh, so why didn't it trigger >>>> >>>> /* Handle misaligned stores. */ >>>> mode = TYPE_MODE (TREE_TYPE (to)); >>>> if ((TREE_CODE (to) == MEM_REF >>>> || TREE_CODE (to) == TARGET_MEM_REF) >>>> && mode != BLKmode >>>> && !mem_ref_refers_to_non_mem_p (to) >>>> && ((align = get_object_alignment (to)) >>>> < GET_MODE_ALIGNMENT (mode)) >>>> && (((icode = optab_handler (movmisalign_optab, mode)) >>>> != CODE_FOR_nothing) >>>> || targetm.slow_unaligned_access (mode, align))) >>>> { >>>> >>>> ? (_Complex float is 32bit aligned it seems, the DECL_RTL for the >>>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl >>>> 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned. >>>> >>>> Ah, 'to' is a plain DECL here so the above handling is incomplete. >>>> IIRC component refs like __real cf = 0.f should be handled fine >>>> again(?). So, does adding || DECL_P (to) fix the case as well? >>>> >>> >>> So I tried this instead of the varasm.c change: >>> >>> Index: expr.c >>> =================================================================== >>> --- expr.c (revision 274487) >>> +++ expr.c (working copy) >>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool >nontem >>> /* Handle misaligned stores. */ >>> mode = TYPE_MODE (TREE_TYPE (to)); >>> if ((TREE_CODE (to) == MEM_REF >>> - || TREE_CODE (to) == TARGET_MEM_REF) >>> + || TREE_CODE (to) == TARGET_MEM_REF >>> + || DECL_P (to)) >>> && mode != BLKmode >>> - && !mem_ref_refers_to_non_mem_p (to) >>> + && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to)) >>> && ((align = get_object_alignment (to)) >>> < GET_MODE_ALIGNMENT (mode)) >>> && (((icode = optab_handler (movmisalign_optab, mode)) >>> >>> Result, yes, it fixes this test case >>> but then I run all struct-layout-1.exp there are sill cases. where >we have problems: >>> >>> In file included from >/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M >>> >/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: >In function 'test2112':^M >>> >/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: >internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M >>> >/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: >note: in definition of macro 'TX'^M >>> >/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: >note: in expansion of macro 'TCI'^M >>> >/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: >note: in expansion of macro 'F'^M >>> 0x7ba377 gen_movdf(rtx_def*, rtx_def*)^M >>> ../../gcc-trunk/gcc/config/arm/arm.md:7107^M >>> 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const^M >>> ../../gcc-trunk/gcc/recog.h:318^M >>> 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)^M >>> ../../gcc-trunk/gcc/expr.c:3695^M >>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M >>> ../../gcc-trunk/gcc/expr.c:3791^M >>> 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)^M >>> ../../gcc-trunk/gcc/expr.c:3490^M >>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M >>> ../../gcc-trunk/gcc/expr.c:3791^M >>> 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)^M >>> ../../gcc-trunk/gcc/expr.c:5856^M >>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M >>> ../../gcc-trunk/gcc/expr.c:5302^M >>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M >>> ../../gcc-trunk/gcc/expr.c:4983^M >>> 0x9338af expand_gimple_stmt_1^M >>> ../../gcc-trunk/gcc/cfgexpand.c:3777^M >>> 0x9338af expand_gimple_stmt^M >>> ../../gcc-trunk/gcc/cfgexpand.c:3875^M >>> 0x939221 expand_gimple_basic_block^M >>> ../../gcc-trunk/gcc/cfgexpand.c:5915^M >>> 0x93af86 execute^M >>> ../../gcc-trunk/gcc/cfgexpand.c:6538^M >>> Please submit a full bug report,^M >>> >>> My personal gut feeling this will be more fragile than over-aligning >the >>> constants. >> >> As said the constant shouldn't end up under-aligned, the user cannot >> specify alignment of literal constants. Not sure what you mean >> with "over"-aligning. >> > > >Hmm wait a moment, I actually wanted _only_ to change the >DECL_ARTIFICIAL >that is built by build_constant_desc. It uses align_variable of >course, >but I totally missed that this also controls the alignment of normal >variables, sorry about the confusion here. > >I mean we should align the constant for the unaligned complex with >the natural alignment of the type-mode. Agreed. That wrong fix made >the variables ignore the alignment, which was of course not intended, >and instead I would need: > >Index: expr.c >=================================================================== >--- expr.c (revision 274531) >+++ expr.c (working copy) >@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool >nontem > /* Handle misaligned stores. */ > mode = TYPE_MODE (TREE_TYPE (to)); > if ((TREE_CODE (to) == MEM_REF >- || TREE_CODE (to) == TARGET_MEM_REF) >+ || TREE_CODE (to) == TARGET_MEM_REF >+ || DECL_P (to)) > && mode != BLKmode >- && !mem_ref_refers_to_non_mem_p (to) >+ && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to)) > && ((align = get_object_alignment (to)) > < GET_MODE_ALIGNMENT (mode)) > && (((icode = optab_handler (movmisalign_optab, mode)) > >Index: varasm.c >=================================================================== >--- varasm.c (revision 274531) >+++ varasm.c (working copy) >@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. If not see > #include "stmt.h" > #include "expr.h" > #include "expmed.h" >+#include "optabs.h" > #include "output.h" > #include "langhooks.h" > #include "debug.h" >@@ -3386,7 +3387,15 @@ build_constant_desc (tree exp) > if (TREE_CODE (exp) == STRING_CST) >SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN >(decl))); > else >- align_variable (decl, 0); >+ { >+ align_variable (decl, 0); >+ if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl)) >+ && ((optab_handler (movmisalign_optab, DECL_MODE (decl)) >+ != CODE_FOR_nothing) >+ || targetm.slow_unaligned_access (DECL_MODE (decl), >+ DECL_ALIGN (decl)))) >+ SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl))); >+ } > > /* Now construct the SYMBOL_REF and the MEM. */ > if (use_object_blocks_p ()) > >>> >>> >>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) >>>>> ../../gcc-trunk/gcc/expr.c:4983 >>>>> 0x93396f expand_gimple_stmt_1 >>>>> ../../gcc-trunk/gcc/cfgexpand.c:3777 >>>>> 0x93396f expand_gimple_stmt >>>>> ../../gcc-trunk/gcc/cfgexpand.c:3875 >>>>> 0x9392e1 expand_gimple_basic_block >>>>> ../../gcc-trunk/gcc/cfgexpand.c:5915 >>>>> 0x93b046 execute >>>>> ../../gcc-trunk/gcc/cfgexpand.c:6538 >>>>> Please submit a full bug report, >>>>> with preprocessed source if appropriate. >>>>> Please include the complete backtrace with any bug report. >>>>> See <https://gcc.gnu.org/bugs/> for instructions. >>>>> >>>>> Without the hunk in varasm.c of course. >>>>> >>>>> What happens is that expand_expr_real_2 returns a unaligned >mem_ref here: >>>>> >>>>> case COMPLEX_CST: >>>>> /* Handle evaluating a complex constant in a CONCAT target. >*/ >>>>> if (original_target && GET_CODE (original_target) == CONCAT) >>>>> { >>>>> [... this path not taken ...] >>> >>> BTW: this code block executes when the other ICE happens. >>> >>>>> } >>>>> >>>>> /* fall through */ >>>>> >>>>> case STRING_CST: >>>>> temp = expand_expr_constant (exp, 1, modifier); >>>>> >>>>> /* temp contains a constant address. >>>>> On RISC machines where a constant address isn't valid, >>>>> make some insns to get that address into a register. */ >>>>> if (modifier != EXPAND_CONST_ADDRESS >>>>> && modifier != EXPAND_INITIALIZER >>>>> && modifier != EXPAND_SUM >>>>> && ! memory_address_addr_space_p (mode, XEXP (temp, 0), >>>>> MEM_ADDR_SPACE >(temp))) >>>>> return replace_equiv_address (temp, >>>>> copy_rtx (XEXP (temp, 0))); >>>>> return temp; >>>>> >>>>> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be >usable >>>>> by emit_move_insn, that is expected just *everywhere* and can't be >changed. >>>>> >>>>> This could probably be fixed in an ugly way in the COMPLEX_CST, >handler >>>>> but OTOH, I don't see any reason why this constant has to be >misaligned >>>>> when it can be easily aligned, which avoids the need for a >misaligned access. >>>> >>>> If the COMPLEX_CST happends to end up in unaligned memory then >that's >>>> of course a bug (unless the target requests that for all >COMPLEX_CSTs). >>>> That is, if the unalignment is triggered because the store is to an >>>> unaligned decl. >>>> >>>> But I think the issue is the above one? >>>> >>> >>> yes initially the constant seems to be unaligned. then it is >expanded, >>> and there is no special handling for unaligned constants in >expand_expr_real, >>> and then probably expand_assignment or store_expr seem not fully >prepared for >>> this either. >> >> With a cross I see the constant has regular aligned _Complex type >> so not sure how it can end up unaligned. >> > >Maybe a target configuration issue. >Not sure, I have configured mine this way: > >../gcc-trunk/configure >--prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 >--target=arm-linux-gnueabihf --enable-languages=all --with-arch=armv7-a >--with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard > >However it appears now there are two different errors, one is in >expand_assignment >which you found (I start to wonder if I should add you to the authors >section >of this patch), and a different one, which I have not yet simplified, >but you can easily try that for yourself: > >make check-gcc-c RUNTESTFLAGS="struct-layout-1.exp=*" > >it is okay when the test fails to execute but there should no internal >compiler errors. > > >>>>> >>>>> The problem is that the code that handles this misaligned access >>>>> is skipped because the mem_rtx has initially no MEM_ATTRS and >therefore >>>>> MEM_ALIGN == 32, and therefore the code that handles the unaligned >>>>> access is not taken. BUT before the mem_rtx is returned it is >>>>> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an >assertion, >>>>> because the result from expand_expr_real(..., EXPAND_NORMAL) ought >to be >>>>> usable with emit_move_insn. >>>> >>>> yes, as said the _access_ determines the address should be aligned >>>> so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according >>>> to the access type/mode. But we can't trust DECL_ALIGN of >>>> FUNCTION_DECLs but we _can_ trust users writing *(int *)fn >>>> (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just >>>> we may not compute nonzero bits for the actual address because >>>> of function pointer mangling) >>>> (for accessing function code I'd say this would be premature >>>> optimization, but ...) >>>> >>> >>> Not a very nice solution, but it is not worth to spend much effort >>> in optimizing undefined behavior, I just want to avoid the ICE >>> at this time and would not trust the DECL_ALIGN either. >> >> So I meant >> >> Index: gcc/builtins.c >> =================================================================== >> --- gcc/builtins.c (revision 274534) >> +++ gcc/builtins.c (working copy) >> @@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign >> >> /* Extract alignment information from the innermost object and >> possibly adjust bitpos and offset. */ >> - if (TREE_CODE (exp) == FUNCTION_DECL) >> + if (TREE_CODE (exp) == FUNCTION_DECL >> + && addr_p) >> { >> /* Function addresses can encode extra information besides >their >> alignment. However, if TARGET_PTRMEMFUNC_VBIT_LOCATION >> >> so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we >> can trust it). >> >>>> >>>> Still I think you can't simply override STACK_SLOT_ALIGNMENT just >because >>>> of the mode of an entry param, can you? If you can assume a bigger >>>> alignment then STACK_SLOT_ALIGNMENT should return it. >>>> >>> >>> I don't see a real problem here. All target except i386 and gcn >(whatever that is) >>> use the default for STACK_SLOT_ALIGNMENT which simply allows any >(large) align value >>> to rule the effective STACK_SLOT_ALIGNMENT. The user could have >simply declared >>> the local variable with the alignment that results in better code >FWIW. >>> >>> If the stack alignment is too high that is capped in >assign_stack_local: >>> >>> /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT. >*/ >>> if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT) >>> { >>> alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT; >>> alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT; >>> } >>> >>> I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT >should >>> be sufficient for all modes that need movmisalign_optab and friends. >>> If it is not, an ICE would be just fine. >> >> Hmm. In some way we could better communicate with the user then >> and do not allow under-aligning automatic vars? But the you >> still have packed structs with BLKmode where the actual field >> accesses will carry SImode even when not aligned(?) >> > >Yes, that works also when unaligned. > >> >> Please split it into the parts for the PR and parts making the >> asserts not trigger. >> > >Yes, will do. > >> The PR is already fixed, right? The assign_parm_find_stack_rtl hunk >> is merely an optimization? >> > >Hmmmm... You are right, I should have added that to the commit >message... > >Of course the test cases try to verify the optimization. > > >Thanks >Bernd.
2019-08-05 Bernd Edlinger <bernd.edlinger@hotmail.de> PR middle-end/89544 * builtins.c (expand_builtin_init_descriptor): Set memory alignment. * expr.c (expand_expr_real_1): Handle FUNCTION_DECL as unaligned. * function.c (assign_parm_find_stack_rtl): Use larger alignment when possible. (assign_parm_setup_reg): Handle misaligned stack arguments. (assign_parm_setup_stack): Allocate properly aligned stack slots. * varasm.c (align_variable): Align constants of misaligned types. * config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Check strict alignment restrictions on memory addresses. * config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise. * config/arm/vec-common.md (mov<VALL>): Likewise. testsuite: 2019-08-05 Bernd Edlinger <bernd.edlinger@hotmail.de> PR middle-end/89544 * gcc.target/arm/unaligned-argument-1.c: New test. * gcc.target/arm/unaligned-argument-2.c: New test. Index: gcc/builtins.c =================================================================== --- gcc/builtins.c (revision 274168) +++ gcc/builtins.c (working copy) @@ -5756,6 +5756,7 @@ expand_builtin_init_descriptor (tree exp) r_descr = expand_normal (t_descr); m_descr = gen_rtx_MEM (BLKmode, r_descr); MEM_NOTRAP_P (m_descr) = 1; + set_mem_align (m_descr, GET_MODE_ALIGNMENT (ptr_mode)); r_func = expand_normal (t_func); r_chain = expand_normal (t_chain); Index: gcc/config/arm/arm.md =================================================================== --- gcc/config/arm/arm.md (revision 274168) +++ gcc/config/arm/arm.md (working copy) @@ -5824,6 +5824,12 @@ (match_operand:DI 1 "general_operand"))] "TARGET_EITHER" " + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (DImode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (DImode)); if (can_create_pseudo_p ()) { if (!REG_P (operands[0])) @@ -6000,6 +6006,12 @@ { rtx base, offset, tmp; + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (SImode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (SImode)); if (TARGET_32BIT || TARGET_HAVE_MOVT) { /* Everything except mem = const or mem = mem can be done easily. */ @@ -6489,6 +6501,12 @@ (match_operand:HI 1 "general_operand"))] "TARGET_EITHER" " + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (HImode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (HImode)); if (TARGET_ARM) { if (can_create_pseudo_p ()) @@ -6898,6 +6916,12 @@ (match_operand:HF 1 "general_operand"))] "TARGET_EITHER" " + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (HFmode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (HFmode)); if (TARGET_32BIT) { if (MEM_P (operands[0])) @@ -6962,6 +6986,12 @@ (match_operand:SF 1 "general_operand"))] "TARGET_EITHER" " + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (SFmode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (SFmode)); if (TARGET_32BIT) { if (MEM_P (operands[0])) @@ -7057,6 +7087,12 @@ (match_operand:DF 1 "general_operand"))] "TARGET_EITHER" " + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (DFmode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (DFmode)); if (TARGET_32BIT) { if (MEM_P (operands[0])) Index: gcc/config/arm/neon.md =================================================================== --- gcc/config/arm/neon.md (revision 274168) +++ gcc/config/arm/neon.md (working copy) @@ -127,6 +127,12 @@ (match_operand:TI 1 "general_operand"))] "TARGET_NEON" { + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (TImode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (TImode)); if (can_create_pseudo_p ()) { if (!REG_P (operands[0])) @@ -139,6 +145,12 @@ (match_operand:VSTRUCT 1 "general_operand"))] "TARGET_NEON" { + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (<MODE>mode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (<MODE>mode)); if (can_create_pseudo_p ()) { if (!REG_P (operands[0])) @@ -151,6 +163,12 @@ (match_operand:VH 1 "s_register_operand"))] "TARGET_NEON" { + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (<MODE>mode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (<MODE>mode)); if (can_create_pseudo_p ()) { if (!REG_P (operands[0])) Index: gcc/config/arm/vec-common.md =================================================================== --- gcc/config/arm/vec-common.md (revision 274168) +++ gcc/config/arm/vec-common.md (working copy) @@ -26,6 +26,12 @@ "TARGET_NEON || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))" { + gcc_checking_assert (!MEM_P (operands[0]) + || MEM_ALIGN (operands[0]) + >= GET_MODE_ALIGNMENT (<MODE>mode)); + gcc_checking_assert (!MEM_P (operands[1]) + || MEM_ALIGN (operands[1]) + >= GET_MODE_ALIGNMENT (<MODE>mode)); if (can_create_pseudo_p ()) { if (!REG_P (operands[0])) Index: gcc/expr.c =================================================================== --- gcc/expr.c (revision 274168) +++ gcc/expr.c (working copy) @@ -10796,6 +10796,14 @@ expand_expr_real_1 (tree exp, rtx target, machine_ MEM_VOLATILE_P (op0) = 1; } + if (MEM_P (op0) && TREE_CODE (tem) == FUNCTION_DECL) + { + if (op0 == orig_op0) + op0 = copy_rtx (op0); + + set_mem_align (op0, BITS_PER_UNIT); + } + /* In cases where an aligned union has an unaligned object as a field, we might be extracting a BLKmode value from an integer-mode (e.g., SImode) object. Handle this case Index: gcc/function.c =================================================================== --- gcc/function.c (revision 274168) +++ gcc/function.c (working copy) @@ -2697,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi intentionally forcing upward padding. Otherwise we have to come up with a guess at the alignment based on OFFSET_RTX. */ poly_int64 offset; - if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm) + if (data->locate.where_pad == PAD_NONE || data->entry_parm) align = boundary; + else if (data->locate.where_pad == PAD_UPWARD) + { + align = boundary; + /* If the argument offset is actually more aligned than the nominal + stack slot boundary, take advantage of that excess alignment. + Don't make any assumptions if STACK_POINTER_OFFSET is in use. */ + if (poly_int_rtx_p (offset_rtx, &offset) + && STACK_POINTER_OFFSET == 0) + { + unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT; + if (offset_align == 0 || offset_align > STACK_BOUNDARY) + offset_align = STACK_BOUNDARY; + align = MAX (align, offset_align); + } + } else if (poly_int_rtx_p (offset_rtx, &offset)) { align = least_bit_hwi (boundary); @@ -3127,6 +3142,7 @@ assign_parm_setup_reg (struct assign_parm_data_all int unsignedp = TYPE_UNSIGNED (TREE_TYPE (parm)); bool did_conversion = false; bool need_conversion, moved; + enum insn_code icode; rtx rtl; /* Store the parm in a pseudoregister during the function, but we may @@ -3188,7 +3204,6 @@ assign_parm_setup_reg (struct assign_parm_data_all conversion. We verify that this insn does not clobber any hard registers. */ - enum insn_code icode; rtx op0, op1; icode = can_extend_p (promoted_nominal_mode, data->passed_mode, @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all did_conversion = true; } + else if (MEM_P (data->entry_parm) + && GET_MODE_ALIGNMENT (promoted_nominal_mode) + > MEM_ALIGN (data->entry_parm) + && (((icode = optab_handler (movmisalign_optab, + promoted_nominal_mode)) + != CODE_FOR_nothing) + || targetm.slow_unaligned_access (promoted_nominal_mode, + MEM_ALIGN (data->entry_parm)))) + { + if (icode != CODE_FOR_nothing) + emit_insn (GEN_FCN (icode) (parmreg, validated_mem)); + else + rtl = parmreg = extract_bit_field (validated_mem, + GET_MODE_BITSIZE (promoted_nominal_mode), 0, + unsignedp, parmreg, + promoted_nominal_mode, VOIDmode, false, NULL); + } else emit_move_insn (parmreg, validated_mem); @@ -3449,11 +3481,17 @@ assign_parm_setup_stack (struct assign_parm_data_a int align = STACK_SLOT_ALIGNMENT (data->passed_type, GET_MODE (data->entry_parm), TYPE_ALIGN (data->passed_type)); + if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)) + && targetm.slow_unaligned_access (GET_MODE (data->entry_parm), + align)) + align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); data->stack_parm = assign_stack_local (GET_MODE (data->entry_parm), GET_MODE_SIZE (GET_MODE (data->entry_parm)), align); + align = MEM_ALIGN (data->stack_parm); set_mem_attributes (data->stack_parm, parm, 1); + set_mem_align (data->stack_parm, align); } dest = validize_mem (copy_rtx (data->stack_parm)); Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c =================================================================== --- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c (revision 0) +++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c (working copy) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_arm_ok } */ +/* { dg-require-effective-target arm_ldrd_strd_ok } */ +/* { dg-options "-marm -mno-unaligned-access -O3" } */ + +struct s { + int a, b; +} __attribute__((aligned(8))); + +struct s f0; + +void f(int a, int b, int c, int d, struct s f) +{ + f0 = f; +} + +/* { dg-final { scan-assembler-times "ldrd" 1 } } */ +/* { dg-final { scan-assembler-times "strd" 1 } } */ +/* { dg-final { scan-assembler-times "stm" 0 } } */ Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c =================================================================== --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c (revision 0) +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c (working copy) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_arm_ok } */ +/* { dg-require-effective-target arm_ldrd_strd_ok } */ +/* { dg-options "-marm -mno-unaligned-access -O3" } */ + +struct s { + int a, b; +} __attribute__((aligned(8))); + +struct s f0; + +void f(int a, int b, int c, int d, int e, struct s f) +{ + f0 = f; +} + +/* { dg-final { scan-assembler-times "ldrd" 0 } } */ +/* { dg-final { scan-assembler-times "strd" 0 } } */ +/* { dg-final { scan-assembler-times "stm" 1 } } */ Index: gcc/varasm.c =================================================================== --- gcc/varasm.c (revision 274168) +++ gcc/varasm.c (working copy) @@ -1085,6 +1085,10 @@ align_variable (tree decl, bool dont_output_data) } } + if (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) + && targetm.slow_unaligned_access (DECL_MODE (decl), align)) + align = GET_MODE_ALIGNMENT (DECL_MODE (decl)); + /* Reset the alignment in case we have made it tighter, so we can benefit from it in get_pointer_alignment. */ SET_DECL_ALIGN (decl, align);