From patchwork Tue Sep 21 09:49:02 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: 4.4/4.5 PATCH: PR middle-end/45678: [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse Date: Mon, 20 Sep 2010 23:49:02 -0000 From: Jakub Jelinek X-Patchwork-Id: 65289 Message-Id: <20100921094902.GQ1269@tyan-ft48-01.lab.bos.redhat.com> To: "H.J. Lu" Cc: Richard Henderson , Richard Guenther , gcc-patches@gcc.gnu.org On Mon, Sep 20, 2010 at 04:45:07PM -0700, H.J. Lu wrote: > Here is the patch for 4.4/4.5. fold_builtin_memory_op is very different > in 4.4/4.5. Simple backport doesn't work. Does this patch make > any senses? Your builtins.c part doesn't make sense. The code already handles the alignment, see the srctype = build_qualified_type (desttype, 0); if (src_align < (int) TYPE_ALIGN (srctype)) { if (AGGREGATE_TYPE_P (srctype) || SLOW_UNALIGNED_ACCESS (TYPE_MODE (srctype), src_align)) return NULL_TREE; srctype = build_variant_type_copy (srctype); TYPE_ALIGN (srctype) = src_align; TYPE_USER_ALIGN (srctype) = 1; TYPE_PACKED (srctype) = 1; } hunk and similar hunk for dsttype. The difference is just that 4.6 creates a valid unaligned MEM_REF while 4.5 creates a valid unaligned VCE. As you can see above, in 4.5 and earlier it was using a packed type properly no matter whether target was STRICT_ALIGNMENT or not. The problem is during expansion that the VCE isn't expanded properly. So, IMHO we need something like this (so far untested; 4.5 version): 2010-09-21 Jakub Jelinek PR middle-end/45678 * expr.c (expand_expr_real_1) : If op0 isn't sufficiently aligned and there is movmisalignM insn for mode, use it to load op0 into a temporary register. Backport from mainline 2010-09-20 Jakub Jelinek PR middle-end/45678 * cfgexpand.c (expand_one_stack_var_at): Limit alignment to crtl->max_used_stack_slot_alignment. 2010-09-21 Jakub Jelinek Backport from mainline 2010-09-17 Richard Guenther H.J. Lu PR middle-end/45678 * gcc.dg/torture/pr45678-1.c: New. * gcc.dg/torture/pr45678-2.c: Likewise. Jakub --- gcc/expr.c.jj 2010-09-20 22:42:42.000000000 +0200 +++ gcc/expr.c 2010-09-21 11:31:26.286778101 +0200 @@ -9387,10 +9387,32 @@ expand_expr_real_1 (tree exp, rtx target results. */ if (MEM_P (op0)) { + enum insn_code icode; op0 = copy_rtx (op0); if (TYPE_ALIGN_OK (type)) set_mem_align (op0, MAX (MEM_ALIGN (op0), TYPE_ALIGN (type))); + else if (mode != BLKmode + && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode) + /* If the target does have special handling for unaligned + loads of mode then use them. */ + && ((icode = optab_handler (movmisalign_optab, + mode)->insn_code) + != CODE_FOR_nothing)) + { + rtx reg, insn; + + op0 = adjust_address (op0, mode, 0); + /* We've already validated the memory, and we're creating a + new pseudo destination. The predicates really can't + fail. */ + reg = gen_reg_rtx (mode); + + /* Nor can the insn generator. */ + insn = GEN_FCN (icode) (reg, op0); + emit_insn (insn); + return reg; + } else if (STRICT_ALIGNMENT && mode != BLKmode && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode)) --- gcc/cfgexpand.c.jj 2010-06-11 11:06:01.000000000 +0200 +++ gcc/cfgexpand.c 2010-09-21 11:36:58.331377579 +0200 @@ -705,7 +705,7 @@ static void expand_one_stack_var_at (tree decl, HOST_WIDE_INT offset) { /* Alignment is unsigned. */ - unsigned HOST_WIDE_INT align; + unsigned HOST_WIDE_INT align, max_align; rtx x; /* If this fails, we've overflowed the stack frame. Error nicely? */ @@ -722,10 +722,9 @@ expand_one_stack_var_at (tree decl, HOST offset -= frame_phase; align = offset & -offset; align *= BITS_PER_UNIT; - if (align == 0) - align = STACK_BOUNDARY; - else if (align > MAX_SUPPORTED_STACK_ALIGNMENT) - align = MAX_SUPPORTED_STACK_ALIGNMENT; + max_align = crtl->max_used_stack_slot_alignment; + if (align == 0 || align > max_align) + align = max_align; DECL_ALIGN (decl) = align; DECL_USER_ALIGN (decl) = 0; --- gcc/testsuite/gcc.dg/torture/pr45678-1.c.jj 2010-09-21 11:37:37.744364834 +0200 +++ gcc/testsuite/gcc.dg/torture/pr45678-1.c 2010-09-17 16:40:41.000000000 +0200 @@ -0,0 +1,16 @@ +/* { dg-do run } */ + +typedef float V __attribute__ ((vector_size (16))); +V g; +float d[4] = { 4, 3, 2, 1 }; + +int +main () +{ + V e; + __builtin_memcpy (&e, &d, sizeof (d)); + V f = { 5, 15, 25, 35 }; + e = e * f; + g = e; + return 0; +} --- gcc/testsuite/gcc.dg/torture/pr45678-2.c.jj 2010-09-21 11:37:41.167614502 +0200 +++ gcc/testsuite/gcc.dg/torture/pr45678-2.c 2010-09-18 19:50:55.000000000 +0200 @@ -0,0 +1,16 @@ +/* { dg-do run } */ + +typedef float V __attribute__ ((vector_size (16))); +V g; + +int +main () +{ + float d[4] = { 4, 3, 2, 1 }; + V e; + __builtin_memcpy (&e, &d, sizeof (d)); + V f = { 5, 15, 25, 35 }; + e = e * f; + g = e; + return 0; +}