Message ID | 20100921094902.GQ1269@tyan-ft48-01.lab.bos.redhat.com |
---|---|
State | New |
Headers | show |
On Tue, Sep 21, 2010 at 2:49 AM, Jakub Jelinek <jakub@redhat.com> wrote: > On Mon, Sep 20, 2010 at 04:45:07PM -0700, H.J. Lu wrote: >> Here is the patch for 4.4/4.5. fold_builtin_memory_op is very different >> in 4.4/4.5. Simple backport doesn't work. Does this patch make >> any senses? > > Your builtins.c part doesn't make sense. The code already handles > the alignment, see the > srctype = build_qualified_type (desttype, 0); > if (src_align < (int) TYPE_ALIGN (srctype)) > { > if (AGGREGATE_TYPE_P (srctype) > || SLOW_UNALIGNED_ACCESS (TYPE_MODE (srctype), src_align)) > return NULL_TREE; > > srctype = build_variant_type_copy (srctype); > TYPE_ALIGN (srctype) = src_align; > TYPE_USER_ALIGN (srctype) = 1; > TYPE_PACKED (srctype) = 1; > } > hunk and similar hunk for dsttype. The difference is just that > 4.6 creates a valid unaligned MEM_REF while 4.5 creates a valid unaligned > VCE. As you can see above, in 4.5 and earlier it was using a packed type > properly no matter whether target was STRICT_ALIGNMENT or not. > The problem is during expansion that the VCE isn't expanded properly. > > So, IMHO we need something like this (so far untested; 4.5 version): > > 2010-09-21 Jakub Jelinek <jakub@redhat.com> > > PR middle-end/45678 > * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If > op0 isn't sufficiently aligned and there is movmisalignM > insn for mode, use it to load op0 into a temporary register. > > Backport from mainline > 2010-09-20 Jakub Jelinek <jakub@redhat.com> > > PR middle-end/45678 > * cfgexpand.c (expand_one_stack_var_at): Limit alignment to > crtl->max_used_stack_slot_alignment. > With your patch, gcc 4.5 generates: xorps %xmm0, %xmm0 movlps (%esp), %xmm0 movhps 8(%esp), %xmm0 on gcc.dg/torture/pr45678-2.c. Where does xorps come from?
On Tue, Sep 21, 2010 at 3:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Tue, Sep 21, 2010 at 2:49 AM, Jakub Jelinek <jakub@redhat.com> wrote: >> On Mon, Sep 20, 2010 at 04:45:07PM -0700, H.J. Lu wrote: >>> Here is the patch for 4.4/4.5. fold_builtin_memory_op is very different >>> in 4.4/4.5. Simple backport doesn't work. Does this patch make >>> any senses? >> >> Your builtins.c part doesn't make sense. The code already handles >> the alignment, see the >> srctype = build_qualified_type (desttype, 0); >> if (src_align < (int) TYPE_ALIGN (srctype)) >> { >> if (AGGREGATE_TYPE_P (srctype) >> || SLOW_UNALIGNED_ACCESS (TYPE_MODE (srctype), src_align)) >> return NULL_TREE; >> >> srctype = build_variant_type_copy (srctype); >> TYPE_ALIGN (srctype) = src_align; >> TYPE_USER_ALIGN (srctype) = 1; >> TYPE_PACKED (srctype) = 1; >> } >> hunk and similar hunk for dsttype. The difference is just that >> 4.6 creates a valid unaligned MEM_REF while 4.5 creates a valid unaligned >> VCE. As you can see above, in 4.5 and earlier it was using a packed type >> properly no matter whether target was STRICT_ALIGNMENT or not. >> The problem is during expansion that the VCE isn't expanded properly. >> >> So, IMHO we need something like this (so far untested; 4.5 version): >> >> 2010-09-21 Jakub Jelinek <jakub@redhat.com> >> >> PR middle-end/45678 >> * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If >> op0 isn't sufficiently aligned and there is movmisalignM >> insn for mode, use it to load op0 into a temporary register. >> >> Backport from mainline >> 2010-09-20 Jakub Jelinek <jakub@redhat.com> >> >> PR middle-end/45678 >> * cfgexpand.c (expand_one_stack_var_at): Limit alignment to >> crtl->max_used_stack_slot_alignment. >> > > With your patch, gcc 4.5 generates: > > xorps %xmm0, %xmm0 > movlps (%esp), %xmm0 > movhps 8(%esp), %xmm0 > > on gcc.dg/torture/pr45678-2.c. Where does xorps come from? It prevents a reformatting penalty I presume. Richard. > > > -- > H.J. >
On Tue, Sep 21, 2010 at 6:09 AM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Tue, Sep 21, 2010 at 3:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Tue, Sep 21, 2010 at 2:49 AM, Jakub Jelinek <jakub@redhat.com> wrote: >>> On Mon, Sep 20, 2010 at 04:45:07PM -0700, H.J. Lu wrote: >>>> Here is the patch for 4.4/4.5. fold_builtin_memory_op is very different >>>> in 4.4/4.5. Simple backport doesn't work. Does this patch make >>>> any senses? >>> >>> Your builtins.c part doesn't make sense. The code already handles >>> the alignment, see the >>> srctype = build_qualified_type (desttype, 0); >>> if (src_align < (int) TYPE_ALIGN (srctype)) >>> { >>> if (AGGREGATE_TYPE_P (srctype) >>> || SLOW_UNALIGNED_ACCESS (TYPE_MODE (srctype), src_align)) >>> return NULL_TREE; >>> >>> srctype = build_variant_type_copy (srctype); >>> TYPE_ALIGN (srctype) = src_align; >>> TYPE_USER_ALIGN (srctype) = 1; >>> TYPE_PACKED (srctype) = 1; >>> } >>> hunk and similar hunk for dsttype. The difference is just that >>> 4.6 creates a valid unaligned MEM_REF while 4.5 creates a valid unaligned >>> VCE. As you can see above, in 4.5 and earlier it was using a packed type >>> properly no matter whether target was STRICT_ALIGNMENT or not. >>> The problem is during expansion that the VCE isn't expanded properly. >>> >>> So, IMHO we need something like this (so far untested; 4.5 version): >>> >>> 2010-09-21 Jakub Jelinek <jakub@redhat.com> >>> >>> PR middle-end/45678 >>> * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If >>> op0 isn't sufficiently aligned and there is movmisalignM >>> insn for mode, use it to load op0 into a temporary register. >>> >>> Backport from mainline >>> 2010-09-20 Jakub Jelinek <jakub@redhat.com> >>> >>> PR middle-end/45678 >>> * cfgexpand.c (expand_one_stack_var_at): Limit alignment to >>> crtl->max_used_stack_slot_alignment. >>> >> >> With your patch, gcc 4.5 generates: >> >> xorps %xmm0, %xmm0 >> movlps (%esp), %xmm0 >> movhps 8(%esp), %xmm0 >> >> on gcc.dg/torture/pr45678-2.c. Where does xorps come from? > > It prevents a reformatting penalty I presume. > > Partial SSE register stall.
--- gcc/expr.c.jj 2010-09-20 22:42:42.000000000 +0200 +++ gcc/expr.c 2010-09-21 11:31:26.286778101 +0200 @@ -9387,10 +9387,32 @@ expand_expr_real_1 (tree exp, rtx target results. */ if (MEM_P (op0)) { + enum insn_code icode; op0 = copy_rtx (op0); if (TYPE_ALIGN_OK (type)) set_mem_align (op0, MAX (MEM_ALIGN (op0), TYPE_ALIGN (type))); + else if (mode != BLKmode + && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode) + /* If the target does have special handling for unaligned + loads of mode then use them. */ + && ((icode = optab_handler (movmisalign_optab, + mode)->insn_code) + != CODE_FOR_nothing)) + { + rtx reg, insn; + + op0 = adjust_address (op0, mode, 0); + /* We've already validated the memory, and we're creating a + new pseudo destination. The predicates really can't + fail. */ + reg = gen_reg_rtx (mode); + + /* Nor can the insn generator. */ + insn = GEN_FCN (icode) (reg, op0); + emit_insn (insn); + return reg; + } else if (STRICT_ALIGNMENT && mode != BLKmode && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode)) --- gcc/cfgexpand.c.jj 2010-06-11 11:06:01.000000000 +0200 +++ gcc/cfgexpand.c 2010-09-21 11:36:58.331377579 +0200 @@ -705,7 +705,7 @@ static void expand_one_stack_var_at (tree decl, HOST_WIDE_INT offset) { /* Alignment is unsigned. */ - unsigned HOST_WIDE_INT align; + unsigned HOST_WIDE_INT align, max_align; rtx x; /* If this fails, we've overflowed the stack frame. Error nicely? */ @@ -722,10 +722,9 @@ expand_one_stack_var_at (tree decl, HOST offset -= frame_phase; align = offset & -offset; align *= BITS_PER_UNIT; - if (align == 0) - align = STACK_BOUNDARY; - else if (align > MAX_SUPPORTED_STACK_ALIGNMENT) - align = MAX_SUPPORTED_STACK_ALIGNMENT; + max_align = crtl->max_used_stack_slot_alignment; + if (align == 0 || align > max_align) + align = max_align; DECL_ALIGN (decl) = align; DECL_USER_ALIGN (decl) = 0; --- gcc/testsuite/gcc.dg/torture/pr45678-1.c.jj 2010-09-21 11:37:37.744364834 +0200 +++ gcc/testsuite/gcc.dg/torture/pr45678-1.c 2010-09-17 16:40:41.000000000 +0200 @@ -0,0 +1,16 @@ +/* { dg-do run } */ + +typedef float V __attribute__ ((vector_size (16))); +V g; +float d[4] = { 4, 3, 2, 1 }; + +int +main () +{ + V e; + __builtin_memcpy (&e, &d, sizeof (d)); + V f = { 5, 15, 25, 35 }; + e = e * f; + g = e; + return 0; +} --- gcc/testsuite/gcc.dg/torture/pr45678-2.c.jj 2010-09-21 11:37:41.167614502 +0200 +++ gcc/testsuite/gcc.dg/torture/pr45678-2.c 2010-09-18 19:50:55.000000000 +0200 @@ -0,0 +1,16 @@ +/* { dg-do run } */ + +typedef float V __attribute__ ((vector_size (16))); +V g; + +int +main () +{ + float d[4] = { 4, 3, 2, 1 }; + V e; + __builtin_memcpy (&e, &d, sizeof (d)); + V f = { 5, 15, 25, 35 }; + e = e * f; + g = e; + return 0; +}