Patchwork 4.4/4.5 PATCH: PR middle-end/45678: [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

login
register
mail settings
Submitter Jakub Jelinek
Date Sept. 21, 2010, 9:49 a.m.
Message ID <20100921094902.GQ1269@tyan-ft48-01.lab.bos.redhat.com>
Download mbox | patch
Permalink /patch/65289/
State New
Headers show

Comments

Jakub Jelinek - Sept. 21, 2010, 9:49 a.m.
On Mon, Sep 20, 2010 at 04:45:07PM -0700, H.J. Lu wrote:
> Here is the patch for 4.4/4.5. fold_builtin_memory_op is very different
> in 4.4/4.5.  Simple backport doesn't work.  Does this patch make
> any senses?

Your builtins.c part doesn't make sense.  The code already handles
the alignment, see the
	  srctype = build_qualified_type (desttype, 0);
          if (src_align < (int) TYPE_ALIGN (srctype))                                                                                              
            {                                                                                                                                      
              if (AGGREGATE_TYPE_P (srctype)                                                                                                       
                  || SLOW_UNALIGNED_ACCESS (TYPE_MODE (srctype), src_align))                                                                       
                return NULL_TREE;                                                                                                                  
                                                                                                                                                   
              srctype = build_variant_type_copy (srctype);                                                                                         
              TYPE_ALIGN (srctype) = src_align;                                                                                                    
              TYPE_USER_ALIGN (srctype) = 1;                                                                                                       
              TYPE_PACKED (srctype) = 1;                                                                                                           
            }                                                                                                                                      
hunk and similar hunk for dsttype.  The difference is just that
4.6 creates a valid unaligned MEM_REF while 4.5 creates a valid unaligned
VCE.  As you can see above, in 4.5 and earlier it was using a packed type
properly no matter whether target was STRICT_ALIGNMENT or not.
The problem is during expansion that the VCE isn't expanded properly.

So, IMHO we need something like this (so far untested; 4.5 version):

2010-09-21  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/45678
	* expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If
	op0 isn't sufficiently aligned and there is movmisalignM
	insn for mode, use it to load op0 into a temporary register.

	Backport from mainline
	2010-09-20  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/45678
	* cfgexpand.c (expand_one_stack_var_at): Limit alignment to
	crtl->max_used_stack_slot_alignment.

2010-09-21  Jakub Jelinek  <jakub@redhat.com>

	Backport from mainline
	2010-09-17  Richard Guenther  <rguenther@suse.de>
		    H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/45678
	* gcc.dg/torture/pr45678-1.c: New.
	* gcc.dg/torture/pr45678-2.c: Likewise.


	Jakub
H.J. Lu - Sept. 21, 2010, 1:05 p.m.
On Tue, Sep 21, 2010 at 2:49 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Sep 20, 2010 at 04:45:07PM -0700, H.J. Lu wrote:
>> Here is the patch for 4.4/4.5. fold_builtin_memory_op is very different
>> in 4.4/4.5.  Simple backport doesn't work.  Does this patch make
>> any senses?
>
> Your builtins.c part doesn't make sense.  The code already handles
> the alignment, see the
>          srctype = build_qualified_type (desttype, 0);
>          if (src_align < (int) TYPE_ALIGN (srctype))
>            {
>              if (AGGREGATE_TYPE_P (srctype)
>                  || SLOW_UNALIGNED_ACCESS (TYPE_MODE (srctype), src_align))
>                return NULL_TREE;
>
>              srctype = build_variant_type_copy (srctype);
>              TYPE_ALIGN (srctype) = src_align;
>              TYPE_USER_ALIGN (srctype) = 1;
>              TYPE_PACKED (srctype) = 1;
>            }
> hunk and similar hunk for dsttype.  The difference is just that
> 4.6 creates a valid unaligned MEM_REF while 4.5 creates a valid unaligned
> VCE.  As you can see above, in 4.5 and earlier it was using a packed type
> properly no matter whether target was STRICT_ALIGNMENT or not.
> The problem is during expansion that the VCE isn't expanded properly.
>
> So, IMHO we need something like this (so far untested; 4.5 version):
>
> 2010-09-21  Jakub Jelinek  <jakub@redhat.com>
>
>        PR middle-end/45678
>        * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If
>        op0 isn't sufficiently aligned and there is movmisalignM
>        insn for mode, use it to load op0 into a temporary register.
>
>        Backport from mainline
>        2010-09-20  Jakub Jelinek  <jakub@redhat.com>
>
>        PR middle-end/45678
>        * cfgexpand.c (expand_one_stack_var_at): Limit alignment to
>        crtl->max_used_stack_slot_alignment.
>

With your patch, gcc 4.5 generates:

	xorps	%xmm0, %xmm0
	movlps	(%esp), %xmm0
	movhps	8(%esp), %xmm0

on gcc.dg/torture/pr45678-2.c. Where does xorps come from?
Richard Guenther - Sept. 21, 2010, 1:09 p.m.
On Tue, Sep 21, 2010 at 3:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Sep 21, 2010 at 2:49 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Mon, Sep 20, 2010 at 04:45:07PM -0700, H.J. Lu wrote:
>>> Here is the patch for 4.4/4.5. fold_builtin_memory_op is very different
>>> in 4.4/4.5.  Simple backport doesn't work.  Does this patch make
>>> any senses?
>>
>> Your builtins.c part doesn't make sense.  The code already handles
>> the alignment, see the
>>          srctype = build_qualified_type (desttype, 0);
>>          if (src_align < (int) TYPE_ALIGN (srctype))
>>            {
>>              if (AGGREGATE_TYPE_P (srctype)
>>                  || SLOW_UNALIGNED_ACCESS (TYPE_MODE (srctype), src_align))
>>                return NULL_TREE;
>>
>>              srctype = build_variant_type_copy (srctype);
>>              TYPE_ALIGN (srctype) = src_align;
>>              TYPE_USER_ALIGN (srctype) = 1;
>>              TYPE_PACKED (srctype) = 1;
>>            }
>> hunk and similar hunk for dsttype.  The difference is just that
>> 4.6 creates a valid unaligned MEM_REF while 4.5 creates a valid unaligned
>> VCE.  As you can see above, in 4.5 and earlier it was using a packed type
>> properly no matter whether target was STRICT_ALIGNMENT or not.
>> The problem is during expansion that the VCE isn't expanded properly.
>>
>> So, IMHO we need something like this (so far untested; 4.5 version):
>>
>> 2010-09-21  Jakub Jelinek  <jakub@redhat.com>
>>
>>        PR middle-end/45678
>>        * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If
>>        op0 isn't sufficiently aligned and there is movmisalignM
>>        insn for mode, use it to load op0 into a temporary register.
>>
>>        Backport from mainline
>>        2010-09-20  Jakub Jelinek  <jakub@redhat.com>
>>
>>        PR middle-end/45678
>>        * cfgexpand.c (expand_one_stack_var_at): Limit alignment to
>>        crtl->max_used_stack_slot_alignment.
>>
>
> With your patch, gcc 4.5 generates:
>
>        xorps   %xmm0, %xmm0
>        movlps  (%esp), %xmm0
>        movhps  8(%esp), %xmm0
>
> on gcc.dg/torture/pr45678-2.c. Where does xorps come from?

It prevents a reformatting penalty I presume.

Richard.

>
>
> --
> H.J.
>
H.J. Lu - Sept. 21, 2010, 1:13 p.m.
On Tue, Sep 21, 2010 at 6:09 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Sep 21, 2010 at 3:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Sep 21, 2010 at 2:49 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Mon, Sep 20, 2010 at 04:45:07PM -0700, H.J. Lu wrote:
>>>> Here is the patch for 4.4/4.5. fold_builtin_memory_op is very different
>>>> in 4.4/4.5.  Simple backport doesn't work.  Does this patch make
>>>> any senses?
>>>
>>> Your builtins.c part doesn't make sense.  The code already handles
>>> the alignment, see the
>>>          srctype = build_qualified_type (desttype, 0);
>>>          if (src_align < (int) TYPE_ALIGN (srctype))
>>>            {
>>>              if (AGGREGATE_TYPE_P (srctype)
>>>                  || SLOW_UNALIGNED_ACCESS (TYPE_MODE (srctype), src_align))
>>>                return NULL_TREE;
>>>
>>>              srctype = build_variant_type_copy (srctype);
>>>              TYPE_ALIGN (srctype) = src_align;
>>>              TYPE_USER_ALIGN (srctype) = 1;
>>>              TYPE_PACKED (srctype) = 1;
>>>            }
>>> hunk and similar hunk for dsttype.  The difference is just that
>>> 4.6 creates a valid unaligned MEM_REF while 4.5 creates a valid unaligned
>>> VCE.  As you can see above, in 4.5 and earlier it was using a packed type
>>> properly no matter whether target was STRICT_ALIGNMENT or not.
>>> The problem is during expansion that the VCE isn't expanded properly.
>>>
>>> So, IMHO we need something like this (so far untested; 4.5 version):
>>>
>>> 2010-09-21  Jakub Jelinek  <jakub@redhat.com>
>>>
>>>        PR middle-end/45678
>>>        * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If
>>>        op0 isn't sufficiently aligned and there is movmisalignM
>>>        insn for mode, use it to load op0 into a temporary register.
>>>
>>>        Backport from mainline
>>>        2010-09-20  Jakub Jelinek  <jakub@redhat.com>
>>>
>>>        PR middle-end/45678
>>>        * cfgexpand.c (expand_one_stack_var_at): Limit alignment to
>>>        crtl->max_used_stack_slot_alignment.
>>>
>>
>> With your patch, gcc 4.5 generates:
>>
>>        xorps   %xmm0, %xmm0
>>        movlps  (%esp), %xmm0
>>        movhps  8(%esp), %xmm0
>>
>> on gcc.dg/torture/pr45678-2.c. Where does xorps come from?
>
> It prevents a reformatting penalty I presume.
>
>

Partial SSE register stall.

Patch

--- gcc/expr.c.jj	2010-09-20 22:42:42.000000000 +0200
+++ gcc/expr.c	2010-09-21 11:31:26.286778101 +0200
@@ -9387,10 +9387,32 @@  expand_expr_real_1 (tree exp, rtx target
 	 results.  */
       if (MEM_P (op0))
 	{
+	  enum insn_code icode;
 	  op0 = copy_rtx (op0);
 
 	  if (TYPE_ALIGN_OK (type))
 	    set_mem_align (op0, MAX (MEM_ALIGN (op0), TYPE_ALIGN (type)));
+	  else if (mode != BLKmode
+		   && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode)
+		   /* If the target does have special handling for unaligned
+		      loads of mode then use them.  */
+		   && ((icode = optab_handler (movmisalign_optab,
+					       mode)->insn_code)
+		       != CODE_FOR_nothing))
+	      {
+		rtx reg, insn;
+
+		op0 = adjust_address (op0, mode, 0);
+		/* We've already validated the memory, and we're creating a
+		   new pseudo destination.  The predicates really can't
+		   fail.  */
+		reg = gen_reg_rtx (mode);
+
+		/* Nor can the insn generator.  */
+		insn = GEN_FCN (icode) (reg, op0);
+		emit_insn (insn);
+		return reg;
+	      }
 	  else if (STRICT_ALIGNMENT
 		   && mode != BLKmode
 		   && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode))
--- gcc/cfgexpand.c.jj	2010-06-11 11:06:01.000000000 +0200
+++ gcc/cfgexpand.c	2010-09-21 11:36:58.331377579 +0200
@@ -705,7 +705,7 @@  static void
 expand_one_stack_var_at (tree decl, HOST_WIDE_INT offset)
 {
   /* Alignment is unsigned.   */
-  unsigned HOST_WIDE_INT align;
+  unsigned HOST_WIDE_INT align, max_align;
   rtx x;
 
   /* If this fails, we've overflowed the stack frame.  Error nicely?  */
@@ -722,10 +722,9 @@  expand_one_stack_var_at (tree decl, HOST
       offset -= frame_phase;
       align = offset & -offset;
       align *= BITS_PER_UNIT;
-      if (align == 0)
-	align = STACK_BOUNDARY;
-      else if (align > MAX_SUPPORTED_STACK_ALIGNMENT)
-	align = MAX_SUPPORTED_STACK_ALIGNMENT;
+      max_align = crtl->max_used_stack_slot_alignment;
+      if (align == 0 || align > max_align)
+	align = max_align;
 
       DECL_ALIGN (decl) = align;
       DECL_USER_ALIGN (decl) = 0;
--- gcc/testsuite/gcc.dg/torture/pr45678-1.c.jj	2010-09-21 11:37:37.744364834 +0200
+++ gcc/testsuite/gcc.dg/torture/pr45678-1.c	2010-09-17 16:40:41.000000000 +0200
@@ -0,0 +1,16 @@ 
+/* { dg-do run } */
+
+typedef float V __attribute__ ((vector_size (16)));
+V g;
+float d[4] = { 4, 3, 2, 1 };
+
+int
+main ()
+{
+  V e;
+  __builtin_memcpy (&e, &d, sizeof (d));
+  V f = { 5, 15, 25, 35 };
+  e = e * f;
+  g = e;
+  return 0;
+}
--- gcc/testsuite/gcc.dg/torture/pr45678-2.c.jj	2010-09-21 11:37:41.167614502 +0200
+++ gcc/testsuite/gcc.dg/torture/pr45678-2.c	2010-09-18 19:50:55.000000000 +0200
@@ -0,0 +1,16 @@ 
+/* { dg-do run } */
+
+typedef float V __attribute__ ((vector_size (16)));
+V g;
+
+int
+main ()
+{
+  float d[4] = { 4, 3, 2, 1 };
+  V e;
+  __builtin_memcpy (&e, &d, sizeof (d));
+  V f = { 5, 15, 25, 35 };
+  e = e * f;
+  g = e;
+  return 0;
+}