Patchwork Revision 166517 caused PR 46414

login
register
mail settings
Submitter H.J. Lu
Date Nov. 10, 2010, 5:18 p.m.
Message ID <AANLkTin-DhMVb+HpGsr9pNQaYk3qrFp+BMFw2W+Pv9cr@mail.gmail.com>
Download mbox | patch
Permalink /patch/70653/
State New
Headers show

Comments

H.J. Lu - Nov. 10, 2010, 5:18 p.m.
2010/11/10 Jan Hubicka <hubicka@ucw.cz>:
>> On Wed, Nov 10, 2010 at 7:18 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> The complette unrilling seems sane here.
>> >> This seems like good idea to me and i am not quite sure why 32bit compilation does not do the
>> >> transofrm. I will check now.
>> >
>> > The reason is word size.  We compute move costs based on this, so vectorized moves are more expensive
>> > in 32bit cost model than 64bit.
>> > Perhaps there is way to hook vectorizer cost model in here?
>> >
>>
>> Shouldn't we use vector size to compute move cost for vector move?
>
> That would work for me.  I am not terribly familiar with vector costs here, could you try to patch
> estimate_move_cost in tree-inline.c?
> We are very simplistic here assuming that pretty much all operations have cost of 1, so I guess
> making all vector moves to have cost 1 is fine. The main reason for that function is to catch large
> structure copies.
>

How about this patch?
Jan Hubicka - Nov. 10, 2010, 5:31 p.m.
> 2010/11/10 Jan Hubicka <hubicka@ucw.cz>:
> >> On Wed, Nov 10, 2010 at 7:18 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> >> The complette unrilling seems sane here.
> >> >> This seems like good idea to me and i am not quite sure why 32bit compilation does not do the
> >> >> transofrm. I will check now.
> >> >
> >> > The reason is word size.  We compute move costs based on this, so vectorized moves are more expensive
> >> > in 32bit cost model than 64bit.
> >> > Perhaps there is way to hook vectorizer cost model in here?
> >> >
> >>
> >> Shouldn't we use vector size to compute move cost for vector move?
> >
> > That would work for me.  I am not terribly familiar with vector costs here, could you try to patch
> > estimate_move_cost in tree-inline.c?
> > We are very simplistic here assuming that pretty much all operations have cost of 1, so I guess
> > making all vector moves to have cost 1 is fine. The main reason for that function is to catch large
> > structure copies.
> >
> 
> How about this patch?
> +  if (TREE_CODE (type) == VECTOR_TYPE)
> +    {
> +      enum machine_mode inner = TYPE_MODE (TREE_TYPE (type));
> +      enum machine_mode simd
> +	= targetm.vectorize.preferred_simd_mode (inner);
> +      return GET_MODE_SIZE (TYPE_MODE (type)) / GET_MODE_SIZE (simd);
> +    }

You need to round up, so we don't end up with cost of 0 for small blocks. Other than that it seems fine to me.
We will probably eventually need to also update the expander to use vector instructions when possible to match
the costs, but I guess most of vectors we see in the code are already of proper size.

Honza
> +
>    size = int_size_in_bytes (type);
>  
>    if (size < 0 || size > MOVE_MAX_PIECES * MOVE_RATIO (!optimize_size))

Patch

diff --git a/gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c b/gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c
index 506df88..5a8e696 100644
--- a/gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c
+++ b/gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c
@@ -32,4 +32,4 @@  void t3(void)
 }
 
 /* Last loop is small enough to be fully unrolled.  */
-/* { dg-final { scan-assembler-times "vrsqrtps\[ \\t\]+\[^\n\]*%ymm" 4 } } */
+/* { dg-final { scan-assembler-times "vrsqrtps\[ \\t\]+\[^\n\]*%ymm" 6 } } */
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index fc470a7..f900a0e 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3249,6 +3249,14 @@  estimate_move_cost (tree type)
 
   gcc_assert (!VOID_TYPE_P (type));
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      enum machine_mode inner = TYPE_MODE (TREE_TYPE (type));
+      enum machine_mode simd
+	= targetm.vectorize.preferred_simd_mode (inner);
+      return GET_MODE_SIZE (TYPE_MODE (type)) / GET_MODE_SIZE (simd);
+    }
+
   size = int_size_in_bytes (type);
 
   if (size < 0 || size > MOVE_MAX_PIECES * MOVE_RATIO (!optimize_size))