diff mbox

RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

Message ID 20150704141702.GS10247@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek July 4, 2015, 2:17 p.m. UTC
On Sat, Jul 04, 2015 at 09:20:04AM +0200, Jakub Jelinek wrote:
> On Fri, Jul 03, 2015 at 03:21:47PM +0200, Marek Polacek wrote:
> > This patch implements a new pass, called laddress, which deals with
> > lowering ADDR_EXPR assignments.  Such lowering ought to help the
> > vectorizer, but it also could expose more CSE opportunities, maybe
> > help reassoc, etc.  It's only active when optimize != 0.
> > 
> > So e.g.
> >   _1 = (sizetype) i_9;
> >   _7 = _1 * 4;
> >   _4 = &b + _7;
> > instead of
> >   _4 = &b[i_9];
> > 
> > This triggered 14105 times during the regtest and 6392 times during
> > the bootstrap.
> > 
> > The fallout (at least on x86_64) is surprisingly small, i.e. none, just
> > gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due
> > to a bug in the vectorizer.  Jakub has a patch and knows the details.
> > As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants
> > (that was the motivation of this pass).
> 
> Just FYI, while bootstrapping/regtesting your patch together with the one
> I've posted and another one to vectorize pr59984.c better, I've noticed there
> is another regression with your patch (reverting my patches doesn't help,
> disabling your gate does help):
> 
> +FAIL: libgomp.c/simd-3.c execution test
> +FAIL: libgomp.c++/simd-3.C execution test
> 
> on both x86_64-linux and i686-linux (at least on AVX capable box).
> Most likely hitting another latent vectorizer issue, haven't analyzed it
> yet.

Here is a fix for that, bootstrapped/regtested on x86_64-linux and
i686-linux on top of Marek's patch and my two other patches, ok for trunk?

The problem was that in loops that don't need any scalar post-loop the
GOMP_SIMD_LAST_LANE value was wrong - 0 instead of vf - 1.

2015-07-04  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/66718
	* tree-vect-stmts.c (vectorizable_call): Replace uses of
	GOMP_SIMD_LANE outside of loop with vf - 1 rather than 0.



	Jakub

Comments

Richard Biener July 4, 2015, 2:56 p.m. UTC | #1
On July 4, 2015 4:17:02 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Sat, Jul 04, 2015 at 09:20:04AM +0200, Jakub Jelinek wrote:
>> On Fri, Jul 03, 2015 at 03:21:47PM +0200, Marek Polacek wrote:
>> > This patch implements a new pass, called laddress, which deals with
>> > lowering ADDR_EXPR assignments.  Such lowering ought to help the
>> > vectorizer, but it also could expose more CSE opportunities, maybe
>> > help reassoc, etc.  It's only active when optimize != 0.
>> > 
>> > So e.g.
>> >   _1 = (sizetype) i_9;
>> >   _7 = _1 * 4;
>> >   _4 = &b + _7;
>> > instead of
>> >   _4 = &b[i_9];
>> > 
>> > This triggered 14105 times during the regtest and 6392 times during
>> > the bootstrap.
>> > 
>> > The fallout (at least on x86_64) is surprisingly small, i.e. none,
>just
>> > gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is
>due
>> > to a bug in the vectorizer.  Jakub has a patch and knows the
>details.
>> > As the test shows, we're now able to vectorize ADDR_EXPR of
>non-invariants
>> > (that was the motivation of this pass).
>> 
>> Just FYI, while bootstrapping/regtesting your patch together with the
>one
>> I've posted and another one to vectorize pr59984.c better, I've
>noticed there
>> is another regression with your patch (reverting my patches doesn't
>help,
>> disabling your gate does help):
>> 
>> +FAIL: libgomp.c/simd-3.c execution test
>> +FAIL: libgomp.c++/simd-3.C execution test
>> 
>> on both x86_64-linux and i686-linux (at least on AVX capable box).
>> Most likely hitting another latent vectorizer issue, haven't analyzed
>it
>> yet.
>
>Here is a fix for that, bootstrapped/regtested on x86_64-linux and
>i686-linux on top of Marek's patch and my two other patches, ok for
>trunk?

OK.

Thanks,
Richard.

>The problem was that in loops that don't need any scalar post-loop the
>GOMP_SIMD_LAST_LANE value was wrong - 0 instead of vf - 1.
>
>2015-07-04  Jakub Jelinek  <jakub@redhat.com>
>
>	PR tree-optimization/66718
>	* tree-vect-stmts.c (vectorizable_call): Replace uses of
>	GOMP_SIMD_LANE outside of loop with vf - 1 rather than 0.
>
>--- gcc/tree-vect-stmts.c.jj	2015-07-03 20:43:42.000000000 +0200
>+++ gcc/tree-vect-stmts.c	2015-07-04 14:08:18.659356110 +0200
>@@ -2601,6 +2601,30 @@ vectorizable_call (gimple gs, gimple_stm
>     lhs = gimple_call_lhs (STMT_VINFO_RELATED_STMT (stmt_info));
>   else
>     lhs = gimple_call_lhs (stmt);
>+
>+  if (gimple_call_internal_p (stmt)
>+      && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
>+    {
>+      /* Replace uses of the lhs of GOMP_SIMD_LANE call outside the
>loop
>+	 with vf - 1 rather than 0, that is the last iteration of the
>+	 vectorized loop.  */
>+      imm_use_iterator iter;
>+      use_operand_p use_p;
>+      gimple use_stmt;
>+      FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
>+	{
>+	  basic_block use_bb = gimple_bb (use_stmt);
>+	  if (use_bb
>+	      && !flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
>use_bb))
>+	    {
>+	      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
>+		SET_USE (use_p, build_int_cst (TREE_TYPE (lhs),
>+					       ncopies * nunits_out - 1));
>+	      update_stmt (use_stmt);
>+	    }
>+	}
>+    }
>+
>   new_stmt = gimple_build_assign (lhs, build_zero_cst (type));
>   set_vinfo_for_stmt (new_stmt, stmt_info);
>   set_vinfo_for_stmt (stmt, NULL);
>
>
>	Jakub
diff mbox

Patch

--- gcc/tree-vect-stmts.c.jj	2015-07-03 20:43:42.000000000 +0200
+++ gcc/tree-vect-stmts.c	2015-07-04 14:08:18.659356110 +0200
@@ -2601,6 +2601,30 @@  vectorizable_call (gimple gs, gimple_stm
     lhs = gimple_call_lhs (STMT_VINFO_RELATED_STMT (stmt_info));
   else
     lhs = gimple_call_lhs (stmt);
+
+  if (gimple_call_internal_p (stmt)
+      && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
+    {
+      /* Replace uses of the lhs of GOMP_SIMD_LANE call outside the loop
+	 with vf - 1 rather than 0, that is the last iteration of the
+	 vectorized loop.  */
+      imm_use_iterator iter;
+      use_operand_p use_p;
+      gimple use_stmt;
+      FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
+	{
+	  basic_block use_bb = gimple_bb (use_stmt);
+	  if (use_bb
+	      && !flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo), use_bb))
+	    {
+	      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+		SET_USE (use_p, build_int_cst (TREE_TYPE (lhs),
+					       ncopies * nunits_out - 1));
+	      update_stmt (use_stmt);
+	    }
+	}
+    }
+
   new_stmt = gimple_build_assign (lhs, build_zero_cst (type));
   set_vinfo_for_stmt (new_stmt, stmt_info);
   set_vinfo_for_stmt (stmt, NULL);