diff mbox series

Fix PR92537

Message ID nycvar.YFH.7.76.1911200953150.5566@zhemvz.fhfr.qr
State New
Headers show
Series Fix PR92537 | expand

Commit Message

Richard Biener Nov. 20, 2019, 8:54 a.m. UTC
There's a bit left in this PR which the following fixes.  The root
only became external late and the check more naturally belongs to
the new place.

Boostrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-11-20  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/92537
	* tree-vect-slp.c (vect_analyze_slp_instance): Move CTOR
	vectorization validity check...
	(vect_slp_analyze_operations): ... here.

	* gfortran.dg/pr92537.f90: New testcase.

Comments

Richard Sandiford Nov. 20, 2019, 9:54 a.m. UTC | #1
Richard Biener <rguenther@suse.de> writes:
> There's a bit left in this PR which the following fixes.  The root
> only became external late and the check more naturally belongs to
> the new place.
>
> Boostrap and regtest running on x86_64-unknown-linux-gnu.

FWIW, I was just about to post the patch below before seeing your
message. :-)  Thought I might as well post it anyway just in case.

I guess the slight advantage to keeping the vect_analyze_slp_instance
test is that we can still free child nodes at that point, so it might
be more efficient to catch it "early".  It probably doesn't make much
differnce in practice though.

Tested on aarch64-linux-gnu.

Thanks,
Richard


2019-11-20  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/92537
	* tree-vect-slp.c (vect_slp_analyze_node_operations): Don't
	allow the root node to be external.

gcc/testsuite/
	PR tree-optimization/92537
	* gcc.dg/vect/pr92537.c: New test.
	* gfortran.dg/vect/pr92537.f90: Likewise.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-11-18 15:21:12.919993766 +0000
+++ gcc/tree-vect-slp.c	2019-11-20 09:48:57.145101295 +0000
@@ -2848,7 +2848,7 @@ vect_slp_analyze_node_operations (vec_in
   slp_tree child;
 
   if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
-    return true;
+    return node != SLP_INSTANCE_TREE (node_instance);
 
   /* If we already analyzed the exact same set of scalar stmts we're done.
      We share the generated vector stmts for those.
Index: gcc/testsuite/gcc.dg/vect/pr92537.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/pr92537.c	2019-11-20 09:48:57.145101295 +0000
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int a[64];
+int b, c, e;
+short d;
+short f[64];
+void g() {
+  b = 0;
+  c = d >> 3;
+  for (; b < 64 - 1; b++) {
+    e = 0;
+    e -= a[b] * c;
+    f[b] = e;
+  }
+}
Index: gcc/testsuite/gfortran.dg/vect/pr92537.f90
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gfortran.dg/vect/pr92537.f90	2019-11-20 09:48:57.145101295 +0000
@@ -0,0 +1,32 @@
+! { dg-do compile } */
+! { dg-additional-options "-fno-inline -march=skylake" }
+
+MODULE pr93527
+  implicit none
+  integer, parameter :: wp = kind (1.d0)
+  interface p_min
+     module procedure p_min_wp
+  end interface
+contains
+  subroutine foo (pr)
+    real(wp), pointer     :: pr(:)
+    integer  ::  nzd
+    real(wp) ::  pmin
+    real(wp) ::  pmin_diag
+    integer  ::  i
+    nzd  = 15
+    allocate (pr(nzd))
+    pmin_diag = 4000._wp
+    pmin = p_min(pmin_diag)
+    pmin = min (pmin,pmin_diag)
+    pr(1) = log(pmin)
+    do i=1,nzd-1
+       pr(i+1) = log(pmin) + i
+    end do
+  end subroutine foo
+  function p_min_wp (x) result (p_min)
+    real(wp), intent(in) :: x
+    real(wp)             :: p_min
+    p_min = x
+  end function p_min_wp
+end module pr93527
Richard Biener Nov. 20, 2019, 10:51 a.m. UTC | #2
On Wed, 20 Nov 2019, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > There's a bit left in this PR which the following fixes.  The root
> > only became external late and the check more naturally belongs to
> > the new place.
> >
> > Boostrap and regtest running on x86_64-unknown-linux-gnu.
> 
> FWIW, I was just about to post the patch below before seeing your
> message. :-)  Thought I might as well post it anyway just in case.

I was thinking whether it makes sense to have external root nodes
and they probably can only appear with the CTORs right now (otherwise
they'll be stores).  So possibly generalizing the check like you
do makes sense, still the check belongs in vect_slp_analyze_operations ;)

> I guess the slight advantage to keeping the vect_analyze_slp_instance
> test is that we can still free child nodes at that point, so it might
> be more efficient to catch it "early".  It probably doesn't make much
> differnce in practice though.

But if the root is external there are no child nodes (well, besides
that root).  But yeah, having the check twice looked odd to me.

> Tested on aarch64-linux-gnu.

I've installed my variant before seeing you mail, so...

I also have a prototype for finding "random" roots for SLP
vectorization (just a bunch of same stmts) which runs into
the same issue (but also very many others ;))

Thanks,
Richard.

> 
> Thanks,
> Richard
> 
> 
> 2019-11-20  Richard Sandiford  <richard.sandiford@arm.com>
> 
> gcc/
> 	PR tree-optimization/92537
> 	* tree-vect-slp.c (vect_slp_analyze_node_operations): Don't
> 	allow the root node to be external.
> 
> gcc/testsuite/
> 	PR tree-optimization/92537
> 	* gcc.dg/vect/pr92537.c: New test.
> 	* gfortran.dg/vect/pr92537.f90: Likewise.
> 
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c	2019-11-18 15:21:12.919993766 +0000
> +++ gcc/tree-vect-slp.c	2019-11-20 09:48:57.145101295 +0000
> @@ -2848,7 +2848,7 @@ vect_slp_analyze_node_operations (vec_in
>    slp_tree child;
>  
>    if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
> -    return true;
> +    return node != SLP_INSTANCE_TREE (node_instance);
>  
>    /* If we already analyzed the exact same set of scalar stmts we're done.
>       We share the generated vector stmts for those.
> Index: gcc/testsuite/gcc.dg/vect/pr92537.c
> ===================================================================
> --- /dev/null	2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr92537.c	2019-11-20 09:48:57.145101295 +0000
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +int a[64];
> +int b, c, e;
> +short d;
> +short f[64];
> +void g() {
> +  b = 0;
> +  c = d >> 3;
> +  for (; b < 64 - 1; b++) {
> +    e = 0;
> +    e -= a[b] * c;
> +    f[b] = e;
> +  }
> +}
> Index: gcc/testsuite/gfortran.dg/vect/pr92537.f90
> ===================================================================
> --- /dev/null	2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gfortran.dg/vect/pr92537.f90	2019-11-20 09:48:57.145101295 +0000
> @@ -0,0 +1,32 @@
> +! { dg-do compile } */
> +! { dg-additional-options "-fno-inline -march=skylake" }
> +
> +MODULE pr93527
> +  implicit none
> +  integer, parameter :: wp = kind (1.d0)
> +  interface p_min
> +     module procedure p_min_wp
> +  end interface
> +contains
> +  subroutine foo (pr)
> +    real(wp), pointer     :: pr(:)
> +    integer  ::  nzd
> +    real(wp) ::  pmin
> +    real(wp) ::  pmin_diag
> +    integer  ::  i
> +    nzd  = 15
> +    allocate (pr(nzd))
> +    pmin_diag = 4000._wp
> +    pmin = p_min(pmin_diag)
> +    pmin = min (pmin,pmin_diag)
> +    pr(1) = log(pmin)
> +    do i=1,nzd-1
> +       pr(i+1) = log(pmin) + i
> +    end do
> +  end subroutine foo
> +  function p_min_wp (x) result (p_min)
> +    real(wp), intent(in) :: x
> +    real(wp)             :: p_min
> +    p_min = x
> +  end function p_min_wp
> +end module pr93527
>
diff mbox series

Patch

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	(revision 278477)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -2253,18 +2264,6 @@  vect_analyze_slp_instance (vec_info *vin
 	  matches[group_size / const_max_nunits * const_max_nunits] = false;
 	  vect_free_slp_tree (node, false);
 	}
-      else if (constructor
-	       && SLP_TREE_DEF_TYPE (node) != vect_internal_def)
-	{
-	  /* CONSTRUCTOR vectorization relies on a vector stmt being
-	     generated, that doesn't work for fully external ones.  */
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "Build SLP failed: CONSTRUCTOR of external "
-			     "or constant elements\n");
-	  vect_free_slp_tree (node, false);
-	  return false;
-	}
       else
 	{
 	  /* Create a new SLP instance.  */
@@ -2939,7 +2939,12 @@  vect_slp_analyze_operations (vec_info *v
       if (!vect_slp_analyze_node_operations (vinfo,
 					     SLP_INSTANCE_TREE (instance),
 					     instance, visited, lvisited,
-					     &cost_vec))
+					     &cost_vec)
+	  /* Instances with a root stmt require vectorized defs for the
+	     SLP tree root.  */
+	  || (SLP_INSTANCE_ROOT_STMT (instance)
+	      && (SLP_TREE_DEF_TYPE (SLP_INSTANCE_TREE (instance))
+		  != vect_internal_def)))
         {
 	  slp_tree node = SLP_INSTANCE_TREE (instance);
 	  stmt_vec_info stmt_info = SLP_TREE_SCALAR_STMTS (node)[0];
Index: gcc/testsuite/gfortran.dg/pr92537.f90
===================================================================
--- gcc/testsuite/gfortran.dg/pr92537.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr92537.f90	(working copy)
@@ -0,0 +1,32 @@ 
+! { dg-do compile }
+! { dg-options "-O2 -ftree-vectorize -fno-inline" }
+! { dg-additional-options "-march=skylake" { target x86_64-*-* i?86-*-* } }
+MODULE pr93527
+  implicit none
+  integer, parameter :: wp = kind (1.d0)
+  interface p_min
+     module procedure p_min_wp
+  end interface
+contains
+  subroutine foo (pr)
+    real(wp), pointer     :: pr(:)
+    integer  ::  nzd
+    real(wp) ::  pmin
+    real(wp) ::  pmin_diag
+    integer  ::  i
+    nzd  = 15
+    allocate (pr(nzd))
+    pmin_diag = 4000._wp
+    pmin = p_min(pmin_diag)
+    pmin = min (pmin,pmin_diag)
+    pr(1) = log(pmin)
+    do i=1,nzd-1
+       pr(i+1) = log(pmin) + i
+    end do
+  end subroutine foo
+  function p_min_wp (x) result (p_min)
+    real(wp), intent(in) :: x
+    real(wp)             :: p_min
+    p_min = x
+  end function p_min_wp
+end module pr93527