Patchwork [Fortran] Fix PR 50690

login
register
mail settings
Submitter Thomas Koenig
Date Oct. 21, 2011, 5:39 p.m.
Message ID <4EA1AE61.6000003@netcologne.de>
Download mbox | patch
Permalink /patch/121031/
State New
Headers show

Comments

Thomas Koenig - Oct. 21, 2011, 5:39 p.m.
Jakub Jelinek wrote:
> Though, what could be done is just special case OpenMP workshare regions,
> insert everything into BLOCK local vars unless in OpenMP workshare, in that
> case put the BLOCK with the temporary around the workshare rather than
> inside of it.  In the case of omp parallel workshare it would need
> to go in between omp parallel and omp workshare.

Well, here's a patch which implements this concept.  I chose to insert
the BLOCK in a separate pass because it was the cleanest way to avoid
infinite recursion when inserting a block.

Regression-tested.  OK for trunk?

	Thomas

2011-10-21  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/50690
         * frontend-passes.c (workshare_level):  New variable.
         (create_var):  Put the newly created variable into the block
         around the WORKSHARE.
         (enclose_workshare):  New callback function to enclose
         WORKSHAREs in blocks.
         (optimize_namespace):  Use it.
         (gfc_code_walker):  Save/restore current namespace when
         following a BLOCK.  Keep track of workshare level.

2011-10-21  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/50690
         * gfortran.dg/gomp/workshare2.f90:  New test.
! { dg-do run }
! { dg-options "-ffrontend-optimize" }
! PR 50690 - this used to ICE because workshare could not handle
! BLOCKs.
program foo
  implicit none
  real, parameter :: eps = 3e-7
  integer :: i
  real :: A(10), B(5), C(10)
  B(1) = 3.344
  call random_number(a)
  c = a
  !$omp parallel default(shared)
  !$omp workshare
   A(:) = A(:)*cos(B(1))+A(:)*cos(B(1))
  !$omp end workshare nowait
  !$omp end parallel ! sync is implied here
!  c = c*cos(b(1)) + c*cos(b(1))
!  if (any(abs(a-c) > eps)) call abort
end program foo
Thomas Koenig - Oct. 25, 2011, 5:39 a.m.
Ping ** 0.571428

> Jakub Jelinek wrote:
>> Though, what could be done is just special case OpenMP workshare regions,
>> insert everything into BLOCK local vars unless in OpenMP workshare, in
>> that
>> case put the BLOCK with the temporary around the workshare rather than
>> inside of it. In the case of omp parallel workshare it would need
>> to go in between omp parallel and omp workshare.
>
> Well, here's a patch which implements this concept. I chose to insert
> the BLOCK in a separate pass because it was the cleanest way to avoid
> infinite recursion when inserting a block.
>
> Regression-tested. OK for trunk?
>
> Thomas
>
> 2011-10-21 Thomas Koenig <tkoenig@gcc.gnu.org>
>
> PR fortran/50690
> * frontend-passes.c (workshare_level): New variable.
> (create_var): Put the newly created variable into the block
> around the WORKSHARE.
> (enclose_workshare): New callback function to enclose
> WORKSHAREs in blocks.
> (optimize_namespace): Use it.
> (gfc_code_walker): Save/restore current namespace when
> following a BLOCK. Keep track of workshare level.
>
> 2011-10-21 Thomas Koenig <tkoenig@gcc.gnu.org>
>
> PR fortran/50690
> * gfortran.dg/gomp/workshare2.f90: New test.
>
>
Mikael Morin - Oct. 30, 2011, 7:37 p.m.
On Tuesday 25 October 2011 07:39:44 Thomas Koenig wrote:
> Ping ** 0.571428
> 
Let's keep Jakub CC-ed for mixes of OpenMP and frontend optimizations. ;-)

There are two commented lines in the testcase. Is it expected?
Otherwise doesn't look too bad...

Mikael

> > Jakub Jelinek wrote:
> >> Though, what could be done is just special case OpenMP workshare
> >> regions, insert everything into BLOCK local vars unless in OpenMP
> >> workshare, in that
> >> case put the BLOCK with the temporary around the workshare rather than
> >> inside of it. In the case of omp parallel workshare it would need
> >> to go in between omp parallel and omp workshare.
> > 
> > Well, here's a patch which implements this concept. I chose to insert
> > the BLOCK in a separate pass because it was the cleanest way to avoid
> > infinite recursion when inserting a block.
> > 
> > Regression-tested. OK for trunk?
> > 
> > Thomas
> > 
> > 2011-10-21 Thomas Koenig <tkoenig@gcc.gnu.org>
> > 
> > PR fortran/50690
> > * frontend-passes.c (workshare_level): New variable.
> > (create_var): Put the newly created variable into the block
> > around the WORKSHARE.
> > (enclose_workshare): New callback function to enclose
> > WORKSHAREs in blocks.
> > (optimize_namespace): Use it.
> > (gfc_code_walker): Save/restore current namespace when
> > following a BLOCK. Keep track of workshare level.
> > 
> > 2011-10-21 Thomas Koenig <tkoenig@gcc.gnu.org>
> > 
> > PR fortran/50690
> > * gfortran.dg/gomp/workshare2.f90: New test.
Tobias Burnus - Oct. 30, 2011, 10:53 p.m.
Mikael Morin wrote:
> Let's keep Jakub CC-ed for mixes of OpenMP and frontend optimizations. ;-)
>
> There are two commented lines in the testcase. Is it expected?
> Otherwise doesn't look too bad...

I had also a glance at the patch - and it looks reasonable; in 
particular, I failed to generate a failing test case.

Besides the
   !$omp parallel
   !$omp workshare
one should also add a test for - it can also be in the same file -:
   !$omp parallel workshare

Regarding the test case: You cannot place a "dg-do run" OpenMP test case 
into gcc/testsuite/gfortran.dg/gomp/workshare2.f90; you have to use 
libgomp/testsuite/libgomp.fortran. The former directory only allows 
"dg-do compile" test cases. (By the way, running "make check" for 
libgomp does never harm.)

Jakub, what do you think?

Tobias

>>> Jakub Jelinek wrote:
>>>> Though, what could be done is just special case OpenMP workshare
>>>> regions, insert everything into BLOCK local vars unless in OpenMP
>>>> workshare, in that
>>>> case put the BLOCK with the temporary around the workshare rather than
>>>> inside of it. In the case of omp parallel workshare it would need
>>>> to go in between omp parallel and omp workshare.
>>> Well, here's a patch which implements this concept. I chose to insert
>>> the BLOCK in a separate pass because it was the cleanest way to avoid
>>> infinite recursion when inserting a block.
>>>
>>> Regression-tested. OK for trunk?
>>>
>>> Thomas
>>>
>>> 2011-10-21 Thomas Koenig<tkoenig@gcc.gnu.org>
>>>
>>> PR fortran/50690
>>> * frontend-passes.c (workshare_level): New variable.
>>> (create_var): Put the newly created variable into the block
>>> around the WORKSHARE.
>>> (enclose_workshare): New callback function to enclose
>>> WORKSHAREs in blocks.
>>> (optimize_namespace): Use it.
>>> (gfc_code_walker): Save/restore current namespace when
>>> following a BLOCK. Keep track of workshare level.
>>>
>>> 2011-10-21 Thomas Koenig<tkoenig@gcc.gnu.org>
>>>
>>> PR fortran/50690
>>> * gfortran.dg/gomp/workshare2.f90: New test.
Tobias Burnus - Oct. 31, 2011, 11:53 a.m.
Tobias Burnus wrote:
> I had also a glance at the patch - and it looks reasonable; in 
> particular, I failed to generate a failing test case.

Actually, the test case is *not* OK.

If one compiles the original test case of the PR (or your 
workshare2.f90) with "-O" and looks at "-fdump-tree-original", one finds:

     #pragma omp parallel default(shared)
       {
         {
           real(kind=4) __var_1;
           {
             #pragma omp single
               {
                 __var_1 = __builtin_cosf (b[0])
               }
...
                 #pragma omp for schedule(static) nowait
                 for (S.1 = 1; S.1 <= 5; S.1 = S.1 + 1)
                   {
                     a[S.1 + -1] = a[S.1 + -1] * D.1730 + a[S.1 + -1] * 
D.1731;

Thus, __var_1 is a thread-local variable; however, COS() is not executed 
in all threads but only in one due to the omp single: "The single 
construct specifies that the associated structured block is executed by 
only one of the threads in the team" (2.5.3 single Construct, OpenMP 3.1).

Jakub remarks that omp single is what we expand to omp workshare if it 
is not simple enough for us.

  * * *

With the test case below, the dump looks OK, but the FE optimization 
does not combine the two cos() calls - I have no idea why. The dump 
looks as:

   #pragma omp parallel default(shared)
     {
                 D.1743 = __builtin_cosf (b[0]);
                 D.1745 = __builtin_cosf (b[0]);
...
                   #pragma omp for schedule(static) nowait
                   for (S.2 = 1; S.2 <= 10; S.2 = S.2 + 1)
                       a[S.2 + D.1750] = a[S.2 + D.1748] * D.1743 + 
a[S.2 + D.1749] * D.1745;

Tobias

PS: The test case is:

program workshare
   implicit none
   real, parameter :: eps = 3e-7
   integer :: j
   real :: A(10,5), B(5)
   B(1) = 3.344
   call random_number(a)
   !$omp parallel default(shared)
   !$omp workshare
   forall (j=1:5)
     A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1))
   end forall
   !$omp end workshare
   !$omp end parallel
   print *, A
end program workshare

subroutine parallel_workshare
   implicit none
   real, parameter :: eps = 3e-7
   integer :: j
   real :: A(10,5), B(5)
   B(1) = 3.344
   call random_number(a)
   !$omp parallel workshare default(shared)
   forall (j=1:5)
     A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1))
   end forall
   !$omp end parallel workshare
   print *, A
end subroutine parallel_workshare

Patch

Index: frontend-passes.c
===================================================================
--- frontend-passes.c	(Revision 180063)
+++ frontend-passes.c	(Arbeitskopie)
@@ -66,6 +66,10 @@  static gfc_namespace *current_ns;
 
 static int forall_level;
 
+/* If we are within an OMP WORKSHARE or OMP PARALLEL WORKSHARE.  */
+
+static int workshare_level;
+
 /* Entry point - run all passes for a namespace.  So far, only an
    optimization pass is run.  */
 
@@ -245,8 +249,16 @@  create_var (gfc_expr * e)
   gfc_namespace *ns;
   int i;
 
+  /* Special treatment for WORKSHARE: The variable goes into the block
+     created by the earlier pass around it.  */
+
+  if (workshare_level > 0)
+    {
+      ns = current_ns;
+      changed_statement = current_code;
+    }
   /* If the block hasn't already been created, do so.  */
-  if (inserted_block == NULL)
+  else if (inserted_block == NULL)
     {
       inserted_block = XCNEW (gfc_code);
       inserted_block->op = EXEC_BLOCK;
@@ -497,6 +509,38 @@  convert_do_while (gfc_code **c, int *walk_subtrees
   return 0;
 }
 
+/* Callback function to enclose OMP workshares into BLOCKs.  This is done
+   so that later front end optimization can insert temporary variables into
+   the outer block scope.  */
+
+static int
+enclose_workshare (gfc_code **c, int *walk_subtrees,
+		   void *data ATTRIBUTE_UNUSED)
+{
+  gfc_code *co;
+  gfc_code *new_block;
+  gfc_namespace *ns;
+
+  co = *c;
+
+  if (co->op != EXEC_OMP_WORKSHARE && co->op != EXEC_OMP_PARALLEL_WORKSHARE)
+    return 0;
+
+  /* Create the block.  */
+  new_block = XCNEW (gfc_code);
+  new_block->op = EXEC_BLOCK;
+  new_block->loc = co->loc;
+  ns = gfc_build_block_ns (current_ns);
+  new_block->ext.block.ns = ns;
+  new_block->ext.block.assoc = NULL;
+  ns->code = co;
+
+  /* Insert the BLOCK at the right position.  */
+  *c = new_block;
+  *walk_subtrees = false;
+  return 0;
+}
+
 /* Optimize a namespace, including all contained namespaces.  */
 
 static void
@@ -507,6 +551,12 @@  optimize_namespace (gfc_namespace *ns)
   forall_level = 0;
 
   gfc_code_walker (&ns->code, convert_do_while, dummy_expr_callback, NULL);
+  if (gfc_option.gfc_flag_openmp)
+    {
+      workshare_level = 0;
+      gfc_code_walker (&ns->code, enclose_workshare, dummy_expr_callback, NULL);
+    }
+
   gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL);
   gfc_code_walker (&ns->code, optimize_code, optimize_expr, NULL);
 
@@ -1148,6 +1198,7 @@  gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	  gfc_code *b;
 	  gfc_actual_arglist *a;
 	  gfc_code *co;
+	  gfc_namespace *save_ns;
 	  gfc_association_list *alist;
 
 	  /* There might be statement insertions before the current code,
@@ -1159,7 +1210,11 @@  gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	    {
 
 	    case EXEC_BLOCK:
+	      save_ns = current_ns;
+	      current_ns = co->ext.block.ns;
 	      WALK_SUBCODE (co->ext.block.ns->code);
+	      current_ns = save_ns;
+
 	      for (alist = co->ext.block.assoc; alist; alist = alist->next)
 		WALK_SUBEXPR (alist->target);
 	      break;
@@ -1329,14 +1384,18 @@  gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	      WALK_SUBEXPR (co->ext.dt->extra_comma);
 	      break;
 
+	    case EXEC_OMP_PARALLEL_WORKSHARE:
+	    case EXEC_OMP_WORKSHARE:
+	      workshare_level ++;
+
+	      /* Fall through.  */
+
 	    case EXEC_OMP_DO:
 	    case EXEC_OMP_PARALLEL:
 	    case EXEC_OMP_PARALLEL_DO:
 	    case EXEC_OMP_PARALLEL_SECTIONS:
-	    case EXEC_OMP_PARALLEL_WORKSHARE:
 	    case EXEC_OMP_SECTIONS:
 	    case EXEC_OMP_SINGLE:
-	    case EXEC_OMP_WORKSHARE:
 	    case EXEC_OMP_END_SINGLE:
 	    case EXEC_OMP_TASK:
 	      if (co->ext.omp_clauses)
@@ -1365,6 +1424,9 @@  gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	  if (co->op == EXEC_FORALL)
 	    forall_level --;
 
+	  if (co->op == EXEC_OMP_WORKSHARE
+	      || co->op == EXEC_OMP_PARALLEL_WORKSHARE)
+	    workshare_level --;
 	}
     }
   return 0;