Patchwork [fortran] Function call optimization

login
register
mail settings
Submitter Thomas Koenig
Date March 20, 2011, 5:10 p.m.
Message ID <4D863504.5000609@netcologne.de>
Download mbox | patch
Permalink /patch/87667/
State New
Headers show

Comments

Thomas Koenig - March 20, 2011, 5:10 p.m.
Hi Mikael,


> - Array elementals: here it's hard to tell which one is faster; save function
> calls to a temporary and use the temporary or do multiple function calls every
> time (but without temporary).

I have removed elementals for now. It would probably be best to create
a scalar temporary variable during scalarzation.

> - Transpose optimization: here the transpose call is changed into a direct
> array access, so your patch will definitely make things worse. Even if
> transpose calls are used multiple times as actual argument for example, better
> create multiple descriptors than copy the whole lot to a temp. The common
> function elimination should be disabled (IMO) in this case.

Removed for transpose.


> Please use more descriptive names.
> There are 3 cfe* functions and only one has a comment explaining what cfe
> means.

Changed.


> Is it expected that you allow allocatables with -faggressive-function-
> elimination?

No, changed.

>> +
>> +         if (!(*e)->value.function.esym->attr.pure
>> +&&  !(*e)->value.function.esym->attr.implicit_pure)
>> +           return 0;
>> +       }
>> +    }
>> +
>> +  if ((*e)->value.function.isym)
>> +    {
>> +      if ((*e)->value.function.isym->id == GFC_ISYM_CONVERSION)
>> +       return 0;
>> +
>> +      if (! (*e)->value.function.isym->pure
>> +&&  !(*e)->value.function.isym->elemental)
> Tobias' comment also applies here, even if there is no intrinsic impure
> elemental: the following code in intrinsic.c makes the elemental check
> redundant.
>        next_sym->pure = (cl != CLASS_IMPURE);
>        next_sym->elemental = (cl == CLASS_ELEMENTAL);

Changed.


>> +  for (i=expr_count-1; i>=1; i--)
>> +    {
> Tiny optimization here ;-):
> /* Don't bother if the expression has been factored already.  */

Applied.

> if ((*(expr_array[i]))->expr_type == EXPR_VARIABLE)
>    continue;
>> +      newvar = NULL;
>



 >> +
 > Small comment here, or better, more descriptive name
>> +static int
>> +cfe_code (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED,
>> +         void *data ATTRIBUTE_UNUSED)
>> +{

Better comment added.

I have also factored out the code with gfc_dep_compare_expression
into its own function.

Here is the new version of the patch.  Regression-tested.  OK for trunk?

	Thomas

2010-03-20  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/22572
	* gfortran.h (gfc_option_t) : Add
	flag_aggressive_function_elimination.
	(gfc_dep_compare_functions):  Add prototype.
	* lang.opt: Add faggressive-function-elimination.
	* invoke.texi: Document -faggressive-function-elimination.
	* frontend_passes (expr_array):  New static variable.
	(expr_size):  Likewise.
	(expr_count):  Likewise.
	(current_code):  Likewise.
	(current_ns):  Likewise.
	(gfc_run_passes):  Allocate and free space for expressions.
	(cfe_register_funcs):  New function.
	(create_var):  New function.
	(cfc_expr_0):  New function.
	(cfe_code):  New function.
	(optimize_namespace):  Invoke gfc_code_walker with cfe_code
	and cfe_expr_0.
	* dependency.c (gfc_dep_compare_functions):  New function.
	(gfc_dep_compare_expr):  Use it.
	* options.c (gfc_init_options):  Handle
	flag_aggressive_function_elimination.
	(gfc_handle_option):  Likewise.

2010-03-20  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/22572
	* gfortran.dg/function_optimize_1.f90:  New test.
	* gfortran.dg/function_optimize_2.f90:  New test.
! { dg-do compile }
! { dg-options "-O -fdump-tree-original" }
program main
  implicit none
  real, dimension(2,2) :: a, b, c, d
  integer :: i
  real :: x, z
  character(60) :: line
  real, external :: ext_func
  interface
     elemental function element(x)
       real, intent(in) :: x
       real :: elem
     end function element
     pure function mypure(x)
       real, intent(in) :: x
       integer :: mypure
     end function mypure
     elemental impure function elem_impure(x)
       real, intent(in) :: x
       real :: elem_impure
     end function elem_impure
  end interface

  data a /2., 3., 5., 7./
  data b /11., 13., 17., 23./
  write (unit=line, fmt='(4F7.2)') matmul(a,b) + matmul(a,b)
  z = sin(x) + cos(x) + sin(x) + cos(x)
  print *,z
  x = ext_func(a) + 23 + ext_func(a)
  print *,d,x
  z = element(x) + element(x)
  print *,z
  i = mypure(x) - mypure(x)
  print *,i
  z = elem_impure(x) - elem_impure(x)
  print *,z
end program main
! { dg-final { scan-tree-dump-times "matmul_r4" 1 "original" } }
! { dg-final { scan-tree-dump-times "__builtin_sinf" 1 "original" } }
! { dg-final { scan-tree-dump-times "__builtin_cosf" 1 "original" } }
! { dg-final { scan-tree-dump-times "ext_func" 2 "original" } }
! { dg-final { scan-tree-dump-times "element" 1 "original" } }
! { dg-final { scan-tree-dump-times "mypure" 1 "original" } }
! { dg-final { scan-tree-dump-times "elem_impure" 2 "original" } }
! { dg-final { cleanup-tree-dump "original" } }
! { dg-do compile }
! { dg-options "-O -faggressive-function-elimination -fdump-tree-original" }
program main
  implicit none
  real, dimension(2,2) :: a, b, c, d
  real :: x, z
  integer :: i
  character(60) :: line
  real, external :: ext_func
  interface
     elemental function element(x)
       real, intent(in) :: x
       real :: elem
     end function element
     pure function mypure(x)
       real, intent(in) :: x
       integer :: mypure
     end function mypure
     elemental impure function elem_impure(x)
       real, intent(in) :: x
       real :: elem_impure
     end function elem_impure
  end interface

  data a /2., 3., 5., 7./
  data b /11., 13., 17., 23./
  write (unit=line, fmt='(4F7.2)') matmul(a,b) + matmul(a,b)
  x = 1.2
  z = sin(x) + cos(x) + sin(x) + cos(x)
  print *,z
  x = ext_func(a) + 23 + ext_func(a)
  print *,d,x
  z = element(x) + element(x)
  print *,z
  i = mypure(x) - mypure(x)
  print *,i
  z = elem_impure(x) - elem_impure(x)
  print *,z
end program main
! { dg-final { scan-tree-dump-times "matmul_r4" 1 "original" } }
! { dg-final { scan-tree-dump-times "__builtin_sinf" 1 "original" } }
! { dg-final { scan-tree-dump-times "__builtin_cosf" 1 "original" } }
! { dg-final { scan-tree-dump-times "ext_func" 1 "original" } }
! { dg-final { scan-tree-dump-times "element" 1 "original" } }
! { dg-final { scan-tree-dump-times "mypure" 1 "original" } }
! { dg-final { scan-tree-dump-times "elem_impure" 1 "original" } }
! { dg-final { cleanup-tree-dump "original" } }
Mikael Morin - March 20, 2011, 11:32 p.m.
> Here is the new version of the patch.  Regression-tested.  OK for trunk?
OK this time.

Thank you.

Mikael
Thomas Koenig - March 21, 2011, 7:15 a.m.
Hi Mikael,
>> Here is the new version of the patch.  Regression-tested.  OK for trunk?
> OK this time.
>
> Thank you.

Sende          fortran/ChangeLog
Sende          fortran/dependency.c
Sende          fortran/frontend-passes.c
Sende          fortran/gfortran.h
Sende          fortran/invoke.texi
Sende          fortran/lang.opt
Sende          fortran/options.c
Sende          testsuite/ChangeLog
Hinzufügen     testsuite/gfortran.dg/function_optimize_1.f90
Hinzufügen     testsuite/gfortran.dg/function_optimize_2.f90
Übertrage Daten ..........
Revision 171207 übertragen.

Thanks for the review!

	Thomas

Patch

Index: gfortran.h
===================================================================
--- gfortran.h	(Revision 170960)
+++ gfortran.h	(Arbeitskopie)
@@ -2232,6 +2232,7 @@  typedef struct
   int flag_whole_file;
   int flag_protect_parens;
   int flag_realloc_lhs;
+  int flag_aggressive_function_elimination;
 
   int fpe;
   int rtcheck;
@@ -2865,6 +2866,7 @@  void gfc_global_used (gfc_gsymbol *, locus *);
 gfc_namespace* gfc_build_block_ns (gfc_namespace *);
 
 /* dependency.c */
+int gfc_dep_compare_functions (gfc_expr *, gfc_expr *, bool);
 int gfc_dep_compare_expr (gfc_expr *, gfc_expr *);
 
 /* check.c */
Index: lang.opt
===================================================================
--- lang.opt	(Revision 170960)
+++ lang.opt	(Arbeitskopie)
@@ -278,6 +278,10 @@  d
 Fortran Joined
 ; Documented in common.opt
 
+faggressive-function-elimination
+Fortran
+Eliminate multiple function invokations also for impure functions
+
 falign-commons
 Fortran
 Enable alignment of COMMON blocks
Index: invoke.texi
===================================================================
--- invoke.texi	(Revision 170960)
+++ invoke.texi	(Arbeitskopie)
@@ -1468,6 +1468,18 @@  need to be in effect.
 An allocatable left-hand side of an intrinsic assignment is automatically
 (re)allocated if it is either unallocated or has a different shape. The
 option is enabled by default except when @option{-std=f95} is given.
+
+@item -faggressive-function-elimination
+@opindex @code{faggressive-function-elimination}
+@cindex Elimination of functions with identical argument lists
+Functions with identical argument lists are eliminated within
+statements, regardless of whether these functions are marked
+@code{PURE} or not. For example, in
+@smallexample
+  a = f(b,c) + f(b,c)
+@end smallexample
+there will only be a single call to @code{f}.
+
 @end table
 
 @xref{Code Gen Options,,Options for Code Generation Conventions,
@@ -1475,7 +1487,6 @@  gcc,Using the GNU Compiler Collection (GCC)}, for
 offered by the GBE
 shared by @command{gfortran}, @command{gcc}, and other GNU compilers.
 
-
 @c man end
 
 @node Environment Variables
Index: frontend-passes.c
===================================================================
--- frontend-passes.c	(Revision 170960)
+++ frontend-passes.c	(Arbeitskopie)
@@ -40,6 +40,21 @@  static bool optimize_trim (gfc_expr *);
 
 static int count_arglist;
 
+/* Pointer to an array of gfc_expr ** we operate on, plus its size
+   and counter.  */
+
+static gfc_expr ***expr_array;
+static int expr_size, expr_count;
+
+/* Pointer to the gfc_code we currently work on - to be able to insert
+   a statement before.  */
+
+static gfc_code **current_code;
+
+/* The namespace we are currently dealing with.  */
+
+gfc_namespace *current_ns;
+
 /* Entry point - run all passes for a namespace.  So far, only an
    optimization pass is run.  */
 
@@ -48,9 +63,16 @@  gfc_run_passes (gfc_namespace *ns)
 {
   if (optimize)
     {
+      expr_size = 20;
+      expr_array = XNEWVEC(gfc_expr **, expr_size);
+
       optimize_namespace (ns);
       if (gfc_option.dump_fortran_optimized)
 	gfc_dump_parse_tree (ns, stdout);
+
+      /* FIXME: The following should be XDELETEVEC(expr_array);
+      but we cannot do that because it depends on free.  */
+      gfc_free (expr_array);
     }
 }
 
@@ -106,11 +128,214 @@  optimize_expr (gfc_expr **e, int *walk_subtrees AT
   return 0;
 }
 
+
+/* Callback function for common function elimination, called from cfe_expr_0.
+   Put all eligible function expressions into expr_array.  We can't do
+   allocatable functions.  */
+
+static int
+cfe_register_funcs (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
+	  void *data ATTRIBUTE_UNUSED)
+{
+  if ((*e)->expr_type != EXPR_FUNCTION)
+    return 0;
+
+  /* We don't do character functions (yet).  */
+  if ((*e)->ts.type == BT_CHARACTER)
+    return 0;
+
+  /* If we don't know the shape at compile time, we do not create a temporary
+     variable to hold the intermediate result.  FIXME: Change this later when
+     allocation on assignment works for intrinsics.  */
+
+  if ((*e)->rank > 0 && (*e)->shape == NULL)
+    return 0;
+  
+  /* Skip the test for pure functions if -faggressive-function-elimination
+     is specified.  */
+  if ((*e)->value.function.esym)
+    {
+      if ((*e)->value.function.esym->attr.allocatable)
+	return 0;
+
+      /* Don't create an array temporary for elemental functions.  */
+      if ((*e)->value.function.esym->attr.elemental && (*e)->rank > 0)
+	return 0;
+
+      /* Only eliminate potentially impure functions if the
+	 user specifically requested it.  */
+      if (!gfc_option.flag_aggressive_function_elimination
+	  && !(*e)->value.function.esym->attr.pure
+	  && !(*e)->value.function.esym->attr.implicit_pure)
+	return 0;
+    }
+
+  if ((*e)->value.function.isym)
+    {
+      /* Conversions are handled on the fly by the middle end,
+	 transpose during trans-* stages.  */
+      if ((*e)->value.function.isym->id == GFC_ISYM_CONVERSION
+	  || (*e)->value.function.isym->id == GFC_ISYM_TRANSPOSE)
+	return 0;
+
+      /* Don't create an array temporary for elemental functions,
+	 as this would be wasteful of memory.
+	 FIXME: Create a scalar temporary during scalarization.  */
+      if ((*e)->value.function.isym->elemental && (*e)->rank > 0)
+	return 0;
+
+      if (!(*e)->value.function.isym->pure)
+	return 0;
+    }
+
+  if (expr_count >= expr_size)
+    {
+      expr_size += expr_size;
+      expr_array = XRESIZEVEC(gfc_expr **, expr_array, expr_size);
+    }
+  expr_array[expr_count] = e;
+  expr_count ++;
+  return 0;
+}
+
+/* Returns a new expression (a variable) to be used in place of the old one,
+   with an an assignment statement before the current statement to set
+   the value of the variable.  */
+
+static gfc_expr*
+create_var (gfc_expr * e)
+{
+  char name[GFC_MAX_SYMBOL_LEN +1];
+  static int num = 1;
+  gfc_symtree *symtree;
+  gfc_symbol *symbol;
+  gfc_expr *result;
+  gfc_code *n;
+  int i;
+
+  sprintf(name, "__var_%d",num++);
+  if (gfc_get_sym_tree (name, current_ns, &symtree, false) != 0)
+    gcc_unreachable ();
+
+  symbol = symtree->n.sym;
+  symbol->ts = e->ts;
+  symbol->as = gfc_get_array_spec ();
+  symbol->as->rank = e->rank;
+  symbol->as->type = AS_EXPLICIT;
+  for (i=0; i<e->rank; i++)
+    {
+      gfc_expr *p, *q;
+      
+      p = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
+				 &(e->where));
+      mpz_set_si (p->value.integer, 1);
+      symbol->as->lower[i] = p;
+	  
+      q = gfc_get_constant_expr (BT_INTEGER, gfc_index_integer_kind,
+				 &(e->where));
+      mpz_set (q->value.integer, e->shape[i]);
+      symbol->as->upper[i] = q;
+    }
+
+  symbol->attr.flavor = FL_VARIABLE;
+  symbol->attr.referenced = 1;
+  symbol->attr.dimension = e->rank > 0;
+  gfc_commit_symbol (symbol);
+
+  result = gfc_get_expr ();
+  result->expr_type = EXPR_VARIABLE;
+  result->ts = e->ts;
+  result->rank = e->rank;
+  result->shape = gfc_copy_shape (e->shape, e->rank);
+  result->symtree = symtree;
+  result->where = e->where;
+  if (e->rank > 0)
+    {
+      result->ref = gfc_get_ref ();
+      result->ref->type = REF_ARRAY;
+      result->ref->u.ar.type = AR_FULL;
+      result->ref->u.ar.where = e->where;
+      result->ref->u.ar.as = symbol->as;
+    }
+
+  /* Generate the new assignment.  */
+  n = XCNEW (gfc_code);
+  n->op = EXEC_ASSIGN;
+  n->loc = (*current_code)->loc;
+  n->next = *current_code;
+  n->expr1 = gfc_copy_expr (result);
+  n->expr2 = e;
+  *current_code = n;
+
+  return result;
+}
+
+/* Callback function for the code walker for doing common function
+   elimination.  This builds up the list of functions in the expression
+   and goes through them to detect duplicates, which it then replaces
+   by variables.  */
+
+static int
+cfe_expr_0 (gfc_expr **e, int *walk_subtrees,
+	  void *data ATTRIBUTE_UNUSED)
+{
+  int i,j;
+  gfc_expr *newvar;
+
+  expr_count = 0;
+
+  gfc_expr_walker (e, cfe_register_funcs, NULL);
+
+  /* Walk backwards through all the functions to make sure we
+     catch the leaf functions first.  */
+  for (i=expr_count-1; i>=1; i--)
+    {
+      /* Skip if the function has been replaced by a variable already.  */
+      if ((*(expr_array[i]))->expr_type == EXPR_VARIABLE)
+	continue;
+
+      newvar = NULL;
+      for (j=i-1; j>=0; j--)
+	{
+	  if (gfc_dep_compare_functions(*(expr_array[i]),
+					*(expr_array[j]), true)	== 0)
+	    {
+	      if (newvar == NULL)
+		newvar = create_var (*(expr_array[i]));
+	      gfc_free (*(expr_array[j]));
+	      *(expr_array[j]) = gfc_copy_expr (newvar);
+	    }
+	}
+      if (newvar)
+	*(expr_array[i]) = newvar;
+    }
+
+  /* We did all the necessary walking in this function.  */
+  *walk_subtrees = 0;
+  return 0;
+}
+
+/* Callback function for common function elimination, called from
+   gfc_code_walker.  This keeps track of the current code, in order
+   to insert statements as needed.  */
+
+static int
+cfe_code (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED,
+	  void *data ATTRIBUTE_UNUSED)
+{
+  current_code = c;
+  return 0;
+}
+
 /* Optimize a namespace, including all contained namespaces.  */
 
 static void
 optimize_namespace (gfc_namespace *ns)
 {
+
+  current_ns = ns;
+
+  gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL);
   gfc_code_walker (&ns->code, optimize_code, optimize_expr, NULL);
 
   for (ns = ns->contained; ns; ns = ns->sibling)
Index: dependency.c
===================================================================
--- dependency.c	(Revision 170960)
+++ dependency.c	(Arbeitskopie)
@@ -177,16 +177,60 @@  gfc_are_identical_variables (gfc_expr *e1, gfc_exp
   return true;
 }
 
+/* Compare two functions for equality.  Only the first expression is known
+   to be a function.  Returns 0 if e1==e2, -2 otherwise.  If impure_ok is
+   false, only return 0 for pure functions.  */
+
+int
+gfc_dep_compare_functions (gfc_expr *e1, gfc_expr *e2, bool impure_ok)
+{
+
+  gfc_actual_arglist *args1;
+  gfc_actual_arglist *args2;
+  
+  if (e2->expr_type != EXPR_FUNCTION)
+    return -2;
+
+  if ((e1->value.function.esym && e2->value.function.esym
+       && e1->value.function.esym == e2->value.function.esym
+       && (e1->value.function.esym->result->attr.pure || impure_ok))
+       || (e1->value.function.isym && e2->value.function.isym
+	   && e1->value.function.isym == e2->value.function.isym
+	   && (e1->value.function.isym->pure || impure_ok)))
+    {
+      args1 = e1->value.function.actual;
+      args2 = e2->value.function.actual;
+
+      /* Compare the argument lists for equality.  */
+      while (args1 && args2)
+	{
+	  /*  Bitwise xor, since C has no non-bitwise xor operator.  */
+	  if ((args1->expr == NULL) ^ (args2->expr == NULL))
+	    return -2;
+	  
+	  if (args1->expr != NULL && args2->expr != NULL
+	      && gfc_dep_compare_expr (args1->expr, args2->expr) != 0)
+	    return -2;
+	  
+	  args1 = args1->next;
+	  args2 = args2->next;
+	}
+      return (args1 || args2) ? -2 : 0;
+    }
+      else
+	return -2;      
+}
+
 /* Compare two values.  Returns 0 if e1 == e2, -1 if e1 < e2, +1 if e1 > e2,
    and -2 if the relationship could not be determined.  */
 
 int
 gfc_dep_compare_expr (gfc_expr *e1, gfc_expr *e2)
 {
+  int i;
+  gfc_expr *n1, *n2;
   gfc_actual_arglist *args1;
   gfc_actual_arglist *args2;
-  int i;
-  gfc_expr *n1, *n2;
 
   n1 = NULL;
   n2 = NULL;
@@ -399,36 +443,7 @@  gfc_dep_compare_expr (gfc_expr *e1, gfc_expr *e2)
       return -2;
 
     case EXPR_FUNCTION:
-
-      /* PURE functions can be compared for argument equality.  */
-      if ((e1->value.function.esym && e2->value.function.esym
-	   && e1->value.function.esym == e2->value.function.esym
-	   && e1->value.function.esym->result->attr.pure)
-	  || (e1->value.function.isym && e2->value.function.isym
-	      && e1->value.function.isym == e2->value.function.isym
-	      && e1->value.function.isym->pure))
-	{
-	  args1 = e1->value.function.actual;
-	  args2 = e2->value.function.actual;
-
-	  /* Compare the argument lists for equality.  */
-	  while (args1 && args2)
-	    {
-	      /*  Bitwise xor, since C has no non-bitwise xor operator.  */
-	      if ((args1->expr == NULL) ^ (args2->expr == NULL))
-		return -2;
-
-	      if (args1->expr != NULL && args2->expr != NULL
-		  && gfc_dep_compare_expr (args1->expr, args2->expr) != 0)
-		return -2;
-
-	      args1 = args1->next;
-	      args2 = args2->next;
-	    }
-	  return (args1 || args2) ? -2 : 0;
-	}
-      else
-	return -2;
+      return gfc_dep_compare_functions (e1, e2, false);
       break;
 
     default:
Index: options.c
===================================================================
--- options.c	(Revision 170960)
+++ options.c	(Arbeitskopie)
@@ -150,6 +150,7 @@  gfc_init_options (unsigned int decoded_options_cou
   gfc_option.flag_align_commons = 1;
   gfc_option.flag_protect_parens = 1;
   gfc_option.flag_realloc_lhs = -1;
+  gfc_option.flag_aggressive_function_elimination = 0;
   
   gfc_option.fpe = 0;
   gfc_option.rtcheck = 0;
@@ -972,6 +973,10 @@  gfc_handle_option (size_t scode, const char *arg,
       gfc_option.flag_align_commons = value;
       break;
 
+    case  OPT_faggressive_function_elimination:
+      gfc_option.flag_aggressive_function_elimination = value;
+      break;
+
     case OPT_fprotect_parens:
       gfc_option.flag_protect_parens = value;
       break;