From patchwork Sun Mar 20 17:10:28 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Koenig X-Patchwork-Id: 87667 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 81D40B6EE8 for ; Mon, 21 Mar 2011 04:10:46 +1100 (EST) Received: (qmail 373 invoked by alias); 20 Mar 2011 17:10:43 -0000 Received: (qmail 359 invoked by uid 22791); 20 Mar 2011 17:10:41 -0000 X-SWARE-Spam-Status: No, hits=1.4 required=5.0 tests=AWL, BAYES_20, KAM_STOCKGEN, RCVD_IN_DNSWL_NONE, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from cc-smtpout1.netcologne.de (HELO cc-smtpout1.netcologne.de) (89.1.8.211) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 20 Mar 2011 17:10:34 +0000 Received: from cc-smtpin2.netcologne.de (cc-smtpin2.netcologne.de [89.1.8.202]) by cc-smtpout1.netcologne.de (Postfix) with ESMTP id AF1A012730; Sun, 20 Mar 2011 18:10:30 +0100 (CET) Received: from [192.168.0.197] (xdsl-78-35-141-147.netcologne.de [78.35.141.147]) by cc-smtpin2.netcologne.de (Postfix) with ESMTPSA id 5987A11EA4; Sun, 20 Mar 2011 18:10:28 +0100 (CET) Message-ID: <4D863504.5000609@netcologne.de> Date: Sun, 20 Mar 2011 18:10:28 +0100 From: Thomas Koenig User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; de; rv:1.9.2.14) Gecko/20110221 SUSE/3.1.8 Thunderbird/3.1.8 MIME-Version: 1.0 To: "fortran@gcc.gnu.org" CC: gcc-patches Subject: Re: [patch, fortran] Function call optimization References: <46835D054D115449BBE84934E2F2000907821C71@ARLEXCHVS02.lst.link.l-3com.com> <4D83E95A.6020901@net-b.de> <4D84FD2C.60607@netcologne.de> <201103192133.40015.mikael.morin@sfr.fr> In-Reply-To: <201103192133.40015.mikael.morin@sfr.fr> Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi Mikael, > - Array elementals: here it's hard to tell which one is faster; save function > calls to a temporary and use the temporary or do multiple function calls every > time (but without temporary). I have removed elementals for now. It would probably be best to create a scalar temporary variable during scalarzation. > - Transpose optimization: here the transpose call is changed into a direct > array access, so your patch will definitely make things worse. Even if > transpose calls are used multiple times as actual argument for example, better > create multiple descriptors than copy the whole lot to a temp. The common > function elimination should be disabled (IMO) in this case. Removed for transpose. > Please use more descriptive names. > There are 3 cfe* functions and only one has a comment explaining what cfe > means. Changed. > Is it expected that you allow allocatables with -faggressive-function- > elimination? No, changed. >> + >> + if (!(*e)->value.function.esym->attr.pure >> +&& !(*e)->value.function.esym->attr.implicit_pure) >> + return 0; >> + } >> + } >> + >> + if ((*e)->value.function.isym) >> + { >> + if ((*e)->value.function.isym->id == GFC_ISYM_CONVERSION) >> + return 0; >> + >> + if (! (*e)->value.function.isym->pure >> +&& !(*e)->value.function.isym->elemental) > Tobias' comment also applies here, even if there is no intrinsic impure > elemental: the following code in intrinsic.c makes the elemental check > redundant. > next_sym->pure = (cl != CLASS_IMPURE); > next_sym->elemental = (cl == CLASS_ELEMENTAL); Changed. >> + for (i=expr_count-1; i>=1; i--) >> + { > Tiny optimization here ;-): > /* Don't bother if the expression has been factored already. */ Applied. > if ((*(expr_array[i]))->expr_type == EXPR_VARIABLE) > continue; >> + newvar = NULL; > >> + > Small comment here, or better, more descriptive name >> +static int >> +cfe_code (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED, >> + void *data ATTRIBUTE_UNUSED) >> +{ Better comment added. I have also factored out the code with gfc_dep_compare_expression into its own function. Here is the new version of the patch. Regression-tested. OK for trunk? Thomas 2010-03-20 Thomas Koenig PR fortran/22572 * gfortran.h (gfc_option_t) : Add flag_aggressive_function_elimination. (gfc_dep_compare_functions): Add prototype. * lang.opt: Add faggressive-function-elimination. * invoke.texi: Document -faggressive-function-elimination. * frontend_passes (expr_array): New static variable. (expr_size): Likewise. (expr_count): Likewise. (current_code): Likewise. (current_ns): Likewise. (gfc_run_passes): Allocate and free space for expressions. (cfe_register_funcs): New function. (create_var): New function. (cfc_expr_0): New function. (cfe_code): New function. (optimize_namespace): Invoke gfc_code_walker with cfe_code and cfe_expr_0. * dependency.c (gfc_dep_compare_functions): New function. (gfc_dep_compare_expr): Use it. * options.c (gfc_init_options): Handle flag_aggressive_function_elimination. (gfc_handle_option): Likewise. 2010-03-20 Thomas Koenig PR fortran/22572 * gfortran.dg/function_optimize_1.f90: New test. * gfortran.dg/function_optimize_2.f90: New test. ! { dg-do compile } ! { dg-options "-O -fdump-tree-original" } program main implicit none real, dimension(2,2) :: a, b, c, d integer :: i real :: x, z character(60) :: line real, external :: ext_func interface elemental function element(x) real, intent(in) :: x real :: elem end function element pure function mypure(x) real, intent(in) :: x integer :: mypure end function mypure elemental impure function elem_impure(x) real, intent(in) :: x real :: elem_impure end function elem_impure end interface data a /2., 3., 5., 7./ data b /11., 13., 17., 23./ write (unit=line, fmt='(4F7.2)') matmul(a,b) + matmul(a,b) z = sin(x) + cos(x) + sin(x) + cos(x) print *,z x = ext_func(a) + 23 + ext_func(a) print *,d,x z = element(x) + element(x) print *,z i = mypure(x) - mypure(x) print *,i z = elem_impure(x) - elem_impure(x) print *,z end program main ! { dg-final { scan-tree-dump-times "matmul_r4" 1 "original" } } ! { dg-final { scan-tree-dump-times "__builtin_sinf" 1 "original" } } ! { dg-final { scan-tree-dump-times "__builtin_cosf" 1 "original" } } ! { dg-final { scan-tree-dump-times "ext_func" 2 "original" } } ! { dg-final { scan-tree-dump-times "element" 1 "original" } } ! { dg-final { scan-tree-dump-times "mypure" 1 "original" } } ! { dg-final { scan-tree-dump-times "elem_impure" 2 "original" } } ! { dg-final { cleanup-tree-dump "original" } } ! { dg-do compile } ! { dg-options "-O -faggressive-function-elimination -fdump-tree-original" } program main implicit none real, dimension(2,2) :: a, b, c, d real :: x, z integer :: i character(60) :: line real, external :: ext_func interface elemental function element(x) real, intent(in) :: x real :: elem end function element pure function mypure(x) real, intent(in) :: x integer :: mypure end function mypure elemental impure function elem_impure(x) real, intent(in) :: x real :: elem_impure end function elem_impure end interface data a /2., 3., 5., 7./ data b /11., 13., 17., 23./ write (unit=line, fmt='(4F7.2)') matmul(a,b) + matmul(a,b) x = 1.2 z = sin(x) + cos(x) + sin(x) + cos(x) print *,z x = ext_func(a) + 23 + ext_func(a) print *,d,x z = element(x) + element(x) print *,z i = mypure(x) - mypure(x) print *,i z = elem_impure(x) - elem_impure(x) print *,z end program main ! { dg-final { scan-tree-dump-times "matmul_r4" 1 "original" } } ! { dg-final { scan-tree-dump-times "__builtin_sinf" 1 "original" } } ! { dg-final { scan-tree-dump-times "__builtin_cosf" 1 "original" } } ! { dg-final { scan-tree-dump-times "ext_func" 1 "original" } } ! { dg-final { scan-tree-dump-times "element" 1 "original" } } ! { dg-final { scan-tree-dump-times "mypure" 1 "original" } } ! { dg-final { scan-tree-dump-times "elem_impure" 1 "original" } } ! { dg-final { cleanup-tree-dump "original" } } Index: gfortran.h =================================================================== --- gfortran.h (Revision 170960) +++ gfortran.h (Arbeitskopie) @@ -2232,6 +2232,7 @@ typedef struct int flag_whole_file; int flag_protect_parens; int flag_realloc_lhs; + int flag_aggressive_function_elimination; int fpe; int rtcheck; @@ -2865,6 +2866,7 @@ void gfc_global_used (gfc_gsymbol *, locus *); gfc_namespace* gfc_build_block_ns (gfc_namespace *); /* dependency.c */ +int gfc_dep_compare_functions (gfc_expr *, gfc_expr *, bool); int gfc_dep_compare_expr (gfc_expr *, gfc_expr *); /* check.c */ Index: lang.opt =================================================================== --- lang.opt (Revision 170960) +++ lang.opt (Arbeitskopie) @@ -278,6 +278,10 @@ d Fortran Joined ; Documented in common.opt +faggressive-function-elimination +Fortran +Eliminate multiple function invokations also for impure functions + falign-commons Fortran Enable alignment of COMMON blocks Index: invoke.texi =================================================================== --- invoke.texi (Revision 170960) +++ invoke.texi (Arbeitskopie) @@ -1468,6 +1468,18 @@ need to be in effect. An allocatable left-hand side of an intrinsic assignment is automatically (re)allocated if it is either unallocated or has a different shape. The option is enabled by default except when @option{-std=f95} is given. + +@item -faggressive-function-elimination +@opindex @code{faggressive-function-elimination} +@cindex Elimination of functions with identical argument lists +Functions with identical argument lists are eliminated within +statements, regardless of whether these functions are marked +@code{PURE} or not. For example, in +@smallexample + a = f(b,c) + f(b,c) +@end smallexample +there will only be a single call to @code{f}. + @end table @xref{Code Gen Options,,Options for Code Generation Conventions, @@ -1475,7 +1487,6 @@ gcc,Using the GNU Compiler Collection (GCC)}, for offered by the GBE shared by @command{gfortran}, @command{gcc}, and other GNU compilers. - @c man end @node Environment Variables Index: frontend-passes.c =================================================================== --- frontend-passes.c (Revision 170960) +++ frontend-passes.c (Arbeitskopie) @@ -40,6 +40,21 @@ static bool optimize_trim (gfc_expr *); static int count_arglist; +/* Pointer to an array of gfc_expr ** we operate on, plus its size + and counter. */ + +static gfc_expr ***expr_array; +static int expr_size, expr_count; + +/* Pointer to the gfc_code we currently work on - to be able to insert + a statement before. */ + +static gfc_code **current_code; + +/* The namespace we are currently dealing with. */ + +gfc_namespace *current_ns; + /* Entry point - run all passes for a namespace. So far, only an optimization pass is run. */ @@ -48,9 +63,16 @@ gfc_run_passes (gfc_namespace *ns) { if (optimize) { + expr_size = 20; + expr_array = XNEWVEC(gfc_expr **, expr_size); + optimize_namespace (ns); if (gfc_option.dump_fortran_optimized) gfc_dump_parse_tree (ns, stdout); + + /* FIXME: The following should be XDELETEVEC(expr_array); + but we cannot do that because it depends on free. */ + gfc_free (expr_array); } } @@ -106,11 +128,214 @@ optimize_expr (gfc_expr **e, int *walk_subtrees AT return 0; } + +/* Callback function for common function elimination, called from cfe_expr_0. + Put all eligible function expressions into expr_array. We can't do + allocatable functions. */ + +static int +cfe_register_funcs (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED, + void *data ATTRIBUTE_UNUSED) +{ + if ((*e)->expr_type != EXPR_FUNCTION) + return 0; + + /* We don't do character functions (yet). */ + if ((*e)->ts.type == BT_CHARACTER) + return 0; + + /* If we don't know the shape at compile time, we do not create a temporary + variable to hold the intermediate result. FIXME: Change this later when + allocation on assignment works for intrinsics. */ + + if ((*e)->rank > 0 && (*e)->shape == NULL) + return 0; + + /* Skip the test for pure functions if -faggressive-function-elimination + is specified. */ + if ((*e)->value.function.esym) + { + if ((*e)->value.function.esym->attr.allocatable) + return 0; + + /* Don't create an array temporary for elemental functions. */ + if ((*e)->value.function.esym->attr.elemental && (*e)->rank > 0) + return 0; + + /* Only eliminate potentially impure functions if the + user specifically requested it. */ + if (!gfc_option.flag_aggressive_function_elimination + && !(*e)->value.function.esym->attr.pure + && !(*e)->value.function.esym->attr.implicit_pure) + return 0; + } + + if ((*e)->value.function.isym) + { + /* Conversions are handled on the fly by the middle end, + transpose during trans-* stages. */ + if ((*e)->value.function.isym->id == GFC_ISYM_CONVERSION + || (*e)->value.function.isym->id == GFC_ISYM_TRANSPOSE) + return 0; + + /* Don't create an array temporary for elemental functions, + as this would be wasteful of memory. + FIXME: Create a scalar temporary during scalarization. */ + if ((*e)->value.function.isym->elemental && (*e)->rank > 0) + return 0; + + if (!(*e)->value.function.isym->pure) + return 0; + } + + if (expr_count >= expr_size) + { + expr_size += expr_size; + expr_array = XRESIZEVEC(gfc_expr **, expr_array, expr_size); + } + expr_array[expr_count] = e; + expr_count ++; + return 0; +} + +/* Returns a new expression (a variable) to be used in place of the old one, + with an an assignment statement before the current statement to set + the value of the variable. */ + +static gfc_expr* +create_var (gfc_expr * e) +{ + char name[GFC_MAX_SYMBOL_LEN +1]; + static int num = 1; + gfc_symtree *symtree; + gfc_symbol *symbol; + gfc_expr *result; + gfc_code *n; + int i; + + sprintf(name, "__var_%d",num++); + if (gfc_get_sym_tree (name, current_ns, &symtree, false) != 0) + gcc_unreachable (); + + symbol = symtree->n.sym; + symbol->ts = e->ts; + symbol->as = gfc_get_array_spec (); + symbol->as->rank = e->rank; + symbol->as->type = AS_EXPLICIT; + for (i=0; irank; i++) + { + gfc_expr *p, *q; + + p = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind, + &(e->where)); + mpz_set_si (p->value.integer, 1); + symbol->as->lower[i] = p; + + q = gfc_get_constant_expr (BT_INTEGER, gfc_index_integer_kind, + &(e->where)); + mpz_set (q->value.integer, e->shape[i]); + symbol->as->upper[i] = q; + } + + symbol->attr.flavor = FL_VARIABLE; + symbol->attr.referenced = 1; + symbol->attr.dimension = e->rank > 0; + gfc_commit_symbol (symbol); + + result = gfc_get_expr (); + result->expr_type = EXPR_VARIABLE; + result->ts = e->ts; + result->rank = e->rank; + result->shape = gfc_copy_shape (e->shape, e->rank); + result->symtree = symtree; + result->where = e->where; + if (e->rank > 0) + { + result->ref = gfc_get_ref (); + result->ref->type = REF_ARRAY; + result->ref->u.ar.type = AR_FULL; + result->ref->u.ar.where = e->where; + result->ref->u.ar.as = symbol->as; + } + + /* Generate the new assignment. */ + n = XCNEW (gfc_code); + n->op = EXEC_ASSIGN; + n->loc = (*current_code)->loc; + n->next = *current_code; + n->expr1 = gfc_copy_expr (result); + n->expr2 = e; + *current_code = n; + + return result; +} + +/* Callback function for the code walker for doing common function + elimination. This builds up the list of functions in the expression + and goes through them to detect duplicates, which it then replaces + by variables. */ + +static int +cfe_expr_0 (gfc_expr **e, int *walk_subtrees, + void *data ATTRIBUTE_UNUSED) +{ + int i,j; + gfc_expr *newvar; + + expr_count = 0; + + gfc_expr_walker (e, cfe_register_funcs, NULL); + + /* Walk backwards through all the functions to make sure we + catch the leaf functions first. */ + for (i=expr_count-1; i>=1; i--) + { + /* Skip if the function has been replaced by a variable already. */ + if ((*(expr_array[i]))->expr_type == EXPR_VARIABLE) + continue; + + newvar = NULL; + for (j=i-1; j>=0; j--) + { + if (gfc_dep_compare_functions(*(expr_array[i]), + *(expr_array[j]), true) == 0) + { + if (newvar == NULL) + newvar = create_var (*(expr_array[i])); + gfc_free (*(expr_array[j])); + *(expr_array[j]) = gfc_copy_expr (newvar); + } + } + if (newvar) + *(expr_array[i]) = newvar; + } + + /* We did all the necessary walking in this function. */ + *walk_subtrees = 0; + return 0; +} + +/* Callback function for common function elimination, called from + gfc_code_walker. This keeps track of the current code, in order + to insert statements as needed. */ + +static int +cfe_code (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED, + void *data ATTRIBUTE_UNUSED) +{ + current_code = c; + return 0; +} + /* Optimize a namespace, including all contained namespaces. */ static void optimize_namespace (gfc_namespace *ns) { + + current_ns = ns; + + gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL); gfc_code_walker (&ns->code, optimize_code, optimize_expr, NULL); for (ns = ns->contained; ns; ns = ns->sibling) Index: dependency.c =================================================================== --- dependency.c (Revision 170960) +++ dependency.c (Arbeitskopie) @@ -177,16 +177,60 @@ gfc_are_identical_variables (gfc_expr *e1, gfc_exp return true; } +/* Compare two functions for equality. Only the first expression is known + to be a function. Returns 0 if e1==e2, -2 otherwise. If impure_ok is + false, only return 0 for pure functions. */ + +int +gfc_dep_compare_functions (gfc_expr *e1, gfc_expr *e2, bool impure_ok) +{ + + gfc_actual_arglist *args1; + gfc_actual_arglist *args2; + + if (e2->expr_type != EXPR_FUNCTION) + return -2; + + if ((e1->value.function.esym && e2->value.function.esym + && e1->value.function.esym == e2->value.function.esym + && (e1->value.function.esym->result->attr.pure || impure_ok)) + || (e1->value.function.isym && e2->value.function.isym + && e1->value.function.isym == e2->value.function.isym + && (e1->value.function.isym->pure || impure_ok))) + { + args1 = e1->value.function.actual; + args2 = e2->value.function.actual; + + /* Compare the argument lists for equality. */ + while (args1 && args2) + { + /* Bitwise xor, since C has no non-bitwise xor operator. */ + if ((args1->expr == NULL) ^ (args2->expr == NULL)) + return -2; + + if (args1->expr != NULL && args2->expr != NULL + && gfc_dep_compare_expr (args1->expr, args2->expr) != 0) + return -2; + + args1 = args1->next; + args2 = args2->next; + } + return (args1 || args2) ? -2 : 0; + } + else + return -2; +} + /* Compare two values. Returns 0 if e1 == e2, -1 if e1 < e2, +1 if e1 > e2, and -2 if the relationship could not be determined. */ int gfc_dep_compare_expr (gfc_expr *e1, gfc_expr *e2) { + int i; + gfc_expr *n1, *n2; gfc_actual_arglist *args1; gfc_actual_arglist *args2; - int i; - gfc_expr *n1, *n2; n1 = NULL; n2 = NULL; @@ -399,36 +443,7 @@ gfc_dep_compare_expr (gfc_expr *e1, gfc_expr *e2) return -2; case EXPR_FUNCTION: - - /* PURE functions can be compared for argument equality. */ - if ((e1->value.function.esym && e2->value.function.esym - && e1->value.function.esym == e2->value.function.esym - && e1->value.function.esym->result->attr.pure) - || (e1->value.function.isym && e2->value.function.isym - && e1->value.function.isym == e2->value.function.isym - && e1->value.function.isym->pure)) - { - args1 = e1->value.function.actual; - args2 = e2->value.function.actual; - - /* Compare the argument lists for equality. */ - while (args1 && args2) - { - /* Bitwise xor, since C has no non-bitwise xor operator. */ - if ((args1->expr == NULL) ^ (args2->expr == NULL)) - return -2; - - if (args1->expr != NULL && args2->expr != NULL - && gfc_dep_compare_expr (args1->expr, args2->expr) != 0) - return -2; - - args1 = args1->next; - args2 = args2->next; - } - return (args1 || args2) ? -2 : 0; - } - else - return -2; + return gfc_dep_compare_functions (e1, e2, false); break; default: Index: options.c =================================================================== --- options.c (Revision 170960) +++ options.c (Arbeitskopie) @@ -150,6 +150,7 @@ gfc_init_options (unsigned int decoded_options_cou gfc_option.flag_align_commons = 1; gfc_option.flag_protect_parens = 1; gfc_option.flag_realloc_lhs = -1; + gfc_option.flag_aggressive_function_elimination = 0; gfc_option.fpe = 0; gfc_option.rtcheck = 0; @@ -972,6 +973,10 @@ gfc_handle_option (size_t scode, const char *arg, gfc_option.flag_align_commons = value; break; + case OPT_faggressive_function_elimination: + gfc_option.flag_aggressive_function_elimination = value; + break; + case OPT_fprotect_parens: gfc_option.flag_protect_parens = value; break;