From patchwork Wed Jul 18 11:17:52 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kyrill Tkachov X-Patchwork-Id: 945672 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-481794-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=foss.arm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="d8QSf4e0"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41VvlP6Syhz9s0n for ; Wed, 18 Jul 2018 21:18:08 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; q=dns; s=default; b=Ev9283GlCNgg3X30e 7mA4k+iUPoI6IoHj+kgQJwklcGf6Lfspe0R4Dgci0qgR+IGL230NtvF+9ZWFVsKY DpigQIa5ifXxxGVpazw+6C2iq/lX7SVZ5A/Y9jq43dsLEgS3oc8XoNQMm5pV2KHe TM+zD75LQT2NCzKtRKs9ZH9//s= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=default; bh=600uwoQHr2VMDL49Oj1Q065 Q36A=; b=d8QSf4e0n7IrQJY8alEzMWNHEkTIlrdsokK+o1V1tQXILNqwoJNo3QY 3dT7b4zdvD1FGYFSJeAOlv46JQsdqsrqo7BM4kUI3PWIpJuWAuvrhXnjz46NaaW4 65lyqWsUlYr8ip0sEdSr4sl2L8W6j971siMIiNm8D+rhvq3Lz+ZI= Received: (qmail 78579 invoked by alias); 18 Jul 2018 11:17:59 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 78548 invoked by uid 89); 18 Jul 2018 11:17:58 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, KAM_LOTSOFHASH autolearn=ham version=3.3.2 spammy=benefits X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 18 Jul 2018 11:17:56 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 06E0180D; Wed, 18 Jul 2018 04:17:55 -0700 (PDT) Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 186833F318; Wed, 18 Jul 2018 04:17:53 -0700 (PDT) Message-ID: <5B4F21E0.3060307@foss.arm.com> Date: Wed, 18 Jul 2018 12:17:52 +0100 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Janne Blomqvist , Thomas Koenig CC: Richard Biener , "fortran@gcc.gnu.org" , GCC Patches Subject: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics References: <5B4DE283.9060100@foss.arm.com> <5B4DF325.2050609@foss.arm.com> <9d0cf3dc-8c5c-bbb2-960c-386b2c936a50@netcologne.de> In-Reply-To: Hi all, Thank you for the feedback so far. This version of the patch doesn't try to emit fmin/fmax function calls but instead emits MIN/MAX_EXPR sequences unconditionally. I think a source of confusion in the original proposal (for me at least) was that on aarch64 (that I primarily work on) we implement the fmin/fmax optabs and therefore these calls are expanded to a single instruction. But on x86_64 these optabs are not implemented and therefore expand to actual library calls. Therefore at -O3 (no -ffast-math) I saw a gain on aarch64. But I measured today on x86_64 and saw a regression. Thomas and Janne suggested that the Fortran standard does not impose a requirement on NaN handling for the min/max intrinsics, which would make emitting MIN/MAX_EXPR sequences unconditionally a valid approach. However, the gfortran.dg/nan_1.f90 test checks that handling of NaN values in these intrinsics follows the IEEE semantics (min (nan, 2.0) == 2.0, for example). This is not required by the standard, but is the existing gfortran behaviour. If we end up always emitting MIN/MAX_EXPR sequences, like this version of the patch does, then that test fails on some configurations of x86_64 and not others (for me it FAILs by default, but passes with -march=native on my machine) and passes on AArch64. This is expected since MIN/MAX_EXPR doesn't enforce IEEE behaviour on its arguments. However, by always emitting MIN/MAX_EXPR the gfc_conv_intrinsic_minmax function is simplified and, perhaps more importantly, generates faster code in the -O3 case. With this patch I see performance improvement on 521.wrf on both AArch64 (3.7%) and x86_64 (5.4%). Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test? Thanks, Kyrill 2018-07-18 Kyrylo Tkachov * trans-intrinsic.c: (gfc_conv_intrinsic_minmax): Emit MIN_MAX_EXPR sequence to calculate the min/max. 2018-07-18 Kyrylo Tkachov * gfortran.dg/max_float.f90: New test. * gfortran.dg/min_float.f90: Likewise. * gfortran.dg/minmax_integer.f90: Likewise. diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..e5a1f1ddabeedc7b9f473db11e70f29548fc69ac 100644 --- a/gcc/fortran/trans-intrinsic.c +++ b/gcc/fortran/trans-intrinsic.c @@ -3874,14 +3874,11 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr) minmax (a1, a2, a3, ...) { mvar = a1; - if (a2 .op. mvar || isnan (mvar)) - mvar = a2; - if (a3 .op. mvar || isnan (mvar)) - mvar = a3; + mvar = MIN/MAX_EXPR (mvar, a2); + mvar = MIN/MAX_EXPR (mvar, a3); ... - return mvar - } - */ + return mvar; + } */ /* TODO: Mismatching types can occur when specific names are used. These should be handled during resolution. */ @@ -3891,7 +3888,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op) tree tmp; tree mvar; tree val; - tree thencase; tree *args; tree type; gfc_actual_arglist *argexpr; @@ -3912,55 +3908,37 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op) mvar = gfc_create_var (type, "M"); gfc_add_modify (&se->pre, mvar, args[0]); - for (i = 1, argexpr = argexpr->next; i < nargs; i++) - { - tree cond, isnan; + for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next) + { + tree cond = NULL_TREE; val = args[i]; /* Handle absent optional arguments by ignoring the comparison. */ if (argexpr->expr->expr_type == EXPR_VARIABLE && argexpr->expr->symtree->n.sym->attr.optional && TREE_CODE (val) == INDIRECT_REF) - cond = fold_build2_loc (input_location, + { + cond = fold_build2_loc (input_location, NE_EXPR, logical_type_node, TREE_OPERAND (val, 0), build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0)); - else - { - cond = NULL_TREE; - + } + else if (!VAR_P (val) && !TREE_CONSTANT (val)) /* Only evaluate the argument once. */ - if (!VAR_P (val) && !TREE_CONSTANT (val)) - val = gfc_evaluate_now (val, &se->pre); - } - - thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val)); + val = gfc_evaluate_now (val, &se->pre); - tmp = fold_build2_loc (input_location, op, logical_type_node, - convert (type, val), mvar); + tree calc; - /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to - __builtin_isnan might be made dependent on that module being loaded, - to help performance of programs that don't rely on IEEE semantics. */ - if (FLOAT_TYPE_P (TREE_TYPE (mvar))) - { - isnan = build_call_expr_loc (input_location, - builtin_decl_explicit (BUILT_IN_ISNAN), - 1, mvar); - tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR, - logical_type_node, tmp, - fold_convert (logical_type_node, isnan)); - } - tmp = build3_v (COND_EXPR, tmp, thencase, - build_empty_stmt (input_location)); + tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR; + calc = fold_build2_loc (input_location, code, type, + convert (type, val), mvar); + tmp = build2_v (MODIFY_EXPR, mvar, calc); if (cond != NULL_TREE) tmp = build3_v (COND_EXPR, cond, tmp, build_empty_stmt (input_location)); - gfc_add_expr_to_block (&se->pre, tmp); - argexpr = argexpr->next; } se->expr = mvar; } diff --git a/gcc/testsuite/gfortran.dg/max_float.f90 b/gcc/testsuite/gfortran.dg/max_float.f90 new file mode 100644 index 0000000000000000000000000000000000000000..a3a5d4f5df29cfa9c4e3abc2c18e7d3de1169fc3 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/max_float.f90 @@ -0,0 +1,19 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-optimized" } + +subroutine fool (a, b, c, d, e, f, g, h) + real (kind=16) :: a, b, c, d, e, f, g, h + a = max (a, b, c, d, e, f, g, h) +end subroutine + +subroutine foo (a, b, c, d, e, f, g, h) + real (kind=8) :: a, b, c, d, e, f, g, h + a = max (a, b, c, d, e, f, g, h) +end subroutine + +subroutine foof (a, b, c, d, e, f, g, h) + real (kind=4) :: a, b, c, d, e, f, g, h + a = max (a, b, c, d, e, f, g, h) +end subroutine + +! { dg-final { scan-tree-dump-times "MAX_EXPR " 21 "optimized" } } diff --git a/gcc/testsuite/gfortran.dg/min_float.f90 b/gcc/testsuite/gfortran.dg/min_float.f90 new file mode 100644 index 0000000000000000000000000000000000000000..41bd6b3c4062f364791841f7097f9a5c00782ec8 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/min_float.f90 @@ -0,0 +1,19 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-optimized" } + +subroutine fool (a, b, c, d, e, f, g, h) + real (kind=16) :: a, b, c, d, e, f, g, h + a = min (a, b, c, d, e, f, g, h) +end subroutine + +subroutine foo (a, b, c, d, e, f, g, h) + real (kind=8) :: a, b, c, d, e, f, g, h + a = min (a, b, c, d, e, f, g, h) +end subroutine + +subroutine foof (a, b, c, d, e, f, g, h) + real (kind=4) :: a, b, c, d, e, f, g, h + a = min (a, b, c, d, e, f, g, h) +end subroutine + +! { dg-final { scan-tree-dump-times "MIN_EXPR " 21 "optimized" } } diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90 new file mode 100644 index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f --- /dev/null +++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90 @@ -0,0 +1,15 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-optimized" } + +subroutine foo (a, b, c, d, e, f, g, h) + integer (kind=4) :: a, b, c, d, e, f, g, h + a = min (a, b, c, d, e, f, g, h) +end subroutine + +subroutine foof (a, b, c, d, e, f, g, h) + integer (kind=4) :: a, b, c, d, e, f, g, h + a = max (a, b, c, d, e, f, g, h) +end subroutine + +! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } } +! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }