From patchwork Thu May 6 10:17:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Tobias Burnus X-Patchwork-Id: 1474897 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FbTz138dZz9sT6 for ; Thu, 6 May 2021 20:17:17 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 76ACA3839C41; Thu, 6 May 2021 10:17:15 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id BD7A5385DC3B for ; Thu, 6 May 2021 10:17:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org BD7A5385DC3B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Tobias_Burnus@mentor.com IronPort-SDR: 7X4P1epK58DQZfmogj/YzcJmNgBmttwFKjQsECx3wQp451wt8V4kQiW8qQ9tDOB02k/wtOe+wO nswlYuDJS3aLPD8Lvf1rUWHmMvOTGT1owZusY1ncXpYb08E0TRsRH/AcRyKgD4PerSlFXkhpE+ x6mHj2Ld66usY3YvYV1pgcttCU9E37n1gQM0ee8iAmWpJAqzFafC+EBA3WntW33o9umY7KzoPN 5fggYXEIJz5irgppVhLoKXfP+phdxd++JprelvHeHuzNi0hhRcTNbs2OerJvE+IoF6lHCA6qQl tQA= X-IronPort-AV: E=Sophos;i="5.82,277,1613462400"; d="diff'?scan'208";a="61047108" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 06 May 2021 02:17:10 -0800 IronPort-SDR: quOeLopJ2hX7NzDlX1EsE/27y/5WzrcuLIqTHV9xHGKmW8rXRKN72YvnW/pegonulK0W8t5+rJ IjatpBSbpqzBoSoItyU0+HyDIyaGL9m6vncgRKoEZPBYKFg75IiQYN/eP5vXcTEbZ5PaOFfcFY +b1lJRuL7C0evQVIo0bDIyc+eYdhkg94VHoobAJyY3G4WQ+BVEmZN9NS1gqqAan42qVP/GuV0n AqmS+oWnSvlia5UmMFx9s6U1hNr8XnchEIzHbupKGDsFifBtItQeMOQe356Fu8Av22il/GG/Rz w8s= To: gcc-patches , Jakub Jelinek , Tom de Vries From: Tobias Burnus Subject: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and || Message-ID: Date: Thu, 6 May 2021 12:17:03 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 Content-Language: en-US X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-02.mgc.mentorg.com (139.181.222.2) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" The complex/float && and || reduction patch missed a target testcase (→ attached) which revealed that also a SIMT needed some special handling, but just runs on non-SIMT systems. The omp-low.c patch is rather simple - and I think it semantically okay. [Note to the change: It looks more completed than it is: - moving 'zero' decl out of the 'if' block - moving that if block before the 'if (sctx.is_simt)' block - 'if (is_fp_and_or)' to the 'if (sctx.is_simt)' block.] I think at least the testcase should be added, possibly also the omp-low.c change – albeit I get a later ICE (see below), which needs either an XFAIL or a fix. * * * ICE with NVPTX: When the device lto1 starts, it fails when expanding the intrinsic XCHG_BFLY function. We have 'ivar' = complex float, which at rtx level is converted to a concatenation (via gen_reg_rtx()). In omp-low.c: IFN_GOMP_SIMT_XCHG_BFLY (TREE_TYPE(ivar), ivar, simt_lane) Later in expand_GOMP_SIMT_XCHG_BFLY, we call: 371 expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops); which fails by running into unreachable of 'expand_insn' 7844 if (!maybe_expand_insn (icode, nops, ops)) 7845 gcc_unreachable (); icode = CODE_FOR_omp_simt_xchg_bfly nops = 3 (gdb) p ops[0]->type $3 = EXPAND_OUTPUT (gdb) p debug(ops[0]->value) (concat:SC (reg:SF 85) (reg:SF 86)) (gdb) p ops[1]->type $5 = EXPAND_INPUT (gdb) p debug(ops[1]->value) (concat:SC (reg:SF 26 [ orfc ]) (reg:SF 27 [ orfc+4 ])) (gdb) p ops[2]->type $7 = EXPAND_INPUT (gdb) p debug(ops[2]->value) (reg:SI 52 [ _74 ]) The mentioned concat happens in How to fix this? Or does this fall into the same category as PR100321 (fixed by: r12-395, Disable SIMT for user-defined reduction) with its follow-up PR 100408? Small testcase is: _Complex float rcf[1024]; int reduction_or () { _Complex float orfc = 0; for (int i=0; i < 1024; ++i) orfc = orfc || rcf[i]; return __real__ orfc; } Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf OpenMP: Fix SIMT for complex/float reduction with && and || gcc/ChangeLog: * omp-low.c (lower_rec_input_clauses): Also handle SIMT part for complex/float recution with && and ||. libgomp/ChangeLog: * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing complex/floating-point || + && recduction with 'omp target'. gcc/omp-low.c | 30 ++-- .../testsuite/libgomp.c-c++-common/reduction-5.c | 192 +++++++++++++++++++++ 2 files changed, 210 insertions(+), 12 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 26ceaf7..46220c5 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -6432,28 +6432,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist, gimplify_assign (unshare_expr (ivar), x, &llist[0]); - if (sctx.is_simt) - { - if (!simt_lane) - simt_lane = create_tmp_var (unsigned_type_node); - x = build_call_expr_internal_loc - (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, - TREE_TYPE (ivar), 2, ivar, simt_lane); - x = build2 (code, TREE_TYPE (ivar), ivar, x); - gimplify_assign (ivar, x, &llist[2]); - } tree ivar2 = ivar; tree ref2 = ref; + tree zero = NULL_TREE; if (is_fp_and_or) { - tree zero = build_zero_cst (TREE_TYPE (ivar)); + zero = build_zero_cst (TREE_TYPE (ivar)); ivar2 = fold_build2_loc (clause_loc, NE_EXPR, integer_type_node, ivar, zero); ref2 = fold_build2_loc (clause_loc, NE_EXPR, integer_type_node, ref, zero); } - x = build2 (code, TREE_TYPE (ref), ref2, ivar2); + if (sctx.is_simt) + { + if (!simt_lane) + simt_lane = create_tmp_var (unsigned_type_node); + x = build_call_expr_internal_loc + (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, + TREE_TYPE (ivar), 2, ivar, simt_lane); + if (is_fp_and_or) + x = fold_build2_loc (clause_loc, NE_EXPR, + integer_type_node, x, zero); + x = build2 (code, TREE_TYPE (ivar2), ivar2, x); + if (is_fp_and_or) + x = fold_convert (TREE_TYPE (ivar), x); + gimplify_assign (ivar, x, &llist[2]); + } + x = build2 (code, TREE_TYPE (ref2), ref2, ivar2); if (is_fp_and_or) x = fold_convert (TREE_TYPE (ref), x); ref = build_outer_var_ref (var, ctx); diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c new file mode 100644 index 0000000..346c882 --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c @@ -0,0 +1,192 @@ +/* C / C++'s logical AND and OR operators take any scalar argument + which compares (un)equal to 0 - the result 1 or 0 and of type int. + + In this testcase, the int result is again converted to a floating-poing + or complex type. + + While having a floating-point/complex array element with || and && can make + sense, having a non-integer/non-bool reduction variable is odd but valid. + + Test: FP reduction variable + FP array. */ + +#define N 1024 +_Complex float rcf[N]; +_Complex double rcd[N]; +float rf[N]; +double rd[N]; + +int +reduction_or () +{ + float orf = 0; + double ord = 0; + _Complex float orfc = 0; + _Complex double ordc = 0; + + #pragma omp target parallel reduction(||: orf) + for (int i=0; i < N; ++i) + orf = orf || rf[i]; + + #pragma omp target parallel for reduction(||: ord) + for (int i=0; i < N; ++i) + ord = ord || rcd[i]; + + #pragma omp target parallel for simd reduction(||: orfc) + for (int i=0; i < N; ++i) + orfc = orfc || rcf[i]; + + #pragma omp target parallel loop reduction(||: ordc) + for (int i=0; i < N; ++i) + ordc = ordc || rcd[i]; + + return orf + ord + __real__ orfc + __real__ ordc; +} + +int +reduction_or_teams () +{ + float orf = 0; + double ord = 0; + _Complex float orfc = 0; + _Complex double ordc = 0; + + #pragma omp target teams distribute parallel for reduction(||: orf) + for (int i=0; i < N; ++i) + orf = orf || rf[i]; + + #pragma omp target teams distribute parallel for simd reduction(||: ord) + for (int i=0; i < N; ++i) + ord = ord || rcd[i]; + + #pragma omp target teams distribute parallel for reduction(||: orfc) + for (int i=0; i < N; ++i) + orfc = orfc || rcf[i]; + + #pragma omp target teams distribute parallel for simd reduction(||: ordc) + for (int i=0; i < N; ++i) + ordc = ordc || rcd[i]; + + return orf + ord + __real__ orfc + __real__ ordc; +} + +int +reduction_and () +{ + float andf = 1; + double andd = 1; + _Complex float andfc = 1; + _Complex double anddc = 1; + + #pragma omp target parallel reduction(&&: andf) + for (int i=0; i < N; ++i) + andf = andf && rf[i]; + + #pragma omp target parallel for reduction(&&: andd) + for (int i=0; i < N; ++i) + andd = andd && rcd[i]; + + #pragma omp target parallel for simd reduction(&&: andfc) + for (int i=0; i < N; ++i) + andfc = andfc && rcf[i]; + + #pragma omp target parallel loop reduction(&&: anddc) + for (int i=0; i < N; ++i) + anddc = anddc && rcd[i]; + + return andf + andd + __real__ andfc + __real__ anddc; +} + +int +reduction_and_teams () +{ + float andf = 1; + double andd = 1; + _Complex float andfc = 1; + _Complex double anddc = 1; + + #pragma omp target teams distribute parallel for reduction(&&: andf) + for (int i=0; i < N; ++i) + andf = andf && rf[i]; + + #pragma omp target teams distribute parallel for simd reduction(&&: andd) + for (int i=0; i < N; ++i) + andd = andd && rcd[i]; + + #pragma omp target teams distribute parallel for reduction(&&: andfc) + for (int i=0; i < N; ++i) + andfc = andfc && rcf[i]; + + #pragma omp target teams distribute parallel for simd reduction(&&: anddc) + for (int i=0; i < N; ++i) + anddc = anddc && rcd[i]; + + return andf + andd + __real__ andfc + __real__ anddc; +} + +int +main () +{ + for (int i = 0; i < N; ++i) + { + rf[i] = 0; + rd[i] = 0; + rcf[i] = 0; + rcd[i] = 0; + } + + if (reduction_or () != 0) + __builtin_abort (); + if (reduction_or_teams () != 0) + __builtin_abort (); + if (reduction_and () != 0) + __builtin_abort (); + if (reduction_and_teams () != 0) + __builtin_abort (); + + rf[10] = 1.0; + rd[15] = 1.0; + rcf[10] = 1.0; + rcd[15] = 1.0i; + + if (reduction_or () != 4) + __builtin_abort (); + if (reduction_or_teams () != 4) + __builtin_abort (); + if (reduction_and () != 0) + __builtin_abort (); + if (reduction_and_teams () != 0) + __builtin_abort (); + + for (int i = 0; i < N; ++i) + { + rf[i] = 1; + rd[i] = 1; + rcf[i] = 1; + rcd[i] = 1; + } + + if (reduction_or () != 4) + __builtin_abort (); + if (reduction_or_teams () != 4) + __builtin_abort (); + if (reduction_and () != 4) + __builtin_abort (); + if (reduction_and_teams () != 4) + __builtin_abort (); + + rf[10] = 0.0; + rd[15] = 0.0; + rcf[10] = 0.0; + rcd[15] = 0.0; + + if (reduction_or () != 4) + __builtin_abort (); + if (reduction_or_teams () != 4) + __builtin_abort (); + if (reduction_and () != 0) + __builtin_abort (); + if (reduction_and_teams () != 0) + __builtin_abort (); + + return 0; +}