[{"id":3188720,"web_url":"http://patchwork.ozlabs.org/comment/3188720/","msgid":"<nycvar.YFH.7.77.849.2309271145330.5561@jbgna.fhfr.qr>","list_archive_url":null,"date":"2023-09-27T11:46:32","subject":"Re: [PATCH]middle-end Fold vec_cond into conditional ternary or\n binary operation when sharing operand [PR109154]","submitter":{"id":4338,"url":"http://patchwork.ozlabs.org/api/people/4338/","name":"Richard Biener","email":"rguenther@suse.de"},"content":"On Wed, 27 Sep 2023, Tamar Christina wrote:\n\n> Hi All,\n> \n> When we have a vector conditional on a masked target which is doing a selection\n> on the result of a conditional operation where one of the operands of the\n> conditional operation is the other operand of the select, then we can fold the\n> vector conditional into the operation.\n> \n> Concretely this transforms\n> \n>   c = mask1 ? (masked_op mask2 a b) : b\n> \n> into\n> \n>   c = masked_op (mask1 & mask2) a b\n> \n> The mask is then propagated upwards by the compiler.  In the SVE case we don't\n> end up needing a mask AND here since `mask2` will end up in the instruction\n> creating `mask` which gives us a natural &.\n> \n> Such transformations are more common now in GCC 13+ as PRE has now started\n> unsharing of common code in case it can make one branch fully independent.\n> \n> e.g. 
in this case `b` becomes a loop invariant value after PRE.\n> \n> This transformation removes the extra select for masked architectures but\n> doesn't fix the general case.\n> \n> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.\n> \n> Ok for master?\n> \n> Thanks,\n> Tamar\n> \n> gcc/ChangeLog:\n> \n> \tPR tree-optimization/109154\n> \t* match.pd: Add new cond_op rule.\n> \n> gcc/testsuite/ChangeLog:\n> \n> \tPR tree-optimization/109154\n> \t* gcc.target/aarch64/sve/pre_cond_share_1.c: New test.\n> \n> --- inline copy of patch -- \n> diff --git a/gcc/match.pd b/gcc/match.pd\n> index 8ebde06dcd4b26d694826cffad0fb17e1136600a..20b9ea211385d9cc3876a5002f771267533e8868 100644\n> --- a/gcc/match.pd\n> +++ b/gcc/match.pd\n> @@ -8827,6 +8827,30 @@ and,\n>    (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)\n>     (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))\n>  \n> +/* Detect simplification for vector condition folding where\n> +\n> +  c = mask1 ? (masked_op mask2 a b) : b\n> +\n> +  into\n> +\n> +  c = masked_op (mask1 & mask2) a b\n> +\n> +  where the operation can be partially applied to one operand. */\n> +\n> +(for cond_op (COND_BINARY)\n> + (simplify\n> +  (vec_cond @0\n> +   (cond_op:s @1 @2 @3 @4) @3)\n> +  (cond_op (BIT_AND_EXPR @1 @0) @2 @3 @4)))\n\n(bit_and ..., not BIT_AND_EXPR please\n\n> +\n> +/* And same for ternary expressions.  
*/\n> +\n> +(for cond_op (COND_TERNARY)\n> + (simplify\n> +  (vec_cond @0\n> +   (cond_op:s @1 @2 @3 @4 @5) @4)\n> +  (cond_op (BIT_AND_EXPR @1 @0) @2 @3 @4 @5)))\n\nlikewise\n\nOK with that change.\n\nThanks,\nRichard.\n\n> +\n>  /* For pointers @0 and @2 and nonnegative constant offset @1, look for\n>     expressions like:\n>  \n> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pre_cond_share_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pre_cond_share_1.c\n> new file mode 100644\n> index 0000000000000000000000000000000000000000..b51d0f298ea1fcf556365fe4afc875ebcd67584b\n> --- /dev/null\n> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pre_cond_share_1.c\n> @@ -0,0 +1,132 @@\n> +/* { dg-do compile } */\n> +/* { dg-options \"-Ofast -fdump-tree-optimized\" } */\n> +\n> +#include <stdint.h>\n> +#include <stddef.h>\n> +#include <math.h>\n> +#include <float.h>\n> +\n> +typedef struct __attribute__((__packed__)) _Atom {\n> +    float x, y, z;\n> +    int32_t type;\n> +} Atom;\n> +\n> +typedef struct __attribute__((__packed__)) _FFParams {\n> +    int32_t hbtype;\n> +    float radius;\n> +    float hphb;\n> +    float elsc;\n> +} FFParams;\n> +\n> +#ifndef PPWI\n> +#define PPWI (64)\n> +#endif\n> +\n> +#ifndef ITERS\n> +#define ITERS 8\n> +#endif\n> +\n> +#define DIFF_TOLERANCE_PCT 0.025f\n> +\n> +#define POSES_SIZE 393216\n> +#define PROTEIN_SIZE 938\n> +#define LIGAND_SIZE 26\n> +#define FORCEFIELD_SIZE 34\n> +\n> +#define ZERO 0.0f\n> +#define QUARTER 0.25f\n> +#define HALF 0.5f\n> +#define ONE 1.0f\n> +#define TWO 2.0f\n> +#define FOUR 4.0f\n> +#define CNSTNT 45.0f\n> +\n> +// Energy evaluation parameters\n> +#define HBTYPE_F 70\n> +#define HBTYPE_E 69\n> +#define HARDNESS 38.0f\n> +#define NPNPDIST 5.5f\n> +#define NPPDIST 1.0f\n> +\n> +void\n> +fasten_main(size_t group, size_t ntypes, size_t nposes, size_t natlig, size_t natpro,        //\n> +            const Atom *protein, const Atom *ligand,                                         //\n> +            const float 
*transforms_0, const float *transforms_1, const float *transforms_2, //\n> +            const float *transforms_3, const float *transforms_4, const float *transforms_5, //\n> +            const FFParams *forcefield, float *energies                                      //\n> +) {\n> +\n> +    float etot[PPWI];\n> +    float lpos_x[PPWI];\n> +\n> +    for (int l = 0; l < PPWI; l++) {\n> +        etot[l] = 0.f;\n> +        lpos_x[l] = 0.f;\n> +    }\n> +\n> +    // Loop over ligand atoms\n> +    for (int il = 0; il < natlig; il++) {\n> +        // Load ligand atom data\n> +        const Atom l_atom = ligand[il];\n> +        const FFParams l_params = forcefield[l_atom.type];\n> +        const int lhphb_ltz = l_params.hphb < 0.f;\n> +        const int lhphb_gtz = l_params.hphb > 0.f;\n> +\n> +        // Transform ligand atom\n> +\n> +        // Loop over protein atoms\n> +        for (int ip = 0; ip < natpro; ip++) {\n> +            // Load protein atom data\n> +            const Atom p_atom = protein[ip];\n> +            const FFParams p_params = forcefield[p_atom.type];\n> +\n> +            const float radij = p_params.radius + l_params.radius;\n> +            const float r_radij = ONE / radij;\n> +\n> +            const float elcdst = (p_params.hbtype == HBTYPE_F && l_params.hbtype == HBTYPE_F) ? FOUR\n> +                                                                                              : TWO;\n> +            const float elcdst1 = (p_params.hbtype == HBTYPE_F && l_params.hbtype == HBTYPE_F)\n> +                                  ? QUARTER : HALF;\n> +            const int type_E = ((p_params.hbtype == HBTYPE_E || l_params.hbtype == HBTYPE_E));\n> +\n> +            const int phphb_ltz = p_params.hphb < 0.f;\n> +            const int phphb_gtz = p_params.hphb > 0.f;\n> +            const int phphb_nz = p_params.hphb != 0.f;\n> +            const float p_hphb = p_params.hphb * (phphb_ltz && lhphb_gtz ? 
-ONE : ONE);\n> +            const float l_hphb = l_params.hphb * (phphb_gtz && lhphb_ltz ? -ONE : ONE);\n> +            const float distdslv = (phphb_ltz ? (lhphb_ltz ? NPNPDIST : NPPDIST) : (lhphb_ltz\n> +                                                                                    ? NPPDIST\n> +                                                                                    : -FLT_MAX));\n> +            const float r_distdslv = ONE / distdslv;\n> +\n> +            const float chrg_init = l_params.elsc * p_params.elsc;\n> +            const float dslv_init = p_hphb + l_hphb;\n> +\n> +            for (int l = 0; l < PPWI; l++) {\n> +                // Calculate distance between atoms\n> +                const float x = lpos_x[l] - p_atom.x;\n> +                const float distij = (x * x);\n> +\n> +                // Calculate the sum of the sphere radii\n> +                const float distbb = distij - radij;\n> +\n> +                const int zone1 = (distbb < ZERO);\n> +\n> +                // Calculate formal and dipole charge interactions\n> +                float chrg_e = chrg_init * ((zone1 ? ONE : (ONE - distbb * elcdst1)) *\n> +                                            (distbb < elcdst ? ONE : ZERO));\n> +                float neg_chrg_e = -fabsf(chrg_e);\n> +                chrg_e = type_E ? 
neg_chrg_e : chrg_e;\n> +                etot[l] += chrg_e * CNSTNT;\n> +            }\n> +        }\n> +    }\n> +\n> +    // Write result\n> +    for (int l = 0; l < PPWI; l++) {\n> +        energies[group * PPWI + l] = etot[l] * HALF;\n> +    }\n> +}\n> +\n> +/* { dg-final { scan-tree-dump-times {\\.COND_MUL} 1 \"optimized\" } } */\n> +/* { dg-final { scan-tree-dump-times {\\.VCOND} 1 \"optimized\" } } */\n> \n> \n> \n> \n>","headers":{"Return-Path":"<gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":["incoming@patchwork.ozlabs.org","gcc-patches@gcc.gnu.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","gcc-patches@gcc.gnu.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=suse.de header.i=@suse.de header.a=rsa-sha256\n header.s=susede2_rsa header.b=nCoeLxdI;\n\tdkim=pass header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=KAGYZ8QW;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org\n (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;\n envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org;\n receiver=patchwork.ozlabs.org)","sourceware.org;\n dmarc=pass (p=none dis=none) header.from=suse.de","sourceware.org; spf=pass smtp.mailfrom=suse.de"],"Received":["from server2.sourceware.org (server2.sourceware.org\n [IPv6:2620:52:3:1:0:246e:9693:128c])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4RwZZy0ZGGz1yp0\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 27 Sep 2023 21:46:48 +1000 (AEST)","from server2.sourceware.org (localhost [IPv6:::1])\n\tby sourceware.org (Postfix) with ESMTP id DA0D0385F001\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 27 Sep 2023 
11:46:46 +0000 (GMT)","from smtp-out1.suse.de (smtp-out1.suse.de\n [IPv6:2001:67c:2178:6::1c])\n by sourceware.org (Postfix) with ESMTPS id E6658385276D\n for <gcc-patches@gcc.gnu.org>; Wed, 27 Sep 2023 11:46:33 +0000 (GMT)","from relay2.suse.de (relay2.suse.de [149.44.160.134])\n by smtp-out1.suse.de (Postfix) with ESMTP id 1F8552197D;\n Wed, 27 Sep 2023 11:46:33 +0000 (UTC)","from wotan.suse.de (wotan.suse.de [10.160.0.1])\n (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n (No client certificate requested)\n by relay2.suse.de (Postfix) with ESMTPS id F260C2C142;\n Wed, 27 Sep 2023 11:46:32 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.4.2 sourceware.org E6658385276D","DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1695815193;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n in-reply-to:in-reply-to:references:references;\n bh=1bKE216XRT8LsrA/r1UyfNtnJJ/Ujyh4YPkdEpR62G4=;\n b=nCoeLxdIA7MaVBCxBzzllnEND63QUUQnu/x30tnG4Mvjof+E+5yRBSulHEf2/40J6ceDuB\n WDqnrnDOOK8wj60XaBnlI/094efE6gCh3kj5YJoSpPm5Fu19y4p7MFe2LnAI+s+iey9zor\n Q4AOiPCx/UsbB8ixinYWs8q7CECTf/c=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1695815193;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n in-reply-to:in-reply-to:references:references;\n bh=1bKE216XRT8LsrA/r1UyfNtnJJ/Ujyh4YPkdEpR62G4=;\n b=KAGYZ8QWvD7Jm6dVjsTJcCEJp+flEfrd3nZJEPO88MvgiqotU2Fe5mpgBm8hCUQbRksoOE\n 5sDawAaL9M44OAAg=="],"Date":"Wed, 27 Sep 2023 11:46:32 +0000 (UTC)","From":"Richard Biener <rguenther@suse.de>","To":"Tamar Christina <tamar.christina@arm.com>","cc":"gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com","Subject":"Re: [PATCH]middle-end Fold vec_cond into conditional ternary or\n binary operation when sharing operand 
[PR109154]","In-Reply-To":"<patch-17719-tamar@arm.com>","Message-ID":"<nycvar.YFH.7.77.849.2309271145330.5561@jbgna.fhfr.qr>","References":"<patch-17719-tamar@arm.com>","User-Agent":"Alpine 2.22 (LSU 394 2020-01-19)","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","X-Spam-Status":"No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,\n DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,\n SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6","X-Spam-Checker-Version":"SpamAssassin 3.4.6 (2021-04-09) on\n server2.sourceware.org","X-BeenThere":"gcc-patches@gcc.gnu.org","X-Mailman-Version":"2.1.30","Precedence":"list","List-Id":"Gcc-patches mailing list <gcc-patches.gcc.gnu.org>","List-Unsubscribe":"<https://gcc.gnu.org/mailman/options/gcc-patches>,\n <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>","List-Archive":"<https://gcc.gnu.org/pipermail/gcc-patches/>","List-Post":"<mailto:gcc-patches@gcc.gnu.org>","List-Help":"<mailto:gcc-patches-request@gcc.gnu.org?subject=help>","List-Subscribe":"<https://gcc.gnu.org/mailman/listinfo/gcc-patches>,\n <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>","Errors-To":"gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org"}}]