VECT: Add COND_LEN_* operations for loop control with length targets

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Hi, Richard and Richi.

This patch is adding cond_len_* operations pattern for target support loop control with length.

These patterns will be used in these following case:

1. Integer division:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
   {
     for (int i = 0; i < n; ++i)
      {
        a[i] = b[i] / c[i];
      }
   }

  ARM SVE IR:

  ...
  max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });

  Loop:
  ...
  # loop_mask_29 = PHI <next_mask_37(4), max_mask_36(3)>
  ...
  vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
  ...
  vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
  vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28);
  ...
  .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

  For target like RVV who support loop control with length, we want to see IR as follows:

  Loop:
  ...
  # loop_len_29 = SELECT_VL
  ...
  vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
  ...
  vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
  vect__8.12_24 = .COND_LEN_DIV (dummp_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29);
  ...
  .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

  Notice here, we use dummp_mask = { -1, -1, .... , -1 }

2. Integer conditional division:
   Similar case with (1) but with condtion:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
   {
     for (int i = 0; i < n; ++i)
       {
         if (cond[i])
         a[i] = b[i] / c[i];
       }
   }

   ARM SVE:
   ...
   max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });

   Loop:
   ...
   # loop_mask_55 = PHI <next_mask_77(5), max_mask_76(4)>
   ...
   vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
   ...
   vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
   ...
   vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
   vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62);
   ...
   .MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
   ...
   next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });

   Here, ARM SVE use vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to gurantee the correct result.

   However, target with length control can not perform this elegant flow, for RVV, we would expect:

   Loop:
   ...
   loop_len_55 = SELECT_VL
   ...
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   ...
   vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55);
   ...

   Here we expect COND_LEN_DIV predicated by a real mask which is the outcome of comparison: mask__29.10_58 = vect__4.9_56 != { 0, ... };
   and a real length which is produced by loop control : loop_len_55 = SELECT_VL

3. conditional Floating-point operations (no -ffast-math):

    void
    f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          if (cond[i])
          a[i] = b[i] + a[i];
        }
    }

  ARM SVE IR:
  max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });

  ...
  # loop_mask_49 = PHI <next_mask_71(4), max_mask_70(3)>
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
  ...
  vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
  ...
  next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
  ...

  For RVV, we would expect IR:

  ...
  loop_len_49 = SELECT_VL
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  ...
  vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49);
  ...

4. Conditional un-ordered reduction:

   int32_t
   f (int32_t *restrict a, 
   int32_t *restrict cond, int n)
   {
     int32_t result = 0;
     for (int i = 0; i < n; ++i)
       {
           if (cond[i])
         result += a[i];
       }
     return result;
   }

   ARM SVE IR:

     Loop:
     # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
     ...
     # loop_mask_40 = PHI <next_mask_58(4), max_mask_57(3)>
     ...
     mask__17.11_43 = vect__4.10_41 != { 0, ... };
     vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
     ...
     vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37);
     ...
     next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
     ...

     Epilogue:
     _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

   For RVV, we expect:

    Loop:
     # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
     ...
     loop_len_40 = SELECT_VL
     ...
     mask__17.11_43 = vect__4.10_41 != { 0, ... };
     ...
     vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40);
     ...
     next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
     ...

     Epilogue:
     _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

     I name these patterns as "cond_len_*" since I want the length operand comes after mask operand and all other operands except length operand
     same order as "cond_*" patterns. Such order will make life easier in the following loop vectorizer support.

gcc/ChangeLog:

        * doc/md.texi: Add COND_LEN_* operations for loop control with length.
        * internal-fn.cc (cond_len_unary_direct): Ditto.
        (cond_len_binary_direct): Ditto.
        (cond_len_ternary_direct): Ditto.
        (expand_cond_len_unary_optab_fn): Ditto.
        (expand_cond_len_binary_optab_fn): Ditto.
        (expand_cond_len_ternary_optab_fn): Ditto.
        (direct_cond_len_unary_optab_supported_p): Ditto.
        (direct_cond_len_binary_optab_supported_p): Ditto.
        (direct_cond_len_ternary_optab_supported_p): Ditto.
        * internal-fn.def (COND_LEN_ADD): Ditto.
        (COND_LEN_SUB): Ditto.
        (COND_LEN_MUL): Ditto.
        (COND_LEN_DIV): Ditto.
        (COND_LEN_MOD): Ditto.
        (COND_LEN_RDIV): Ditto.
        (COND_LEN_MIN): Ditto.
        (COND_LEN_MAX): Ditto.
        (COND_LEN_FMIN): Ditto.
        (COND_LEN_FMAX): Ditto.
        (COND_LEN_AND): Ditto.
        (COND_LEN_IOR): Ditto.
        (COND_LEN_XOR): Ditto.
        (COND_LEN_SHL): Ditto.
        (COND_LEN_SHR): Ditto.
        (COND_LEN_FMA): Ditto.
        (COND_LEN_FMS): Ditto.
        (COND_LEN_FNMA): Ditto.
        (COND_LEN_FNMS): Ditto.
        (COND_LEN_NEG): Ditto.
        * optabs.def (OPTAB_D): Ditto.

---
 gcc/doc/md.texi     | 80 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.cc  | 15 +++++++++
 gcc/internal-fn.def | 38 +++++++++++++++++++++
 gcc/optabs.def      | 24 ++++++++++++++
 4 files changed, 157 insertions(+)

Message ID	20230707040326.9223-1-juzhe.zhong@rivai.ai
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=<UNKNOWN>) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qy0Bt5Wnfz20bY for <incoming@patchwork.ozlabs.org>; Fri, 7 Jul 2023 14:03:59 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7252C385B537 for <incoming@patchwork.ozlabs.org>; Fri, 7 Jul 2023 04:03:57 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbguseast1.qq.com (smtpbguseast1.qq.com [54.204.34.129]) by sourceware.org (Postfix) with ESMTPS id 43BF23858D38 for <gcc-patches@gcc.gnu.org>; Fri, 7 Jul 2023 04:03:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 43BF23858D38 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai X-QQ-mid: bizesmtp69t1688702610tbehlqjz Received: from server1.localdomain ( [58.60.1.22]) by bizesmtp.qq.com (ESMTP) with id ; Fri, 07 Jul 2023 12:03:29 +0800 (CST) X-QQ-SSF: 01400000000000G0T000000A0000000 X-QQ-FEAT: bhet8yMU7vksJZLtwn+C/obAu35iK6Brlk2d168Bok0dvyRzsS8ip3vOF+BjV t5SKXrycnQP5XPWUZ+dhGS1cnk59iq1F40wv6rSce4R4aeeXHTBJTGESyHjdAjgX9q6fQXZ Q3x8ZQswpu07gP5AHEbR3SQYsWSmzEuUVWSwdVRxoqIXTPq0C/WO3F3k71m92ezP3FMevB+ XnEdmUahXsxLJPPRMT0nodBAeY0wm6B79a35GQWZ/DxaEjnyDf+VB9V54mZAvt9hXzPoj1N Bc3T6RLoIN64YNKFbMEOjyQm0+3QqzNuetktVl+E7kH8MGT5LRaEOBWniKPfVh8bw6AvH+T 4x2m+Qcene4FYUSGLkMhLrSj+I56led9bh7Fbr/wDW5fsImpdAa+dC+lUTv8wW9dgF7xo73 LlJ5hyxowOI= X-QQ-GoodBg: 2 X-BIZMAIL-ID: 14142116349717981750 From: juzhe.zhong@rivai.ai To: gcc-patches@gcc.gnu.org Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong <juzhe.zhong@rivai.ai> Subject: [PATCH] VECT: Add COND_LEN_* operations for loop control with length targets Date: Fri, 7 Jul 2023 12:03:26 +0800 Message-Id: <20230707040326.9223-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
Series	VECT: Add COND_LEN_* operations for loop control with length targets \| expand VECT: Add COND_LEN_* operations for loop control with length targets

VECT: Add COND_LEN_* operations for loop control with length targets

Commit Message

Comments

Patch