From patchwork Wed Jun 9 09:19:58 2021
X-Patchwork-Submitter: Prathamesh Kulkarni
X-Patchwork-Id: 1489764
Date: Wed, 9 Jun 2021 14:49:58 +0530
Subject: [ARM NEON] PR66791: Replace builtins in vceq_* (a, b) with a == b.
To: gcc Patches, Kyrill Tkachov
From: Prathamesh Kulkarni

Hi,
The attached patch replaces calls to __builtin_neon_vceq* (a, b) with
a == b for the integral variants. For the fp variants, it uses a == b only
when __FAST_MATH__ is defined, because without -ffast-math a == b results
in much longer code than the builtin (e.g. __builtin_neon_vceqv2sf), which
simply emits vceq.f32. With -ffast-math, both forms produce vceq.f32.

Thanks,
Prathamesh

2021-06-09  Prathamesh Kulkarni

	* config/arm/arm_neon.h (vceq_s8): Replace builtin with __a == __b.
	(vceq_s16): Likewise.
	(vceq_s32): Likewise.
	(vceq_u8): Likewise.
	(vceq_u16): Likewise.
	(vceq_u32): Likewise.
	(vceq_p8): Likewise.
	(vceqq_s8): Likewise.
	(vceqq_s16): Likewise.
	(vceqq_s32): Likewise.
	(vceqq_u8): Likewise.
	(vceqq_u16): Likewise.
	(vceqq_u32): Likewise.
	(vceqq_p8): Likewise.
	(vceq_f32): Gate __a == __b on __FAST_MATH__.
	(vceqq_f32): Likewise.
	(vceq_f16): Likewise.
	(vceqq_f16): Likewise.

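As a rough illustration of the description above (not part of the patch),
the sketch below uses the GNU C vector extension to show the two shapes the
patch gives the intrinsics: a plain a == b for an integral variant, and a
__FAST_MATH__ gate for an fp variant. The demo_* typedefs and functions are
hypothetical stand-ins so that it builds without arm_neon.h, and the
lane-by-lane loop in the #else branch is only an assumed substitute for
__builtin_neon_vceqv4sf, which exists on ARM targets only.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the arm_neon.h vector types, defined with the
   GNU C vector extension so the sketch builds on any GCC target.  */
typedef int8_t demo_int8x16_t __attribute__ ((vector_size (16)));
typedef uint8_t demo_uint8x16_t __attribute__ ((vector_size (16)));
typedef float demo_float32x4_t __attribute__ ((vector_size (16)));
typedef uint32_t demo_uint32x4_t __attribute__ ((vector_size (16)));

/* Integral variant: a vector == yields all-ones in equal lanes and zero
   elsewhere; the cast only reinterprets the signed mask as unsigned.  */
static inline demo_uint8x16_t
demo_vceqq_s8 (demo_int8x16_t a, demo_int8x16_t b)
{
  return (demo_uint8x16_t) (a == b);
}

/* Fp variant: same gating shape as in the patch.  The #else branch is an
   assumption of this sketch: a scalar loop in place of the ARM-only
   builtin.  */
static inline demo_uint32x4_t
demo_vceqq_f32 (demo_float32x4_t a, demo_float32x4_t b)
{
#ifdef __FAST_MATH__
  return (demo_uint32x4_t) (a == b);
#else
  demo_uint32x4_t r = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    r[i] = (a[i] == b[i]) ? UINT32_MAX : 0;
  return r;
#endif
}
```

GCC defines __FAST_MATH__ when -ffast-math is in effect, so the first
branch of demo_vceqq_f32 is selected only in that mode, mirroring the
gating in the patch.
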
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index dcd533fd003..7a800062f9e 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -2359,112 +2359,120 @@ __extension__ extern __inline uint8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vceqv8qi (__a, __b);
+  return (uint8x8_t) (__a == __b);
 }
 
 __extension__ extern __inline uint16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vceqv4hi (__a, __b);
+  return (uint16x4_t) (__a == __b);
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vceqv2si (__a, __b);
+  return (uint32x2_t) (__a == __b);
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_f32 (float32x2_t __a, float32x2_t __b)
 {
+#ifdef __FAST_MATH__
+  return (uint32x2_t) (__a == __b);
+#else
   return (uint32x2_t)__builtin_neon_vceqv2sf (__a, __b);
+#endif
 }
 
 __extension__ extern __inline uint8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vceqv8qi ((int8x8_t) __a, (int8x8_t) __b);
+  return (uint8x8_t) (__a == __b);
 }
 
 __extension__ extern __inline uint16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vceqv4hi ((int16x4_t) __a, (int16x4_t) __b);
+  return (uint16x4_t) (__a == __b);
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vceqv2si ((int32x2_t) __a, (int32x2_t) __b);
+  return (uint32x2_t) (__a == __b);
 }
 
 __extension__ extern __inline uint8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_p8 (poly8x8_t __a, poly8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vceqv8qi ((int8x8_t) __a, (int8x8_t) __b);
+  return (uint8x8_t) (__a == __b);
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vceqv16qi (__a, __b);
+  return (uint8x16_t) (__a == __b);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vceqv8hi (__a, __b);
+  return (uint16x8_t) (__a == __b);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vceqv4si (__a, __b);
+  return (uint32x4_t) (__a == __b);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_f32 (float32x4_t __a, float32x4_t __b)
 {
+#ifdef __FAST_MATH__
+  return (uint32x4_t) (__a == __b);
+#else
   return (uint32x4_t)__builtin_neon_vceqv4sf (__a, __b);
+#endif
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vceqv16qi ((int8x16_t) __a, (int8x16_t) __b);
+  return (uint8x16_t) (__a == __b);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vceqv8hi ((int16x8_t) __a, (int16x8_t) __b);
+  return (uint16x8_t) (__a == __b);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vceqv4si ((int32x4_t) __a, (int32x4_t) __b);
+  return (uint32x4_t) (__a == __b);
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_p8 (poly8x16_t __a, poly8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vceqv16qi ((int8x16_t) __a, (int8x16_t) __b);
+  return (uint8x16_t) (__a == __b);
 }
 
 __extension__ extern __inline uint8x8_t
@@ -17195,14 +17203,22 @@ __extension__ extern __inline uint16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_f16 (float16x4_t __a, float16x4_t __b)
 {
+#ifdef __FAST_MATH__
+  return (uint16x4_t) (__a == __b);
+#else
   return (uint16x4_t)__builtin_neon_vceqv4hf (__a, __b);
+#endif
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceqq_f16 (float16x8_t __a, float16x8_t __b)
 {
+#ifdef __FAST_MATH__
+  return (uint16x8_t) (__a == __b);
+#else
   return (uint16x8_t)__builtin_neon_vceqv8hf (__a, __b);
+#endif
 }
 
 __extension__ extern __inline uint16x4_t