From patchwork Sat Jul 20 03:18:47 2019
X-Patchwork-Submitter: "Paul A. Clarke"
X-Patchwork-Id: 1134346
To: gcc-patches@gcc.gnu.org
Cc: Segher Boessenkool
From: Paul Clarke
Subject: [PATCH] [rs6000] Add _mm_blend_epi16 and _mm_blendv_epi8
Date: Fri, 19 Jul 2019 22:18:47 -0500

Add compatibility implementations of the _mm_blend_epi16 and
_mm_blendv_epi8 intrinsics.

The corresponding test cases are copied almost verbatim from i386; only
the DejaGnu header lines have minor changes.

2019-07-19  Paul A. Clarke

[gcc]

        * config/rs6000/smmintrin.h (_mm_blend_epi16): New.
        (_mm_blendv_epi8): New.

[gcc/testsuite]

        * gcc.target/powerpc/sse4_1-check.h: New.
        * gcc.target/powerpc/sse4_1-pblendvb.c: New.
        * gcc.target/powerpc/sse4_1-pblendw.c: New.
        * gcc.target/powerpc/sse4_1-pblendw-2.c: New.

Tested on 64-bit LE, and on 64-bit and 32-bit BE.

OK for trunk?
PC

Index: gcc/config/rs6000/smmintrin.h
===================================================================
diff --git a/trunk/gcc/config/rs6000/smmintrin.h b/trunk/gcc/config/rs6000/smmintrin.h
--- a/trunk/gcc/config/rs6000/smmintrin.h	(revision 273615)
+++ b/trunk/gcc/config/rs6000/smmintrin.h	(working copy)
@@ -66,4 +66,27 @@ _mm_extract_ps (__m128 __X, const int __N)
   return ((__v4si)__X)[__N & 3];
 }
 
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
+{
+  __v8hu __bitmask = vec_splats ((unsigned short) __imm8);
+  const __v8hu __shifty = { 0, 1, 2, 3, 4, 5, 6, 7 };
+  __bitmask = vec_sr (__bitmask, __shifty);
+  const __v8hu __ones = vec_splats ((unsigned short) 0x0001);
+  __bitmask = vec_and (__bitmask, __ones);
+  const __v8hu __zero = {0};
+  __bitmask = vec_sub (__zero, __bitmask);
+  return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __bitmask);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
+{
+  const __v16qu __hibits = vec_splats ((unsigned char) 0x80);
+  __v16qu __lmask = vec_and ((__v16qu) __mask, __hibits);
+  const __v16qu __zero = {0};
+  __lmask = (vector unsigned char) vec_cmpgt (__lmask, __zero);
+  return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
+}
+
 #endif
Index: gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
===================================================================
diff --git a/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h b/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
new file mode 10644
--- /dev/null	(revision 0)
+++ b/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h	(working copy)
@@ -0,0 +1,27 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "m128-check.h"
+
+//#define DEBUG 1
+
+#define TEST sse4_1_test
+
+static void sse4_1_test (void);
+
+static void
+__attribute__ ((noinline))
+do_test (void)
+{
+  sse4_1_test ();
+}
+
+int
+main ()
+{
+  do_test ();
+#ifdef DEBUG
+  printf ("PASSED\n");
+#endif
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/sse4_1-pblendvb.c
===================================================================
diff --git a/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendvb.c b/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendvb.c
new file mode 10644
--- /dev/null	(revision 0)
+++ b/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendvb.c	(working copy)
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+static void
+init_pblendvb (unsigned char *src1, unsigned char *src2,
+               unsigned char *mask)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 16; i++)
+    {
+      src1[i] = i* i * sign;
+      src2[i] = (i + 20) * sign;
+      mask[i] = (i % 3) + ((i * (14 + sign))
+                           ^ (src1[i] | src2[i] | (i*3)));
+      sign = -sign;
+    }
+}
+
+static int
+check_pblendvb (__m128i *dst, unsigned char *src1,
+                unsigned char *src2, unsigned char *mask)
+{
+  unsigned char tmp[16];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 16; j++)
+    if (mask [j] & 0x80)
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  union
+    {
+      __m128i x[NUM];
+      unsigned char c[NUM * 16];
+    } dst, src1, src2, mask;
+  int i;
+
+  init_pblendvb (src1.c, src2.c, mask.c);
+
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blendv_epi8 (src1.x[i], src2.x[i], mask.x[i]);
+      if (check_pblendvb (&dst.x[i], &src1.c[i * 16], &src2.c[i * 16],
+                          &mask.c[i * 16]))
+        abort ();
+    }
+}
Index: gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw-2.c
===================================================================
diff --git a/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw-2.c b/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw-2.c
new file mode 10644
--- /dev/null	(revision 0)
+++ b/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw-2.c	(working copy)
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xfe
+
+static void
+init_pblendw (short *src1, short *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 8; i++)
+    {
+      src1[i] = i * i * sign;
+      src2[i] = (i + 20) * sign;
+      sign = -sign;
+    }
+}
+
+static int
+check_pblendw (__m128i *dst, short *src1, short *src2)
+{
+  short tmp[8];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 8; j++)
+    if ((MASK & (1 << j)))
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  __m128i x, y;
+  union
+    {
+      __m128i x[NUM];
+      short s[NUM * 8];
+    } dst, src1, src2;
+  union
+    {
+      __m128i x;
+      short s[8];
+    } src3;
+  int i;
+
+  init_pblendw (src1.s, src2.s);
+
+  /* Check pblendw imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blend_epi16 (src1.x[i], src2.x[i], MASK);
+      if (check_pblendw (&dst.x[i], &src1.s[i * 8], &src2.s[i * 8]))
+        abort ();
+    }
+
+  /* Check pblendw imm8, xmm, xmm */
+  src3.x = _mm_setzero_si128 ();
+
+  x = _mm_blend_epi16 (dst.x[2], src3.x, MASK);
+  y = _mm_blend_epi16 (src3.x, dst.x[2], MASK);
+
+  if (check_pblendw (&x, &dst.s[16], &src3.s[0]))
+    abort ();
+
+  if (check_pblendw (&y, &src3.s[0], &dst.s[16]))
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw.c
===================================================================
diff --git a/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw.c b/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw.c
new file mode 10644
--- /dev/null	(revision 0)
+++ b/trunk/gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw.c	(working copy)
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x0f
+#endif
+
+static void
+init_pblendw (short *src1, short *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 8; i++)
+    {
+      src1[i] = i * i * sign;
+      src2[i] = (i + 20) * sign;
+      sign = -sign;
+    }
+}
+
+static int
+check_pblendw (__m128i *dst, short *src1, short *src2)
+{
+  short tmp[8];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 8; j++)
+    if ((MASK & (1 << j)))
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128i x, y;
+  union
+    {
+      __m128i x[NUM];
+      short s[NUM * 8];
+    } dst, src1, src2;
+  union
+    {
+      __m128i x;
+      short s[8];
+    } src3;
+  int i;
+
+  init_pblendw (src1.s, src2.s);
+
+  /* Check pblendw imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blend_epi16 (src1.x[i], src2.x[i], MASK);
+      if (check_pblendw (&dst.x[i], &src1.s[i * 8], &src2.s[i * 8]))
+        abort ();
+    }
+
+  /* Check pblendw imm8, xmm, xmm */
+  src3.x = _mm_setzero_si128 ();
+
+  x = _mm_blend_epi16 (dst.x[2], src3.x, MASK);
+  y = _mm_blend_epi16 (src3.x, dst.x[2], MASK);
+
+  if (check_pblendw (&x, &dst.s[16], &src3.s[0]))
+    abort ();
+
+  if (check_pblendw (&y, &src3.s[0], &dst.s[16]))
+    abort ();
+}
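
For reference, the per-lane behaviour that the two new intrinsics are meant to reproduce can be sketched in portable scalar C. This is only an illustrative model and not part of the patch: the function names (blend_epi16_lane, blendv_epi8_lane, blend_epi16_model, blendv_epi8_model) are hypothetical, and lane 0 is assumed to be the lowest-addressed element, matching the check_pblendw and check_pblendvb helpers in the test cases.

```c
#include <stdint.h>

/* One 16-bit lane of the blend_epi16 semantics: bit `lane` of the
   immediate selects the element from b, otherwise from a.
   (Hypothetical helper name, for illustration only.)  */
uint16_t
blend_epi16_lane (uint16_t a, uint16_t b, int imm8, int lane)
{
  return ((imm8 >> lane) & 1) ? b : a;
}

/* One byte lane of the blendv_epi8 semantics: only the high bit of
   the mask byte matters; when set, the byte comes from b.  */
uint8_t
blendv_epi8_lane (uint8_t a, uint8_t b, uint8_t mask)
{
  return (mask & 0x80) ? b : a;
}

/* Whole-vector models built from the lane helpers above.  */
void
blend_epi16_model (uint16_t dst[8], const uint16_t a[8],
                   const uint16_t b[8], int imm8)
{
  for (int j = 0; j < 8; j++)
    dst[j] = blend_epi16_lane (a[j], b[j], imm8, j);
}

void
blendv_epi8_model (uint8_t dst[16], const uint8_t a[16],
                   const uint8_t b[16], const uint8_t mask[16])
{
  for (int j = 0; j < 16; j++)
    dst[j] = blendv_epi8_lane (a[j], b[j], mask[j]);
}
```

With MASK 0x0f, as in sse4_1-pblendw.c, this model takes lanes 0 to 3 from the second operand and lanes 4 to 7 from the first; with MASK 0xfe, as in sse4_1-pblendw-2.c, only lane 0 comes from the first operand.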