From patchwork Fri Oct 5 17:59:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul A. Clarke" X-Patchwork-Id: 979704 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-487041-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=us.ibm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="nLYsXUTq"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42RcwH358pz9s4Z for ; Sat, 6 Oct 2018 03:59:42 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:date:mime-version:message-id:content-type :content-transfer-encoding; q=dns; s=default; b=DYNIasI/KU5h/dDP 1CvpyZcxl3Uqb4EbCmEgL4mQB6J8aPDfXHbfth0wKL2MNw2XVo6H5oUBDH8bUbC9 Qw/7vHoincSZdciubhYayndOHT9nHkdbA42NQ4mLgX9ITD027NTEWKHajF5Wb8yJ fiariz3Cb33FDmLqg/GB+oKtPIk= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:date:mime-version:message-id:content-type :content-transfer-encoding; s=default; bh=XUCl8jEQlz4jFx9dB9Qyo7 HuOm4=; b=nLYsXUTqKr8lNN7gnmOC75uw8QZEPG7X0Ji3ww6ySI46SUWjsseH4a eaisAm/wI3ObTMF7VPyFa8fn3pZM1FY+ESV0cBGevyz9vwThurvjStTyfmfPSj73 /V/kfc0hnHvtD39cPVUkAhKgpVmXSgwgEHrC8DMFqm4Aeub/JSIgo= Received: (qmail 42140 invoked by alias); 5 Oct 2018 17:59:31 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 42119 invoked by uid 89); 5 Oct 2018 17:59:30 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-9.4 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, KHOP_DYNAMIC, RCVD_IN_DNSWL_LOW, SPF_PASS, UNWANTED_LANGUAGE_BODY autolearn=ham version=3.3.2 spammy=transferring, sk:NO_WARN, __m128d, sk:no_warn X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.156.1) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 05 Oct 2018 17:59:23 +0000 Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w95HosgQ078533 for ; Fri, 5 Oct 2018 13:59:21 -0400 Received: from e15.ny.us.ibm.com (e15.ny.us.ibm.com [129.33.205.205]) by mx0a-001b2d01.pphosted.com with ESMTP id 2mxcqcgama-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 05 Oct 2018 13:59:20 -0400 Received: from localhost by e15.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 5 Oct 2018 13:59:18 -0400 Received: from b01cxnp23033.gho.pok.ibm.com (9.57.198.28) by e15.ny.us.ibm.com (146.89.104.202) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Fri, 5 Oct 2018 13:59:16 -0400 Received: from b01ledav004.gho.pok.ibm.com (b01ledav004.gho.pok.ibm.com [9.57.199.109]) by b01cxnp23033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w95HxFWY21233720 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Fri, 5 Oct 2018 17:59:15 GMT Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 947A111206F; Fri, 5 Oct 2018 13:58:38 -0400 (EDT) Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2DC04112063; Fri, 5 Oct 2018 13:58:38 -0400 (EDT) Received: from oc3272150783.ibm.com (unknown [9.85.187.215]) by b01ledav004.gho.pok.ibm.com (Postfix) with ESMTPS; Fri, 5 Oct 2018 13:58:38 -0400 (EDT) To: gcc-patches@gcc.gnu.org, Segher Boessenkool From: Paul Clarke Subject: [PATCH v2, rs6000] 2/2 Add x86 SSE3 intrinsics to GCC PPC64LE target Date: Fri, 5 Oct 2018 12:59:14 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 x-cbid: 18100517-0068-0000-0000-0000034849CC X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00009825; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000267; SDB=6.01098329; UDB=6.00568082; IPR=6.00878369; MB=3.00023631; MTD=3.00000008; XFM=3.00000015; UTC=2018-10-05 17:59:17 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18100517-0069-0000-0000-000045F85182 Message-Id: This is part 2/2 for contributing PPC64LE support for X86 SSE3 instrisics. This patch includes testsuite/gcc.target tests for the intrinsics defined in pmmintrin.h. Tested on POWER8 ppc64le and ppc64 (-m64 and -m32, the latter only reporting 10 new unsupported tests.) [gcc/testsuite] 2018-10-01 Paul A. Clarke * gcc.target/powerpc/sse3-check.h: New file. * gcc.target/powerpc/sse3-addsubps.c: New file. * gcc.target/powerpc/sse3-addsubpd.c: New file. * gcc.target/powerpc/sse3-haddps.c: New file. * gcc.target/powerpc/sse3-hsubps.c: New file. * gcc.target/powerpc/sse3-haddpd.c: New file. * gcc.target/powerpc/sse3-hsubpd.c: New file. * gcc.target/powerpc/sse3-lddqu.c: New file. * gcc.target/powerpc/sse3-movsldup.c: New file. * gcc.target/powerpc/sse3-movshdup.c: New file. * gcc.target/powerpc/sse3-movddup.c: New file. * gcc.target/powerpc/pr37191.c: New file. v2: tested with -mcpu=power7; fixed up ChangeLog; universally used "-mpower8-vector" (per Segher review); changed "TEST" function declarations to be globally consistent Index: gcc/testsuite/gcc.target/powerpc/pr37191.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr37191.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr37191.c (working copy) @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-options "-O3 -mpower8-vector" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 + +#include +#include +#include + +#if 0 +extern const uint64_t ff_bone; +#endif + +static inline void transpose4x4(uint8_t *dst, uint8_t *src, ptrdiff_t dst_stride, ptrdiff_t src_stride) { + __m64 row0 = _mm_cvtsi32_si64(*(unsigned*)(src + (0 * src_stride))); + __m64 row1 = _mm_cvtsi32_si64(*(unsigned*)(src + (1 * src_stride))); + __m64 row2 = _mm_cvtsi32_si64(*(unsigned*)(src + (2 * src_stride))); + __m64 row3 = _mm_cvtsi32_si64(*(unsigned*)(src + (3 * src_stride))); + __m64 tmp0 = _mm_unpacklo_pi8(row0, row1); + __m64 tmp1 = _mm_unpacklo_pi8(row2, row3); + __m64 row01 = _mm_unpacklo_pi16(tmp0, tmp1); + __m64 row23 = _mm_unpackhi_pi16(tmp0, tmp1); + *((unsigned*)(dst + (0 * dst_stride))) = _mm_cvtsi64_si32(row01); + *((unsigned*)(dst + (1 * dst_stride))) = _mm_cvtsi64_si32(_mm_unpackhi_pi32(row01, row01)); + *((unsigned*)(dst + (2 * dst_stride))) = _mm_cvtsi64_si32(row23); + *((unsigned*)(dst + (3 * dst_stride))) = _mm_cvtsi64_si32(_mm_unpackhi_pi32(row23, row23)); +} + +#if 0 +static inline void h264_loop_filter_chroma_intra_mmx2(uint8_t *pix, int stride, int alpha1, int beta1) +{ + asm volatile( + "" + :: "r"(pix-2*stride), "r"(pix), "r"((long)stride), + "m"(alpha1), "m"(beta1), "m"(ff_bone) + ); +} +#endif + +void h264_h_loop_filter_chroma_intra_mmx2(uint8_t *pix, int stride, int alpha, int beta) +{ + uint8_t trans[8*4] __attribute__ ((aligned (8))); + transpose4x4(trans, pix-2, 8, stride); + transpose4x4(trans+4, pix-2+4*stride, 8, stride); +// h264_loop_filter_chroma_intra_mmx2(trans+2*8, 8, alpha-1, beta-1); + transpose4x4(pix-2, trans, stride, 8); + transpose4x4(pix-2+4*stride, trans+4, stride, 8); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c (working copy) @@ -0,0 +1,101 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_addsubpd_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_addsubpd (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_loadu_pd (i2); + + t1 = _mm_addsub_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static void +sse3_test_addsubpd_subsume (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_load_pd (i1); + __m128d t2 = _mm_load_pd (i2); + + t1 = _mm_addsub_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2] __attribute__ ((aligned(16))); +static double p2[2] __attribute__ ((aligned(16))); +static double p3[2]; +static double ck[2]; + +double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + + p2[0] = vals[i+2]; + p2[1] = vals[i+3]; + + ck[0] = p1[0] - p2[0]; + ck[1] = p1[1] + p2[1]; + + sse3_test_addsubpd (p1, p2, p3); + + fail += chk_pd (ck, p3); + + sse3_test_addsubpd_subsume (p1, p2, p3); + + fail += chk_pd (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c (working copy) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_addsubps_1 +#endif + +#include + +static void +sse3_test_addsubps (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_loadu_ps (i2); + + t1 = _mm_addsub_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static void +sse3_test_addsubps_subsume (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_load_ps (i2); + + t1 = _mm_addsub_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static int +chk_ps (float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4] __attribute__ ((aligned(16))); +static float p3[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals); i += 8) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + p1[2] = vals[i+2]; + p1[3] = vals[i+3]; + + p2[0] = vals[i+4]; + p2[1] = vals[i+5]; + p2[2] = vals[i+6]; + p2[3] = vals[i+7]; + + ck[0] = p1[0] - p2[0]; + ck[1] = p1[1] + p2[1]; + ck[2] = p1[2] - p2[2]; + ck[3] = p1[3] + p2[3]; + + sse3_test_addsubps (p1, p2, p3); + + fail += chk_ps (ck, p3); + + sse3_test_addsubps_subsume (p1, p2, p3); + + fail += chk_ps (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-check.h =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-check.h (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-check.h (working copy) @@ -0,0 +1,43 @@ +#include +#include + +#include "m128-check.h" + +/* define DEBUG replace abort with printf on error. */ +//#define DEBUG 1 + +#define TEST sse3_test + +static void sse3_test (void); + +static void +__attribute__ ((noinline)) +do_test (void) +{ + sse3_test (); +} + +int +main () +{ +#ifdef __BUILTIN_CPU_SUPPORTS__ + /* Most SSE intrinsic operations can be implemented via VMX + instructions, but some operations may be faster / simpler + using the POWER8 VSX instructions. This is especially true + when we are transferring / converting to / from __m64 types. + The direct register transfer instructions from POWER8 are + especially important. So we test for arch_2_07. */ + if (__builtin_cpu_supports ("arch_2_07")) + { + do_test (); +#ifdef DEBUG + printf ("PASSED\n"); +#endif + } +#ifdef DEBUG + else + printf ("SKIPPED\n"); +#endif +#endif /* __BUILTIN_CPU_SUPPORTS__ */ + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c (working copy) @@ -0,0 +1,100 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_haddpd_1 +#endif +#include + +static void +sse3_test_haddpd (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_loadu_pd (i2); + + t1 = _mm_hadd_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static void +sse3_test_haddpd_subsume (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_load_pd (i1); + __m128d t2 = _mm_load_pd (i2); + + t1 = _mm_hadd_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2] __attribute__ ((aligned(16))); +static double p2[2] __attribute__ ((aligned(16))); +static double p3[2]; +static double ck[2]; + +static double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4) + { + p1[0] = vals[i + 0]; + p1[1] = vals[i + 1]; + + p2[0] = vals[i + 2]; + p2[1] = vals[i + 3]; + + ck[0] = p1[0] + p1[1]; + ck[1] = p2[0] + p2[1]; + + sse3_test_haddpd (p1, p2, p3); + + fail += chk_pd (ck, p3); + + sse3_test_haddpd_subsume (p1, p2, p3); + + fail += chk_pd (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-haddps.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-haddps.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-haddps.c (working copy) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_haddps_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_haddps (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_loadu_ps (i2); + + t1 = _mm_hadd_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static void +sse3_test_haddps_subsume (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_load_ps (i2); + + t1 = _mm_hadd_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static int +chk_ps(float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4] __attribute__ ((aligned(16))); +static float p3[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST () +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 8) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + p1[2] = vals[i+2]; + p1[3] = vals[i+3]; + + p2[0] = vals[i+4]; + p2[1] = vals[i+5]; + p2[2] = vals[i+6]; + p2[3] = vals[i+7]; + + ck[0] = p1[0] + p1[1]; + ck[1] = p1[2] + p1[3]; + ck[2] = p2[0] + p2[1]; + ck[3] = p2[2] + p2[3]; + + sse3_test_haddps (p1, p2, p3); + + fail += chk_ps (ck, p3); + + sse3_test_haddps_subsume (p1, p2, p3); + + fail += chk_ps (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c (working copy) @@ -0,0 +1,101 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_hsubpd_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_hsubpd (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_loadu_pd (i2); + + t1 = _mm_hsub_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static void +sse3_test_hsubpd_subsume (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_load_pd (i1); + __m128d t2 = _mm_load_pd (i2); + + t1 = _mm_hsub_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2] __attribute__ ((aligned(16))); +static double p2[2] __attribute__ ((aligned(16))); +static double p3[2]; +static double ck[2]; + +static double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4) + { + p1[0] = vals[i + 0]; + p1[1] = vals[i + 1]; + + p2[0] = vals[i + 2]; + p2[1] = vals[i + 3]; + + ck[0] = p1[0] - p1[1]; + ck[1] = p2[0] - p2[1]; + + sse3_test_hsubpd (p1, p2, p3); + + fail += chk_pd (ck, p3); + + sse3_test_hsubpd_subsume (p1, p2, p3); + + fail += chk_pd (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c (working copy) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_hsubps_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_hsubps (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_loadu_ps (i2); + + t1 = _mm_hsub_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static void +sse3_test_hsubps_subsume (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_load_ps (i2); + + t1 = _mm_hsub_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static int +chk_ps(float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4] __attribute__ ((aligned(16))); +static float p3[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST () +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 8) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + p1[2] = vals[i+2]; + p1[3] = vals[i+3]; + + p2[0] = vals[i+4]; + p2[1] = vals[i+5]; + p2[2] = vals[i+6]; + p2[3] = vals[i+7]; + + ck[0] = p1[0] - p1[1]; + ck[1] = p1[2] - p1[3]; + ck[2] = p2[0] - p2[1]; + ck[3] = p2[2] - p2[3]; + + sse3_test_hsubps (p1, p2, p3); + + fail += chk_ps (ck, p3); + + sse3_test_hsubps_subsume (p1, p2, p3); + + fail += chk_ps (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c (working copy) @@ -0,0 +1,79 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_lddqu_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_lddqu (double *i1, double *r) +{ + __m128i t1 = _mm_lddqu_si128 ((__m128i *) i1); + + _mm_storeu_si128 ((__m128i *) r, t1); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2]; +static double p2[2]; +static double ck[2]; + +static double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + + sse3_test_lddqu (p1, p2); + + ck[0] = p1[0]; + ck[1] = p1[1]; + + fail += chk_pd (ck, p2); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-movddup.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-movddup.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-movddup.c (working copy) @@ -0,0 +1,134 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_movddup_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_movddup_mem (double *i1, double *r) +{ + __m128d t1 = _mm_loaddup_pd (i1); + + _mm_storeu_pd (r, t1); +} + +static double cnst1 [2] = {1.0, 1.0}; + +static void +sse3_test_movddup_reg (double *i1, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_loadu_pd (&cnst1[0]); + + t1 = _mm_mul_pd (t1, t2); + t2 = _mm_movedup_pd (t1); + + _mm_storeu_pd (r, t2); +} + +static void +sse3_test_movddup_reg_subsume_unaligned (double *i1, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_movedup_pd (t1); + + _mm_storeu_pd (r, t2); +} + +static void +sse3_test_movddup_reg_subsume_ldsd (double *i1, double *r) +{ + __m128d t1 = _mm_load_sd (i1); + __m128d t2 = _mm_movedup_pd (t1); + + _mm_storeu_pd (r, t2); +} + +static void +sse3_test_movddup_reg_subsume (double *i1, double *r) +{ + __m128d t1 = _mm_load_pd (i1); + __m128d t2 = _mm_movedup_pd (t1); + + _mm_storeu_pd (r, t2); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2] __attribute__ ((aligned(16))); +static double p2[2]; +static double ck[2]; + +static double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 1) + { + p1[0] = vals[i+0]; + + ck[0] = p1[0]; + ck[1] = p1[0]; + + sse3_test_movddup_mem (p1, p2); + + fail += chk_pd (ck, p2); + + sse3_test_movddup_reg (p1, p2); + + fail += chk_pd (ck, p2); + + sse3_test_movddup_reg_subsume (p1, p2); + + fail += chk_pd (ck, p2); + + sse3_test_movddup_reg_subsume_unaligned (p1, p2); + + fail += chk_pd (ck, p2); + + sse3_test_movddup_reg_subsume_ldsd (p1, p2); + + fail += chk_pd (ck, p2); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c (working copy) @@ -0,0 +1,97 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_movshdup_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_movshdup_reg (float *i1, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_movehdup_ps (t1); + + _mm_storeu_ps (r, t2); +} + +static void +sse3_test_movshdup_reg_subsume (float *i1, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_movehdup_ps (t1); + + _mm_storeu_ps (r, t2); +} + +static int +chk_ps (float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2) + { + p1[0] = 0.0; + p1[1] = vals[i+0]; + p1[2] = 1.0; + p1[3] = vals[i+1]; + + ck[0] = p1[1]; + ck[1] = p1[1]; + ck[2] = p1[3]; + ck[3] = p1[3]; + + sse3_test_movshdup_reg (p1, p2); + + fail += chk_ps (ck, p2); + + sse3_test_movshdup_reg_subsume (p1, p2); + + fail += chk_ps (ck, p2); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c (working copy) @@ -0,0 +1,97 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_movsldup_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_movsldup_reg (float *i1, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_moveldup_ps (t1); + + _mm_storeu_ps (r, t2); +} + +static void +sse3_test_movsldup_reg_subsume (float *i1, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_moveldup_ps (t1); + + _mm_storeu_ps (r, t2); +} + +static int +chk_ps (float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2) + { + p1[0] = vals[i+0]; + p1[1] = 0.0; + p1[2] = vals[i+1]; + p1[3] = 1.0; + + ck[0] = p1[0]; + ck[1] = p1[0]; + ck[2] = p1[2]; + ck[3] = p1[2]; + + sse3_test_movsldup_reg (p1, p2); + + fail += chk_ps (ck, p2); + + sse3_test_movsldup_reg_subsume (p1, p2); + + fail += chk_ps (ck, p2); + } + + if (fail != 0) + abort (); +} =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr37191.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr37191.c (working copy) @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-options "-O3 -mpower8-vector" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 + +#include +#include +#include + +#if 0 +extern const uint64_t ff_bone; +#endif + +static inline void transpose4x4(uint8_t *dst, uint8_t *src, ptrdiff_t dst_stride, ptrdiff_t src_stride) { + __m64 row0 = _mm_cvtsi32_si64(*(unsigned*)(src + (0 * src_stride))); + __m64 row1 = _mm_cvtsi32_si64(*(unsigned*)(src + (1 * src_stride))); + __m64 row2 = _mm_cvtsi32_si64(*(unsigned*)(src + (2 * src_stride))); + __m64 row3 = _mm_cvtsi32_si64(*(unsigned*)(src + (3 * src_stride))); + __m64 tmp0 = _mm_unpacklo_pi8(row0, row1); + __m64 tmp1 = _mm_unpacklo_pi8(row2, row3); + __m64 row01 = _mm_unpacklo_pi16(tmp0, tmp1); + __m64 row23 = _mm_unpackhi_pi16(tmp0, tmp1); + *((unsigned*)(dst + (0 * dst_stride))) = _mm_cvtsi64_si32(row01); + *((unsigned*)(dst + (1 * dst_stride))) = _mm_cvtsi64_si32(_mm_unpackhi_pi32(row01, row01)); + *((unsigned*)(dst + (2 * dst_stride))) = _mm_cvtsi64_si32(row23); + *((unsigned*)(dst + (3 * dst_stride))) = _mm_cvtsi64_si32(_mm_unpackhi_pi32(row23, row23)); +} + +#if 0 +static inline void h264_loop_filter_chroma_intra_mmx2(uint8_t *pix, int stride, int alpha1, int beta1) +{ + asm volatile( + "" + :: "r"(pix-2*stride), "r"(pix), "r"((long)stride), + "m"(alpha1), "m"(beta1), "m"(ff_bone) + ); +} +#endif + +void h264_h_loop_filter_chroma_intra_mmx2(uint8_t *pix, int stride, int alpha, int beta) +{ + uint8_t trans[8*4] __attribute__ ((aligned (8))); + transpose4x4(trans, pix-2, 8, stride); + transpose4x4(trans+4, pix-2+4*stride, 8, stride); +// h264_loop_filter_chroma_intra_mmx2(trans+2*8, 8, alpha-1, beta-1); + transpose4x4(pix-2, trans, stride, 8); + transpose4x4(pix-2+4*stride, trans+4, stride, 8); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c (working copy) @@ -0,0 +1,102 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_addsubpd_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_addsubpd (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_loadu_pd (i2); + + t1 = _mm_addsub_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static void +sse3_test_addsubpd_subsume (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_load_pd (i1); + __m128d t2 = _mm_load_pd (i2); + + t1 = _mm_addsub_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2] __attribute__ ((aligned(16))); +static double p2[2] __attribute__ ((aligned(16))); +static double p3[2]; +static double ck[2]; + +double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +static +void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + + p2[0] = vals[i+2]; + p2[1] = vals[i+3]; + + ck[0] = p1[0] - p2[0]; + ck[1] = p1[1] + p2[1]; + + sse3_test_addsubpd (p1, p2, p3); + + fail += chk_pd (ck, p3); + + sse3_test_addsubpd_subsume (p1, p2, p3); + + fail += chk_pd (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c (working copy) @@ -0,0 +1,108 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_addsubps_1 +#endif + +#include + +static void +sse3_test_addsubps (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_loadu_ps (i2); + + t1 = _mm_addsub_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static void +sse3_test_addsubps_subsume (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_load_ps (i2); + + t1 = _mm_addsub_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static int +chk_ps (float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4] __attribute__ ((aligned(16))); +static float p3[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals); i += 8) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + p1[2] = vals[i+2]; + p1[3] = vals[i+3]; + + p2[0] = vals[i+4]; + p2[1] = vals[i+5]; + p2[2] = vals[i+6]; + p2[3] = vals[i+7]; + + ck[0] = p1[0] - p2[0]; + ck[1] = p1[1] + p2[1]; + ck[2] = p1[2] - p2[2]; + ck[3] = p1[3] + p2[3]; + + sse3_test_addsubps (p1, p2, p3); + + fail += chk_ps (ck, p3); + + sse3_test_addsubps_subsume (p1, p2, p3); + + fail += chk_ps (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-check.h =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-check.h (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-check.h (working copy) @@ -0,0 +1,43 @@ +#include +#include + +#include "m128-check.h" + +/* define DEBUG replace abort with printf on error. */ +//#define DEBUG 1 + +#define TEST sse3_test + +static void sse3_test (void); + +static void +__attribute__ ((noinline)) +do_test (void) +{ + sse3_test (); +} + +int +main () +{ +#ifdef __BUILTIN_CPU_SUPPORTS__ + /* Most SSE intrinsic operations can be implemented via VMX + instructions, but some operations may be faster / simpler + using the POWER8 VSX instructions. This is especially true + when we are transferring / converting to / from __m64 types. + The direct register transfer instructions from POWER8 are + especially important. So we test for arch_2_07. */ + if (__builtin_cpu_supports ("arch_2_07")) + { + do_test (); +#ifdef DEBUG + printf ("PASSED\n"); +#endif + } +#ifdef DEBUG + else + printf ("SKIPPED\n"); +#endif +#endif /* __BUILTIN_CPU_SUPPORTS__ */ + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c (working copy) @@ -0,0 +1,100 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_haddpd_1 +#endif +#include + +static void +sse3_test_haddpd (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_loadu_pd (i2); + + t1 = _mm_hadd_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static void +sse3_test_haddpd_subsume (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_load_pd (i1); + __m128d t2 = _mm_load_pd (i2); + + t1 = _mm_hadd_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2] __attribute__ ((aligned(16))); +static double p2[2] __attribute__ ((aligned(16))); +static double p3[2]; +static double ck[2]; + +static double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4) + { + p1[0] = vals[i + 0]; + p1[1] = vals[i + 1]; + + p2[0] = vals[i + 2]; + p2[1] = vals[i + 3]; + + ck[0] = p1[0] + p1[1]; + ck[1] = p2[0] + p2[1]; + + sse3_test_haddpd (p1, p2, p3); + + fail += chk_pd (ck, p3); + + sse3_test_haddpd_subsume (p1, p2, p3); + + fail += chk_pd (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-haddps.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-haddps.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-haddps.c (working copy) @@ -0,0 +1,108 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_haddps_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_haddps (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_loadu_ps (i2); + + t1 = _mm_hadd_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static void +sse3_test_haddps_subsume (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_load_ps (i2); + + t1 = _mm_hadd_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static int +chk_ps(float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4] __attribute__ ((aligned(16))); +static float p3[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void +TEST () +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 8) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + p1[2] = vals[i+2]; + p1[3] = vals[i+3]; + + p2[0] = vals[i+4]; + p2[1] = vals[i+5]; + p2[2] = vals[i+6]; + p2[3] = vals[i+7]; + + ck[0] = p1[0] + p1[1]; + ck[1] = p1[2] + p1[3]; + ck[2] = p2[0] + p2[1]; + ck[3] = p2[2] + p2[3]; + + sse3_test_haddps (p1, p2, p3); + + fail += chk_ps (ck, p3); + + sse3_test_haddps_subsume (p1, p2, p3); + + fail += chk_ps (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c (working copy) @@ -0,0 +1,101 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_hsubpd_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_hsubpd (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_loadu_pd (i2); + + t1 = _mm_hsub_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static void +sse3_test_hsubpd_subsume (double *i1, double *i2, double *r) +{ + __m128d t1 = _mm_load_pd (i1); + __m128d t2 = _mm_load_pd (i2); + + t1 = _mm_hsub_pd (t1, t2); + + _mm_storeu_pd (r, t1); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2] __attribute__ ((aligned(16))); +static double p2[2] __attribute__ ((aligned(16))); +static double p3[2]; +static double ck[2]; + +static double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4) + { + p1[0] = vals[i + 0]; + p1[1] = vals[i + 1]; + + p2[0] = vals[i + 2]; + p2[1] = vals[i + 3]; + + ck[0] = p1[0] - p1[1]; + ck[1] = p2[0] - p2[1]; + + sse3_test_hsubpd (p1, p2, p3); + + fail += chk_pd (ck, p3); + + sse3_test_hsubpd_subsume (p1, p2, p3); + + fail += chk_pd (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c (working copy) @@ -0,0 +1,108 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_hsubps_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_hsubps (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_loadu_ps (i2); + + t1 = _mm_hsub_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static void +sse3_test_hsubps_subsume (float *i1, float *i2, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_load_ps (i2); + + t1 = _mm_hsub_ps (t1, t2); + + _mm_storeu_ps (r, t1); +} + +static int +chk_ps(float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4] __attribute__ ((aligned(16))); +static float p3[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void +TEST () +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 8) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + p1[2] = vals[i+2]; + p1[3] = vals[i+3]; + + p2[0] = vals[i+4]; + p2[1] = vals[i+5]; + p2[2] = vals[i+6]; + p2[3] = vals[i+7]; + + ck[0] = p1[0] - p1[1]; + ck[1] = p1[2] - p1[3]; + ck[2] = p2[0] - p2[1]; + ck[3] = p2[2] - p2[3]; + + sse3_test_hsubps (p1, p2, p3); + + fail += chk_ps (ck, p3); + + sse3_test_hsubps_subsume (p1, p2, p3); + + fail += chk_ps (ck, p3); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c (working copy) @@ -0,0 +1,80 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_lddqu_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_lddqu (double *i1, double *r) +{ + __m128i t1 = _mm_lddqu_si128 ((__m128i *) i1); + + _mm_storeu_si128 ((__m128i *) r, t1); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2]; +static double p2[2]; +static double ck[2]; + +static double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2) + { + p1[0] = vals[i+0]; + p1[1] = vals[i+1]; + + sse3_test_lddqu (p1, p2); + + ck[0] = p1[0]; + ck[1] = p1[1]; + + fail += chk_pd (ck, p2); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-movddup.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-movddup.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-movddup.c (working copy) @@ -0,0 +1,135 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_movddup_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_movddup_mem (double *i1, double *r) +{ + __m128d t1 = _mm_loaddup_pd (i1); + + _mm_storeu_pd (r, t1); +} + +static double cnst1 [2] = {1.0, 1.0}; + +static void +sse3_test_movddup_reg (double *i1, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_loadu_pd (&cnst1[0]); + + t1 = _mm_mul_pd (t1, t2); + t2 = _mm_movedup_pd (t1); + + _mm_storeu_pd (r, t2); +} + +static void +sse3_test_movddup_reg_subsume_unaligned (double *i1, double *r) +{ + __m128d t1 = _mm_loadu_pd (i1); + __m128d t2 = _mm_movedup_pd (t1); + + _mm_storeu_pd (r, t2); +} + +static void +sse3_test_movddup_reg_subsume_ldsd (double *i1, double *r) +{ + __m128d t1 = _mm_load_sd (i1); + __m128d t2 = _mm_movedup_pd (t1); + + _mm_storeu_pd (r, t2); +} + +static void +sse3_test_movddup_reg_subsume (double *i1, double *r) +{ + __m128d t1 = _mm_load_pd (i1); + __m128d t2 = _mm_movedup_pd (t1); + + _mm_storeu_pd (r, t2); +} + +static int +chk_pd (double *v1, double *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 2; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static double p1[2] __attribute__ ((aligned(16))); +static double p2[2]; +static double ck[2]; + +static double vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 1) + { + p1[0] = vals[i+0]; + + ck[0] = p1[0]; + ck[1] = p1[0]; + + sse3_test_movddup_mem (p1, p2); + + fail += chk_pd (ck, p2); + + sse3_test_movddup_reg (p1, p2); + + fail += chk_pd (ck, p2); + + sse3_test_movddup_reg_subsume (p1, p2); + + fail += chk_pd (ck, p2); + + sse3_test_movddup_reg_subsume_unaligned (p1, p2); + + fail += chk_pd (ck, p2); + + sse3_test_movddup_reg_subsume_ldsd (p1, p2); + + fail += chk_pd (ck, p2); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c (working copy) @@ -0,0 +1,98 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_movshdup_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_movshdup_reg (float *i1, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_movehdup_ps (t1); + + _mm_storeu_ps (r, t2); +} + +static void +sse3_test_movshdup_reg_subsume (float *i1, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_movehdup_ps (t1); + + _mm_storeu_ps (r, t2); +} + +static int +chk_ps (float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2) + { + p1[0] = 0.0; + p1[1] = vals[i+0]; + p1[2] = 1.0; + p1[3] = vals[i+1]; + + ck[0] = p1[1]; + ck[1] = p1[1]; + ck[2] = p1[3]; + ck[3] = p1[3]; + + sse3_test_movshdup_reg (p1, p2); + + fail += chk_ps (ck, p2); + + sse3_test_movshdup_reg_subsume (p1, p2); + + fail += chk_ps (ck, p2); + } + + if (fail != 0) + abort (); +} Index: gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c (working copy) @@ -0,0 +1,98 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse3-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse3_test_movsldup_1 +#endif + +#define NO_WARN_X86_INTRINSICS 1 +#include + +static void +sse3_test_movsldup_reg (float *i1, float *r) +{ + __m128 t1 = _mm_loadu_ps (i1); + __m128 t2 = _mm_moveldup_ps (t1); + + _mm_storeu_ps (r, t2); +} + +static void +sse3_test_movsldup_reg_subsume (float *i1, float *r) +{ + __m128 t1 = _mm_load_ps (i1); + __m128 t2 = _mm_moveldup_ps (t1); + + _mm_storeu_ps (r, t2); +} + +static int +chk_ps (float *v1, float *v2) +{ + int i; + int n_fails = 0; + + for (i = 0; i < 4; i++) + if (v1[i] != v2[i]) + n_fails += 1; + + return n_fails; +} + +static float p1[4] __attribute__ ((aligned(16))); +static float p2[4]; +static float ck[4]; + +static float vals[] = + { + 100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5, + 1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52, + 32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44, + 12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785, + 541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531., + 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, + 23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1, + 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912, + -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95, + 9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95 + }; + +//static +void +TEST (void) +{ + int i; + int fail = 0; + + for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2) + { + p1[0] = vals[i+0]; + p1[1] = 0.0; + p1[2] = vals[i+1]; + p1[3] = 1.0; + + ck[0] = p1[0]; + ck[1] = p1[0]; + ck[2] = p1[2]; + ck[3] = p1[2]; + + sse3_test_movsldup_reg (p1, p2); + + fail += chk_ps (ck, p2); + + sse3_test_movsldup_reg_subsume (p1, p2); + + fail += chk_ps (ck, p2); + } + + if (fail != 0) + abort (); +}