From patchwork Sun Dec 27 23:00:05 2015
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 561198
Subject: [PATCH, rs6000] Add support for lxvx and stxvx P9 instructions
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Sun, 27 Dec 2015 17:00:05 -0600
Message-ID: <1451257205.17947.9.camel@oc8801110288.ibm.com>

Hi,

POWER9 adds endian-neutral load and store vector instructions that
support unaligned accesses.  This allows more efficient code generation
than POWER8.  With these new instructions, we no longer generate the
load-swap and swap-store sequences, and we no longer need to perform
swap optimization to get rid of unnecessary swaps.  We also need to
make sure that we don't generate P8-specific vector load fusion
sequences when the new instructions are available.

This patch includes two tests that verify the correct instructions are
generated with -mcpu=power9.  One of these contains code that triggers
P8-specific vector load fusion with -mcpu=power8, and verifies we don't
generate it with -mcpu=power9.  Besides these tests, I hand-tested all
the swaps-p8* tests to verify correct generation of lxvx and stxvx
rather than the old P8 sequences.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Ok for trunk, and then for backport to GCC 5?

Thanks,
Bill


[gcc]

2015-12-27  Bill Schmidt

	* config/rs6000/rs6000.c (rs6000_emit_le_vsx_move): Verify that
	this is never called when lxvx/stxvx are available.
	(pass_analyze_swaps::gate): Don't perform swap optimization when
	lxvx/stxvx are available.
	* config/rs6000/vector.md (mov): Don't call
	rs6000_emit_le_vsx_move when lxvx/stxvx are available.
	* config/rs6000/vsx.md (*p9_vecload_): New define_insn.
	(*p9_vecstore_): Likewise.
	(*vsx_le_perm_load_:VSX_LE): Disable when lxvx/stxvx are
	available.
	(*vsx_le_perm_load_:VSX_W): Likewise.
	(*vsx_le_perm_load_v8hi): Likewise.
	(*vsx_le_perm_load_v16qi): Likewise.
	(*vsx_le_perm_store_:VSX_LE): Likewise.
	([related define_splits]): Likewise.
	(*vsx_le_perm_store_:VSX_W): Likewise.
	([related define_splits]): Likewise.
	(*vsx_le_perm_store_v8hi): Likewise.
	([related define_splits]): Likewise.
	(*vsx_le_perm_store_v16qi): Likewise.
	([related define_splits]): Likewise.
	(*vsx_lxvd2x2_le_): Likewise.
	(*vsx_lxvd2x4_le_): Likewise.
	(*vsx_lxvd2x8_le_V8HI): Likewise.
	(*vsx_lxvd2x16_le_V16QI): Likewise.
	(*vsx_stxvd2x2_le_): Likewise.
	(*vsx_stxvd2x4_le_): Likewise.
	(*vsx_stxvd2x8_le_V8HI): Likewise.
	(*vsx_stxvd2x16_le_V16QI): Likewise.

[gcc/testsuite]

2015-12-27  Bill Schmidt

	* gcc.target/powerpc/p9-lxvx-stxvx-1.c: New.
	* gcc.target/powerpc/p9-lxvx-stxvx-2.c: New.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 231965)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -8904,6 +8904,7 @@ rs6000_emit_le_vsx_move (rtx dest, rtx source, mac
 {
   gcc_assert (!BYTES_BIG_ENDIAN
               && VECTOR_MEM_VSX_P (mode)
+              && !TARGET_P9_VECTOR
               && !gpr_or_gpr_p (dest, source)
               && (MEM_P (source) ^ MEM_P (dest)));
 
@@ -37793,7 +37794,7 @@ class pass_analyze_swaps : public rtl_opt_pass
   virtual bool gate (function *)
     {
       return (optimize > 0 && !BYTES_BIG_ENDIAN && TARGET_VSX
-              && rs6000_optimize_swaps);
+              && !TARGET_P9_VECTOR && rs6000_optimize_swaps);
     }
 
   virtual unsigned int execute (function *fun)
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 231965)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -113,6 +113,7 @@
     }
   if (!BYTES_BIG_ENDIAN
       && VECTOR_MEM_VSX_P (mode)
+      && !TARGET_P9_VECTOR
       && !gpr_or_gpr_p (operands[0], operands[1])
       && (memory_operand (operands[0], mode)
           ^ memory_operand (operands[1], mode)))
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 231965)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -301,6 +301,24 @@
    UNSPEC_VSX_XVCVDPUXDS
   ])
 
+;; VSX (P9) moves
+
+(define_insn "*p9_vecload_"
+  [(set (match_operand:VSX_M 0 "vsx_register_operand" "=")
+        (match_operand:VSX_M 1 "memory_operand" "Z"))]
+  "TARGET_P9_VECTOR"
+  "lxvx %x0,%y1"
+  [(set_attr "type" "vecload")
+   (set_attr "length" "4")])
+
+(define_insn "*p9_vecstore_"
+  [(set (match_operand:VSX_M 0 "memory_operand" "=Z")
+        (match_operand:VSX_M 1 "vsx_register_operand" ""))]
+  "TARGET_P9_VECTOR"
+  "stxvx %x1,%y0"
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "4")])
+
 ;; VSX moves
 
 ;; The patterns for LE permuted loads and stores come before the general
@@ -308,9 +326,9 @@
 (define_insn_and_split "*vsx_le_perm_load_"
   [(set (match_operand:VSX_LE 0 "vsx_register_operand" "=")
         (match_operand:VSX_LE 1 "memory_operand" "Z"))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   [(set (match_dup 2)
         (vec_select:
           (match_dup 1)
@@ -331,9 +349,9 @@
 (define_insn_and_split "*vsx_le_perm_load_"
   [(set (match_operand:VSX_W 0 "vsx_register_operand" "=")
         (match_operand:VSX_W 1 "memory_operand" "Z"))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   [(set (match_dup 2)
         (vec_select:
           (match_dup 1)
@@ -356,9 +374,9 @@
 (define_insn_and_split "*vsx_le_perm_load_v8hi"
   [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
         (match_operand:V8HI 1 "memory_operand" "Z"))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   [(set (match_dup 2)
         (vec_select:V8HI
           (match_dup 1)
@@ -385,9 +403,9 @@
 (define_insn_and_split "*vsx_le_perm_load_v16qi"
   [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
         (match_operand:V16QI 1 "memory_operand" "Z"))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   [(set (match_dup 2)
         (vec_select:V16QI
           (match_dup 1)
@@ -422,7 +440,7 @@
 (define_insn "*vsx_le_perm_store_"
   [(set (match_operand:VSX_LE 0 "memory_operand" "=Z")
         (match_operand:VSX_LE 1 "vsx_register_operand" "+"))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
   [(set_attr "type" "vecstore")
    (set_attr "length" "12")])
@@ -430,7 +448,7 @@
 (define_split
   [(set (match_operand:VSX_LE 0 "memory_operand" "")
         (match_operand:VSX_LE 1 "vsx_register_operand" ""))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && !reload_completed"
   [(set (match_dup 2)
         (vec_select:
           (match_dup 1)
@@ -449,7 +467,7 @@
 (define_split
   [(set (match_operand:VSX_LE 0 "memory_operand" "")
         (match_operand:VSX_LE 1 "vsx_register_operand" ""))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && reload_completed"
   [(set (match_dup 1)
         (vec_select:
           (match_dup 1)
@@ -467,7 +485,7 @@
 (define_insn "*vsx_le_perm_store_"
   [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
         (match_operand:VSX_W 1 "vsx_register_operand" "+"))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
   [(set_attr "type" "vecstore")
    (set_attr "length" "12")])
@@ -475,7 +493,7 @@
 (define_split
   [(set (match_operand:VSX_W 0 "memory_operand" "")
         (match_operand:VSX_W 1 "vsx_register_operand" ""))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && !reload_completed"
   [(set (match_dup 2)
         (vec_select:
           (match_dup 1)
@@ -496,7 +514,7 @@
 (define_split
   [(set (match_operand:VSX_W 0 "memory_operand" "")
         (match_operand:VSX_W 1 "vsx_register_operand" ""))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && reload_completed"
   [(set (match_dup 1)
         (vec_select:
           (match_dup 1)
@@ -517,7 +535,7 @@
 (define_insn "*vsx_le_perm_store_v8hi"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
         (match_operand:V8HI 1 "vsx_register_operand" "+wa"))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
   [(set_attr "type" "vecstore")
    (set_attr "length" "12")])
@@ -525,7 +543,7 @@
 (define_split
   [(set (match_operand:V8HI 0 "memory_operand" "")
         (match_operand:V8HI 1 "vsx_register_operand" ""))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && !reload_completed"
   [(set (match_dup 2)
         (vec_select:V8HI
           (match_dup 1)
@@ -550,7 +568,7 @@
 (define_split
   [(set (match_operand:V8HI 0 "memory_operand" "")
         (match_operand:V8HI 1 "vsx_register_operand" ""))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && reload_completed"
   [(set (match_dup 1)
         (vec_select:V8HI
           (match_dup 1)
@@ -577,7 +595,7 @@
 (define_insn "*vsx_le_perm_store_v16qi"
   [(set (match_operand:V16QI 0 "memory_operand" "=Z")
         (match_operand:V16QI 1 "vsx_register_operand" "+wa"))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
   [(set_attr "type" "vecstore")
    (set_attr "length" "12")])
@@ -585,7 +603,7 @@
 (define_split
   [(set (match_operand:V16QI 0 "memory_operand" "")
         (match_operand:V16QI 1 "vsx_register_operand" ""))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && !reload_completed"
   [(set (match_dup 2)
         (vec_select:V16QI
           (match_dup 1)
@@ -618,7 +636,7 @@
 (define_split
   [(set (match_operand:V16QI 0 "memory_operand" "")
         (match_operand:V16QI 1 "vsx_register_operand" ""))]
-  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && reload_completed"
   [(set (match_dup 1)
         (vec_select:V16QI
           (match_dup 1)
@@ -1781,7 +1799,7 @@
         (vec_select:VSX_LE
           (match_operand:VSX_LE 1 "memory_operand" "Z")
           (parallel [(const_int 1) (const_int 0)])))]
-  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode)"
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode) && !TARGET_P9_VECTOR"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])
 
@@ -1791,7 +1809,7 @@
           (match_operand:VSX_W 1 "memory_operand" "Z")
           (parallel [(const_int 2) (const_int 3)
                      (const_int 0) (const_int 1)])))]
-  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode)"
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode) && !TARGET_P9_VECTOR"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])
 
@@ -1803,7 +1821,7 @@
                      (const_int 6) (const_int 7)
                      (const_int 0) (const_int 1)
                      (const_int 2) (const_int 3)])))]
-  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode) && !TARGET_P9_VECTOR"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])
 
@@ -1819,7 +1837,7 @@
                      (const_int 2) (const_int 3)
                      (const_int 4) (const_int 5)
                      (const_int 6) (const_int 7)])))]
-  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode) && !TARGET_P9_VECTOR"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])
 
@@ -1830,7 +1848,7 @@
         (vec_select:VSX_LE
           (match_operand:VSX_LE 1 "vsx_register_operand" "")
           (parallel [(const_int 1) (const_int 0)])))]
-  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode)"
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode) && !TARGET_P9_VECTOR"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])
 
@@ -1840,7 +1858,7 @@
           (match_operand:VSX_W 1 "vsx_register_operand" "")
           (parallel [(const_int 2) (const_int 3)
                      (const_int 0) (const_int 1)])))]
-  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode)"
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode) && !TARGET_P9_VECTOR"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])
 
@@ -1852,7 +1870,7 @@
                      (const_int 6) (const_int 7)
                      (const_int 0) (const_int 1)
                      (const_int 2) (const_int 3)])))]
-  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode) && !TARGET_P9_VECTOR"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])
 
@@ -1868,7 +1886,7 @@
                      (const_int 2) (const_int 3)
                      (const_int 4) (const_int 5)
                      (const_int 6) (const_int 7)])))]
-  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode) && !TARGET_P9_VECTOR"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])
 
@@ -2456,7 +2474,7 @@
    (set (match_operand:VSX_M2 2 "vsx_register_operand" "")
         (mem:VSX_M2 (plus:P (match_dup 0)
                             (match_operand:P 3 "int_reg_operand" ""))))]
-  "TARGET_VSX && TARGET_P8_FUSION"
+  "TARGET_VSX && TARGET_P8_FUSION && !TARGET_P9_VECTOR"
   "li %0,%1\t\t\t# vector load fusion\;lxx %x2,%0,%3"
   [(set_attr "length" "8")
    (set_attr "type" "vecload")])
@@ -2467,7 +2485,7 @@
    (set (match_operand:VSX_M2 2 "vsx_register_operand" "")
         (mem:VSX_M2 (plus:P (match_operand:P 3 "int_reg_operand" "")
                             (match_dup 0))))]
-  "TARGET_VSX && TARGET_P8_FUSION"
+  "TARGET_VSX && TARGET_P8_FUSION && !TARGET_P9_VECTOR"
   "li %0,%1\t\t\t# vector load fusion\;lxx %x2,%0,%3"
   [(set_attr "length" "8")
    (set_attr "type" "vecload")])
Index: gcc/testsuite/gcc.target/powerpc/p9-lxvx-stxvx-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-lxvx-stxvx-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-lxvx-stxvx-1.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+/* { dg-final { scan-assembler "lxvx" } } */
+/* { dg-final { scan-assembler "stxvx" } } */
+/* { dg-final { scan-assembler-not "lxvd2x" } } */
+/* { dg-final { scan-assembler-not "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+/* Verify P9 vector loads and stores are used rather than the
+   load-swap/swap-store workarounds for P8.  */
+#define N 16
+
+signed char ca[N] __attribute__((aligned(16)));
+signed char cb[] __attribute__((aligned(16)))
+  = {8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7};
+signed char cc[] __attribute__((aligned(16)))
+  = {1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0, -1, -1, -2, -2};
+
+__attribute__((noinline)) void foo ()
+{
+  int i;
+  for (i = 0; i < N; i++) {
+    ca[i] = cb[i] - cc[i];
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/p9-lxvx-stxvx-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-lxvx-stxvx-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-lxvx-stxvx-2.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O1" } */
+/* { dg-final { scan-assembler "lxvx" } } */
+/* { dg-final { scan-assembler "stvewx" } } */
+/* { dg-final { scan-assembler-not "lxvd2x" } } */
+
+/* Verify we don't perform P8 load-vector fusion on P9.  */
+#include 
+
+void f (void *p)
+{
+  vector unsigned int u32 = vec_vsx_ld (1, (const unsigned int *)p);
+  vec_ste (u32, 1, (unsigned int *)p);
+}
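For readers less familiar with the P8 sequences this patch disables: on little-endian, lxvd2x and stxvd2x move the two doublewords of a vector in swapped positions, which is exactly what the vec_select patterns above express ([1 0] for V2DI/V2DF, [2 3 0 1] for V4SI/V4SF, [4..7 0..3] for V8HI, [8..15 0..7] for V16QI). The C sketch below is a simplified, hypothetical model of that reordering, not GCC code; the `vreg` type and all `model_*`/`p8_*` helpers are illustrative names. It shows that load-swap and swap-store both end up with natural element order, which lxvx/stxvx give in one instruction, and that the doubleword swap is its own inverse, which is why swap optimization can delete redundant pairs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a 128-bit VSX register as 16 byte elements,
   numbered in little-endian element order.  */
typedef struct { uint8_t b[16]; } vreg;

/* lxvd2x on LE, modeled after the V16QI vec_select pattern:
   element i receives mem[(i + 8) % 16] (doublewords swapped).  */
static vreg model_lxvd2x_le (const uint8_t *mem)
{
  vreg r;
  for (int i = 0; i < 16; i++)
    r.b[i] = mem[(i + 8) % 16];
  return r;
}

/* stxvd2x on LE: memory byte j receives element (j + 8) % 16.  */
static void model_stxvd2x_le (uint8_t *mem, vreg v)
{
  for (int j = 0; j < 16; j++)
    mem[j] = v.b[(j + 8) % 16];
}

/* xxpermdi used as a doubleword swap.  */
static vreg model_xxswapd (vreg v)
{
  vreg r;
  for (int i = 0; i < 16; i++)
    r.b[i] = v.b[(i + 8) % 16];
  return r;
}

/* lxvx / stxvx: natural element order in a single instruction.  */
static vreg model_lxvx (const uint8_t *mem)
{
  vreg r;
  memcpy (r.b, mem, sizeof r.b);
  return r;
}

static void model_stxvx (uint8_t *mem, vreg v)
{
  memcpy (mem, v.b, sizeof v.b);
}

static int vreg_equal (vreg a, vreg b)
{
  return memcmp (a.b, b.b, sizeof a.b) == 0;
}

/* P8 load-swap: lxvd2x followed by a doubleword swap.  */
static vreg p8_load (const uint8_t *mem)
{
  return model_xxswapd (model_lxvd2x_le (mem));
}

/* P8 swap-store: doubleword swap followed by stxvd2x.  */
static void p8_store (uint8_t *mem, vreg v)
{
  model_stxvd2x_le (mem, model_xxswapd (v));
}
```

The model treats every element type as bytes; that is enough here because the patterns above apply the same doubleword-swap permutation regardless of element width. It ignores alignment, which lxvx/stxvx do not require.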