From patchwork Tue Oct 1 02:04:37 2013
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 279311
Message-ID: <1380593077.367.44.camel@gnopaine>
Subject: [PATCH, PowerPC] Change code generation for VSX loads and stores for little endian
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Date: Mon, 30 Sep 2013 21:04:37 -0500

This patch implements support for VSX vector loads and stores in little
endian mode.  VSX loads and stores permute the register image with
respect to storage in a unique manner that is not truly little endian.
This can cause problems (for example, when a vector appears in a union
with a different type).
This patch adds an explicit permute to each VSX load and store
instruction so that the register image is true little endian.  It is
desirable to remove redundant pairs of permutes where legal to do so;
that work is deferred to a later patch.

This patch currently has no effect on generated code because -mvsx is
disabled in little endian mode, pending fixes of additional problems
with little endian code generation.

I tested this by enabling -mvsx in little endian mode and running the
regression bucket.  Using a GCC code base from August 5, I observed that
this patch corrected 187 failures and exposed a new regression.
Investigation showed that the regression is not directly related to the
patch.

Unfortunately the results are not as good on current trunk.  It appears
we have introduced some more problems for little endian code generation
since August 5th, which hide the effectiveness of the patch; most of the
VSX vector tests still fail with the patch applied to current trunk.
There are a handful of additional regressions, which again are not
directly related to the patch.  I feel that the patch is well-tested by
the August 5 results, and would like to commit it before continuing to
investigate the recently introduced problems.

I also bootstrapped and tested the patch on a big-endian machine
(powerpc64-unknown-linux-gnu) to verify that I introduced no regressions
in that environment.

Ok for trunk?

Thanks,
Bill

gcc:

2013-09-30  Bill Schmidt

	* config/rs6000/vector.md (mov<mode>): Emit permuted move
	sequences for LE VSX loads and stores at expand time.
	* config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
	prototype.
	* config/rs6000/rs6000.c (rs6000_const_vec): New.
	(rs6000_gen_le_vsx_permute): New.
	(rs6000_emit_le_vsx_load): New.
	(rs6000_emit_le_vsx_store): New.
	(rs6000_emit_le_vsx_move): New.
	* config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
	(*vsx_le_perm_load_v4si): New.
	(*vsx_le_perm_load_v8hi): New.
	(*vsx_le_perm_load_v16qi): New.
	(*vsx_le_perm_store_v2di): New.
	(*vsx_le_perm_store_v4si): New.
	(*vsx_le_perm_store_v8hi): New.
	(*vsx_le_perm_store_v16qi): New.
	(*vsx_xxpermdi2_le_<mode>): New.
	(*vsx_xxpermdi4_le_<mode>): New.
	(*vsx_xxpermdi8_le_V8HI): New.
	(*vsx_xxpermdi16_le_V16QI): New.
	(*vsx_lxvd2x2_le_<mode>): New.
	(*vsx_lxvd2x4_le_<mode>): New.
	(*vsx_lxvd2x8_le_V8HI): New.
	(*vsx_lxvd2x16_le_V16QI): New.
	(*vsx_stxvd2x2_le_<mode>): New.
	(*vsx_stxvd2x4_le_<mode>): New.
	(*vsx_stxvd2x8_le_V8HI): New.
	(*vsx_stxvd2x16_le_V16QI): New.

gcc/testsuite:

2013-09-30  Bill Schmidt

	* gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
	* gcc.target/powerpc/fusion.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/pr43154.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr43154.c	(revision 203018)
+++ gcc/testsuite/gcc.target/powerpc/pr43154.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O2 -mcpu=power7" } */
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion.c	(revision 203018)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 203018)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -88,7 +88,8 @@
 			 (smax "smax")])
 
 
-;; Vector move instructions.
+;; Vector move instructions.  Little-endian VSX loads and stores require
+;; special handling to circumvent "element endianness."
 (define_expand "mov<mode>"
   [(set (match_operand:VEC_M 0 "nonimmediate_operand" "")
 	(match_operand:VEC_M 1 "any_operand" ""))]
@@ -104,6 +105,15 @@
 	  && !vlogical_operand (operands[1], <MODE>mode))
 	operands[1] = force_reg (<MODE>mode, operands[1]);
     }
+  if (!BYTES_BIG_ENDIAN
+      && VECTOR_MEM_VSX_P (<MODE>mode)
+      && <MODE>mode != TImode
+      && (memory_operand (operands[0], <MODE>mode)
+	  ^ memory_operand (operands[1], <MODE>mode)))
+    {
+      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+      DONE;
+    }
 })
 
 ;; Generic vector floating point load/store instructions.  These will match
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 203018)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -122,6 +122,7 @@ extern rtx rs6000_longcall_ref (rtx);
 extern void rs6000_fatal_bad_address (rtx);
 extern rtx create_TOC_reference (rtx, rtx);
 extern void rs6000_split_multireg_move (rtx, rtx);
+extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode);
 extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
 extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
 extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203018)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -7665,6 +7665,106 @@ rs6000_eliminate_indexed_memrefs (rtx operands[2])
 		      copy_addr_to_reg (XEXP (operands[1], 0)));
 }
 
+/* Generate a vector of constants to permute MODE for a little-endian
+   storage operation by swapping the two halves of a vector.  */
+static rtvec
+rs6000_const_vec (enum machine_mode mode)
+{
+  int i, subparts;
+  rtvec v;
+
+  switch (mode)
+    {
+    case V2DFmode:
+    case V2DImode:
+      subparts = 2;
+      break;
+    case V4SFmode:
+    case V4SImode:
+      subparts = 4;
+      break;
+    case V8HImode:
+      subparts = 8;
+      break;
+    case V16QImode:
+      subparts = 16;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  v = rtvec_alloc (subparts);
+
+  for (i = 0; i < subparts / 2; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i + subparts / 2);
+  for (i = subparts / 2; i < subparts; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i - subparts / 2);
+
+  return v;
+}
+
+/* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi
+   for a VSX load or store operation.  */
+rtx
+rs6000_gen_le_vsx_permute (rtx source, enum machine_mode mode)
+{
+  rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode));
+  return gen_rtx_VEC_SELECT (mode, source, par);
+}
+
+/* Emit a little-endian load from vector memory location SOURCE to VSX
+   register DEST in mode MODE.  The load is done with two permuting
+   insns that represent an lxvd2x and xxpermdi.  */
+void
+rs6000_emit_le_vsx_load (rtx dest, rtx source, enum machine_mode mode)
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (dest) : dest;
+  rtx permute_mem = rs6000_gen_le_vsx_permute (source, mode);
+  rtx permute_reg = rs6000_gen_le_vsx_permute (tmp, mode);
+  emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_mem));
+  emit_insn (gen_rtx_SET (VOIDmode, dest, permute_reg));
+}
+
+/* Emit a little-endian store to vector memory location DEST from VSX
+   register SOURCE in mode MODE.  The store is done with two permuting
+   insns that represent an xxpermdi and an stxvd2x.  */
+void
+rs6000_emit_le_vsx_store (rtx dest, rtx source, enum machine_mode mode)
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (source) : source;
+  rtx permute_src = rs6000_gen_le_vsx_permute (source, mode);
+  rtx permute_tmp = rs6000_gen_le_vsx_permute (tmp, mode);
+  emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_src));
+  emit_insn (gen_rtx_SET (VOIDmode, dest, permute_tmp));
+}
+
+/* Emit a sequence representing a little-endian VSX load or store,
+   moving data from SOURCE to DEST in mode MODE.  This is done
+   separately from rs6000_emit_move to ensure it is called only
+   during expand.  LE VSX loads and stores introduced later are
+   handled with a split.  The expand-time RTL generation allows
+   us to optimize away redundant pairs of register-permutes.  */
+void
+rs6000_emit_le_vsx_move (rtx dest, rtx source, enum machine_mode mode)
+{
+  gcc_assert (!BYTES_BIG_ENDIAN
+	      && VECTOR_MEM_VSX_P (mode)
+	      && mode != TImode
+	      && (MEM_P (source) ^ MEM_P (dest)));
+
+  if (MEM_P (source))
+    {
+      gcc_assert (REG_P (dest));
+      rs6000_emit_le_vsx_load (dest, source, mode);
+    }
+  else
+    {
+      if (!REG_P (source))
+	source = force_reg (mode, source);
+      rs6000_emit_le_vsx_store (dest, source, mode);
+    }
+}
+
 /* Emit a move from SOURCE to DEST in mode MODE.  */
 void
 rs6000_emit_move (rtx dest, rtx source, enum machine_mode mode)
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 203018)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -216,6 +216,238 @@
 ])
 
 ;; VSX moves
+
+;; The patterns for LE permuted loads and stores come before the general
+;; VSX moves so they match first.
+(define_insn_and_split "*vsx_le_perm_load_v2di"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
+        (match_operand:V2DI 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V2DI
+          (match_dup 1)
+          (parallel [(const_int 1) (const_int 0)])))
+   (set (match_dup 0)
+        (vec_select:V2DI
+          (match_dup 2)
+          (parallel [(const_int 1) (const_int 0)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_load_v4si"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
+        (match_operand:V4SI 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V4SI
+          (match_dup 1)
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))
+   (set (match_dup 0)
+        (vec_select:V4SI
+          (match_dup 2)
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_load_v8hi"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+        (match_operand:V8HI 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V8HI
+          (match_dup 1)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))
+   (set (match_dup 0)
+        (vec_select:V8HI
+          (match_dup 2)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_load_v16qi"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (match_operand:V16QI 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V16QI
+          (match_dup 1)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))
+   (set (match_dup 0)
+        (vec_select:V16QI
+          (match_dup 2)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_store_v2di"
+  [(set (match_operand:V2DI 0 "memory_operand" "=Z")
+        (match_operand:V2DI 1 "vsx_register_operand" "+wa"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V2DI
+          (match_dup 1)
+          (parallel [(const_int 1) (const_int 0)])))
+   (set (match_dup 0)
+        (vec_select:V2DI
+          (match_dup 2)
+          (parallel [(const_int 1) (const_int 0)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+                                       : operands[1];
+}
+  "
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_store_v4si"
+  [(set (match_operand:V4SI 0 "memory_operand" "=Z")
+        (match_operand:V4SI 1 "vsx_register_operand" "+wa"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V4SI
+          (match_dup 1)
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))
+   (set (match_dup 0)
+        (vec_select:V4SI
+          (match_dup 2)
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+                                       : operands[1];
+}
+  "
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_store_v8hi"
+  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+        (match_operand:V8HI 1 "vsx_register_operand" "+wa"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V8HI
+          (match_dup 1)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))
+   (set (match_dup 0)
+        (vec_select:V8HI
+          (match_dup 2)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+                                       : operands[1];
+}
+  "
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_store_v16qi"
+  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+        (match_operand:V16QI 1 "vsx_register_operand" "+wa"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V16QI
+          (match_dup 1)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))
+   (set (match_dup 0)
+        (vec_select:V16QI
+          (match_dup 2)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+                                       : operands[1];
+}
+  "
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
+
 (define_insn "*vsx_mov<mode>"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
 	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
@@ -1076,6 +1308,153 @@
   "xxpermdi %x0,%x1,%x2,0"
   [(set_attr "type" "vecperm")])
 
+;; xxpermdi for little endian loads and stores.  We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_xxpermdi2_le_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+        (vec_select:VSX_D
+          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_xxpermdi4_le_<mode>"
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+        (vec_select:VSX_W
+          (match_operand:VSX_W 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_xxpermdi8_le_V8HI"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+        (vec_select:V8HI
+          (match_operand:V8HI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_xxpermdi16_le_V16QI"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (vec_select:V16QI
+          (match_operand:V16QI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
+;; lxvd2x for little endian loads.  We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_lxvd2x2_le_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+        (vec_select:VSX_D
+          (match_operand:VSX_D 1 "memory_operand" "Z")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x4_le_<mode>"
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+        (vec_select:VSX_W
+          (match_operand:VSX_W 1 "memory_operand" "Z")
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x8_le_V8HI"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+        (vec_select:V8HI
+          (match_operand:V8HI 1 "memory_operand" "Z")
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x16_le_V16QI"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (vec_select:V16QI
+          (match_operand:V16QI 1 "memory_operand" "Z")
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+;; stxvd2x for little endian stores.  We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_stxvd2x2_le_<mode>"
+  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
+        (vec_select:VSX_D
+          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x4_le_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+        (vec_select:VSX_W
+          (match_operand:VSX_W 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x8_le_V8HI"
+  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+        (vec_select:V8HI
+          (match_operand:V8HI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x16_le_V16QI"
+  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+        (vec_select:V16QI
+          (match_operand:V16QI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
 ;; Set the element of a V2DI/V2DF mode
 (define_insn "vsx_set_<mode>"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa")