From patchwork Sun Nov 6 02:39:20 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Miller
X-Patchwork-Id: 123910
Date: Sat, 05 Nov 2011 22:39:20 -0400 (EDT)
Message-Id: <20111105.223920.1520491100204286381.davem@redhat.com>
To: gcc-patches@gcc.gnu.org
CC: ebotcazou@adacore.com, rth@redhat.com
Subject: [PATCH] More improvements to sparc VIS vec_init code generation.
From: David Miller
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org

Eric, the testsuite target tests for VIS2 and VIS3 capable hardware
work well in my own testing, but if you find some problem with how
it's done just let me know and I'll try to fix it up.

I'm almost 100% satisfied with the code generation for vec_init now.

The one remaining case where I think we can do better is initializing
a V8QImode vector using bshuffle with more than 4 unique inputs. I've
been trying to come up with a trick to use fpmerge to get the last few
bytes into place for the bshuffle, but it's a bit of a challenge
because the bytes don't propagate into the destination in a convenient
way. In particular it can't be done without fighting the natural move
coalescing from the RTL optimizers that cleans up these RTL expansions.

The vector_init_bshuffle() code tries to rely upon moves as much as
possible to put the bshuffle inputs into place, because such moves are
typically completely optimized away by the compiler. So something like:

    __v4hi foo(short a, short b, short c, short d)
    {
      __v4hi x = { a, b, c, d };
      return x;
    }

generates:

    foo:
        movwtos %o0, %f2
        movwtos %o1, %f3
        movwtos %o2, %f4
        movwtos %o3, %f5
        sethi   %hi(bmask_val), %g1
        or      %g1, %lo(bmask_val), %g1
        bmask   %g1, %g0, %g1
        retl
         bshuffle %f2, %f4, %f0

for VIS3.

For cases where we only load part of a register input to the bshuffle
instruction, and the rest of the register is "don't care", we have a
preceding clobber emitted so that the compiler doesn't try to zero
initialize the uninitialized bits.

Support for the short floating point loads starts to show up here as
well, and I intend to flesh these out, support the short float stores,
and add VIS intrinsic access to them.

Richard, is there a better way to represent this in RTL?
These instructions basically load a single byte or half-word into the
bottom of a 64-bit float register, and clear the rest of that register
with zeros. So the v4hi one is essentially loading the vector:

    [(const_int 0)
     (const_int 0)
     (const_int 0)
     (mem:HI (register:P ...))]

into the destination 64-bit float reg. For now I'm simply using an
unspec.

Committed to trunk.

gcc/

    * config/sparc/sparc.md (UNSPEC_SHORT_LOAD): New unspec.
    (zero_extend_v8qi_vis, zero_extend_v4hi_vis): New expanders.
    (*zero_extend_v8qi__insn, *zero_extend_v4hi__insn): New insns.
    * config/sparc/sparc.c (vector_init_move_words)
    (vector_init_prepare_elts, sparc_expand_vector_init_vis2,
    sparc_expand_vector_init_vis1): New functions.
    (vector_init_bshuffle): Rewrite to handle more cases and make use
    of locs[] array prepared by vector_init_prepare_elts.
    (vector_init_fpmerge, vector_init_faligndata): Delete.
    (sparc_expand_vector_init): Rewrite using new infrastructure.

gcc/testsuite/

    * lib/target-supports.exp
    (check_effective_target_ultrasparc_vis2_hw): New proc.
    (check_effective_target_ultrasparc_vis3_hw): New proc.
    * gcc.target/sparc/vec-init-1.inc: New vector init common code.
    * gcc.target/sparc/vec-init-2.inc: Likewise.
    * gcc.target/sparc/vec-init-3.inc: Likewise.
    * gcc.target/sparc/vec-init-1-vis1.c: New test.
    * gcc.target/sparc/vec-init-1-vis2.c: New test.
    * gcc.target/sparc/vec-init-1-vis3.c: New test.
    * gcc.target/sparc/vec-init-2-vis1.c: New test.
    * gcc.target/sparc/vec-init-2-vis2.c: New test.
    * gcc.target/sparc/vec-init-2-vis3.c: New test.
    * gcc.target/sparc/vec-init-3-vis1.c: New test.
    * gcc.target/sparc/vec-init-3-vis2.c: New test.
    * gcc.target/sparc/vec-init-3-vis3.c: New test.
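
For readers following the rewritten vector_init_bshuffle() in the diff
below, here is a rough standalone sketch of how its per-element nibble
indexes are folded into the bmask constant. It is not part of the
patch: compute_bmask is a hypothetical helper name, the element size is
passed as a plain byte count instead of a machine mode, and uint32_t is
used instead of int so the shifts stay well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone helper, not part of the patch.  It mirrors the
   bmask loop of vector_init_bshuffle(): idxs[i] is the index of the last
   byte of element i within the 16 bytes of bshuffle input, and elt_size
   is the element size in bytes (1 for QImode, 2 for HImode).  The patch
   only ever produces the indexes 0x3, 0x7, 0xb and 0xf.  */
static uint32_t
compute_bmask (const unsigned int *idxs, int n_elts, int elt_size)
{
  uint32_t bmask = 0;
  int i;

  assert (elt_size == 1 || elt_size == 2);
  for (i = 0; i < n_elts; i++)
    {
      uint32_t v = idxs[i] & 0xf;

      if (elt_size == 2)
        {
          /* A half-word selects two adjacent source bytes: v - 1 and v.  */
          bmask <<= 8;
          bmask |= ((v - 1) << 4) | v;
        }
      else
        {
          /* A byte element selects a single source byte.  */
          bmask <<= 4;
          bmask |= v;
        }
    }
  return bmask;
}
```

Following that loop, four distinct half-word elements get the indexes
0x7, 0xf, 0x3, 0xb in that order, which yields 0x67ef23ab. A single
repeated half-word (every index 0x7) yields 0x67676767, the constant the
old V4HImode code hard-wired.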
--- gcc/ChangeLog | 16 +- gcc/config/sparc/sparc.c | 419 +++++++++++++++++----- gcc/config/sparc/sparc.md | 43 +++ gcc/testsuite/ChangeLog | 18 + gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-1.inc | 85 +++++ gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-2.inc | 94 +++++ gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c | 5 + gcc/testsuite/gcc.target/sparc/vec-init-3.inc | 105 ++++++ gcc/testsuite/lib/target-supports.exp | 18 + 17 files changed, 743 insertions(+), 100 deletions(-) create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-1.inc create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-2.inc create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-3.inc diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 2df0736..819ec63 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,17 @@ +2011-11-05 David S. Miller + + * config/sparc/sparc.md (UNSPEC_SHORT_LOAD): New unspec. 
+ (zero-extend_v8qi_vis, zero_extend_v4hi_vis): New expanders. + (*zero_extend_v8qi__insn, + *zero_extend_v4hi__insn): New insns. + * config/sparc/sparc.c (vector_init_move_words) + (vector_init_prepare_elts, sparc_expand_vector_init_vis2, + sparc_expand_vector_init_vis1): New functions. + (vector_init_bshuffle): Rewrite to handle more cases and make use + of locs[] array prepared by vector_init_prepare_elts. + (vector_init_fpmerge, vector_init_faligndata): Delete. + (sparc_expand_vector_init): Rewrite using new infrastructure. + 2011-11-05 Joern Rennecke * config.gcc (epiphany-*-*): New architecture. @@ -56,7 +70,7 @@ Remove -mcpu=601 multilib. Remove -Dmpc8260 multilib. * config/rs6000/rtems.h: Allow --float-gprs=... to override grps - on E500 targets. + on E500 targets. 2011-11-05 Quentin Neill diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 0daa53d..5d22fc0 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -11280,83 +11280,333 @@ output_v8plus_mult (rtx insn, rtx *operands, const char *name) } static void -vector_init_bshuffle (rtx target, rtx elt, enum machine_mode mode, +vector_init_bshuffle (rtx target, rtx *locs, int n_elts, enum machine_mode mode, enum machine_mode inner_mode) { - rtx t1, final_insn; - int bmask; + rtx mid_target, r0_high, r0_low, r1_high, r1_low; + enum machine_mode partial_mode; + int bmask, i, idxs[8]; - t1 = gen_reg_rtx (mode); + partial_mode = (mode == V4HImode + ? V2HImode + : (mode == V8QImode + ? V4QImode : mode)); - elt = convert_modes (SImode, inner_mode, elt, true); - emit_move_insn (gen_lowpart(SImode, t1), elt); + r0_high = r0_low = NULL_RTX; + r1_high = r1_low = NULL_RTX; - switch (mode) + /* Move the pieces into place, as needed, and calculate the nibble + indexes for the bmask calculation. After we execute this loop the + locs[] array is no longer needed. Therefore, to simplify things, + we set entries that have been processed already to NULL_RTX. 
*/ + + for (i = 0; i < n_elts; i++) + { + int j; + + if (locs[i] == NULL_RTX) + continue; + + if (!r0_low) { - case V2SImode: - final_insn = gen_bshufflev2si_vis (target, t1, t1); - bmask = 0x45674567; - break; - case V4HImode: - final_insn = gen_bshufflev4hi_vis (target, t1, t1); - bmask = 0x67676767; + r0_low = locs[i]; + idxs[i] = 0x7; + } + else if (!r1_low) + { + r1_low = locs[i]; + idxs[i] = 0xf; + } + else if (!r0_high) + { + r0_high = gen_highpart (partial_mode, r0_low); + emit_move_insn (r0_high, gen_lowpart (partial_mode, locs[i])); + idxs[i] = 0x3; + } + else if (!r1_high) + { + r1_high = gen_highpart (partial_mode, r1_low); + emit_move_insn (r1_high, gen_lowpart (partial_mode, locs[i])); + idxs[i] = 0xb; + } + else + gcc_unreachable (); + + for (j = i + 1; j < n_elts; j++) + { + if (locs[j] == locs[i]) + { + locs[j] = NULL_RTX; + idxs[j] = idxs[i]; + } + } + locs[i] = NULL_RTX; + } + + bmask = 0; + for (i = 0; i < n_elts; i++) + { + int v = idxs[i]; + + switch (GET_MODE_SIZE (inner_mode)) + { + case 2: + bmask <<= 8; + bmask |= (((v - 1) << 4) | v); break; - case V8QImode: - final_insn = gen_bshufflev8qi_vis (target, t1, t1); - bmask = 0x77777777; + + case 1: + bmask <<= 4; + bmask |= v; break; + default: gcc_unreachable (); } + } + + emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), CONST0_RTX (SImode), + force_reg (SImode, GEN_INT (bmask)))); + + mid_target = target; + if (GET_MODE_SIZE (mode) == 4) + { + mid_target = gen_reg_rtx (mode == V2HImode + ? 
V4HImode : V8QImode); + } + + if (!r1_low) + r1_low = r0_low; + + switch (GET_MODE (mid_target)) + { + case V4HImode: + emit_insn (gen_bshufflev4hi_vis (mid_target, r0_low, r1_low)); + break; + case V8QImode: + emit_insn (gen_bshufflev8qi_vis (mid_target, r0_low, r1_low)); + break; + default: + gcc_unreachable (); + } - emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), CONST0_RTX (SImode), - force_reg (SImode, GEN_INT (bmask)))); - emit_insn (final_insn); + if (mid_target != target) + emit_move_insn (target, gen_lowpart (partial_mode, mid_target)); } +static bool +vector_init_move_words (rtx target, rtx vals, enum machine_mode mode, + enum machine_mode inner_mode) +{ + switch (mode) + { + case V1SImode: + case V1DImode: + emit_move_insn (gen_lowpart (inner_mode, target), + gen_lowpart (inner_mode, XVECEXP (vals, 0, 0))); + return true; + + case V2SImode: + emit_move_insn (gen_highpart (SImode, target), XVECEXP (vals, 0, 0)); + emit_move_insn (gen_lowpart (SImode, target), XVECEXP (vals, 0, 1)); + return true; + + default: + break; + } + return false; +} + +/* Move the elements in rtvec VALS into registers compatible with MODE. + Store the rtx for these regs into the corresponding array entry of + LOCS. 
*/ static void -vector_init_fpmerge (rtx target, rtx elt, enum machine_mode inner_mode) +vector_init_prepare_elts (rtx vals, int n_elts, rtx *locs, enum machine_mode mode, + enum machine_mode inner_mode) { - rtx t1, t2, t3, t3_low; + enum machine_mode loc_mode; + int i; - t1 = gen_reg_rtx (V4QImode); - elt = convert_modes (SImode, inner_mode, elt, true); - emit_move_insn (gen_lowpart (SImode, t1), elt); + switch (mode) + { + case V2HImode: + loc_mode = V4HImode; + break; - t2 = gen_reg_rtx (V4QImode); - emit_move_insn (t2, t1); + case V4QImode: + loc_mode = V8QImode; + break; + + case V4HImode: + case V8QImode: + loc_mode = mode; + break; + + default: + gcc_unreachable (); + } + + gcc_assert (GET_MODE_SIZE (inner_mode) <= 4); + for (i = 0; i < n_elts; i++) + { + rtx dst, elt = XVECEXP (vals, 0, i); + int j; + + /* Did we see this already? If so just record it's location. */ + dst = NULL_RTX; + for (j = 0; j < i; j++) + { + if (XVECEXP (vals, 0, j) == elt) + { + dst = locs[j]; + break; + } + } - t3 = gen_reg_rtx (V8QImode); - t3_low = gen_lowpart (V4QImode, t3); + if (! dst) + { + enum rtx_code code = GET_CODE (elt); - emit_insn (gen_fpmerge_vis (t3, t1, t2)); - emit_move_insn (t1, t3_low); - emit_move_insn (t2, t3_low); + dst = gen_reg_rtx (loc_mode); - emit_insn (gen_fpmerge_vis (t3, t1, t2)); - emit_move_insn (t1, t3_low); - emit_move_insn (t2, t3_low); + /* We use different strategies based upon whether the element + is in memory or in a register. When we start in a register + and we're VIS3 capable, it's always cheaper to use the VIS3 + int-->fp register moves since we avoid having to use stack + memory. 
*/ + if ((TARGET_VIS3 && (code == REG || code == SUBREG)) + || (CONSTANT_P (elt) + && (const_zero_operand (elt, inner_mode) + || const_all_ones_operand (elt, inner_mode)))) + { + elt = convert_modes (SImode, inner_mode, elt, true); - emit_insn (gen_fpmerge_vis (gen_lowpart (V8QImode, target), t1, t2)); + emit_clobber (dst); + emit_move_insn (gen_lowpart (SImode, dst), elt); + } + else + { + rtx m = elt; + + if (CONSTANT_P (elt)) + { + m = force_const_mem (inner_mode, elt); + } + else if (code != MEM) + { + rtx stk = assign_stack_temp (inner_mode, GET_MODE_SIZE(inner_mode), 0); + emit_move_insn (stk, elt); + m = stk; + } + + switch (loc_mode) + { + case V4HImode: + emit_insn (gen_zero_extend_v4hi_vis (dst, m)); + break; + case V8QImode: + emit_insn (gen_zero_extend_v8qi_vis (dst, m)); + break; + default: + gcc_unreachable (); + } + } + } + locs[i] = dst; + } } static void -vector_init_faligndata (rtx target, rtx elt, enum machine_mode inner_mode) +sparc_expand_vector_init_vis2 (rtx target, rtx *locs, int n_elts, int n_unique, + enum machine_mode mode, + enum machine_mode inner_mode) { - rtx t1 = gen_reg_rtx (V4HImode); + if (n_unique <= 4) + { + vector_init_bshuffle (target, locs, n_elts, mode, inner_mode); + } + else + { + int i; - elt = convert_modes (SImode, inner_mode, elt, true); + gcc_assert (mode == V8QImode); - emit_move_insn (gen_lowpart (SImode, t1), elt); + emit_insn (gen_alignaddrsi_vis (gen_reg_rtx (SImode), + force_reg (SImode, GEN_INT (7)), + CONST0_RTX (SImode))); + i = n_elts - 1; + emit_insn (gen_faligndatav8qi_vis (target, locs[i], locs[i])); + while (--i >= 0) + emit_insn (gen_faligndatav8qi_vis (target, locs[i], target)); + } +} + +static void +sparc_expand_vector_init_vis1 (rtx target, rtx *locs, int n_elts, int n_unique, + enum machine_mode mode) +{ + enum machine_mode full_mode = mode; + rtx (*emitter)(rtx, rtx, rtx); + int alignaddr_val, i; + rtx tmp = target; + + if (n_unique == 1 && mode == V8QImode) + { + rtx t2, t2_low, t1; + + t1 = 
gen_reg_rtx (V4QImode); + emit_move_insn (t1, gen_lowpart (V4QImode, locs[0])); + + t2 = gen_reg_rtx (V8QImode); + t2_low = gen_lowpart (V4QImode, t2); + + /* xxxxxxAA --> xxxxxxxxxxxxAAAA + xxxxAAAA --> xxxxxxxxAAAAAAAA + AAAAAAAA --> AAAAAAAAAAAAAAAA */ + emit_insn (gen_fpmerge_vis (t2, t1, t1)); + emit_move_insn (t1, t2_low); + emit_insn (gen_fpmerge_vis (t2, t1, t1)); + emit_move_insn (t1, t2_low); + emit_insn (gen_fpmerge_vis (target, t1, t1)); + return; + } + + switch (mode) + { + case V2HImode: + full_mode = V4HImode; + /* FALLTHRU */ + case V4HImode: + emitter = gen_faligndatav4hi_vis; + alignaddr_val = 6; + break; + + case V4QImode: + full_mode = V8QImode; + /* FALLTHRU */ + case V8QImode: + emitter = gen_faligndatav8qi_vis; + alignaddr_val = 7; + break; + + default: + gcc_unreachable (); + } + + if (full_mode != mode) + tmp = gen_reg_rtx (full_mode); emit_insn (gen_alignaddrsi_vis (gen_reg_rtx (SImode), - force_reg (SImode, GEN_INT (6)), + force_reg (SImode, GEN_INT (alignaddr_val)), CONST0_RTX (SImode))); - emit_insn (gen_faligndatav4hi_vis (target, t1, target)); - emit_insn (gen_faligndatav4hi_vis (target, t1, target)); - emit_insn (gen_faligndatav4hi_vis (target, t1, target)); - emit_insn (gen_faligndatav4hi_vis (target, t1, target)); + i = n_elts - 1; + emit_insn (emitter (tmp, locs[i], locs[i])); + while (--i >= 0) + emit_insn (emitter (tmp, locs[i], tmp)); + + if (tmp != target) + emit_move_insn (target, gen_highpart (mode, tmp)); } void @@ -11365,19 +11615,30 @@ sparc_expand_vector_init (rtx target, rtx vals) enum machine_mode mode = GET_MODE (target); enum machine_mode inner_mode = GET_MODE_INNER (mode); int n_elts = GET_MODE_NUNITS (mode); - int i, n_var = 0; - bool all_same; - rtx mem; + int i, n_var = 0, n_unique = 0; + rtx locs[8]; + + gcc_assert (n_elts <= 8); - all_same = true; for (i = 0; i < n_elts; i++) { rtx x = XVECEXP (vals, 0, i); + bool found = false; + int j; + if (!CONSTANT_P (x)) n_var++; - if (i > 0 && !rtx_equal_p (x, XVECEXP 
(vals, 0, 0))) - all_same = false; + for (j = 0; j < i; j++) + { + if (rtx_equal_p (x, XVECEXP (vals, 0, j))) + { + found = true; + break; + } + } + if (!found) + n_unique++; } if (n_var == 0) @@ -11386,56 +11647,16 @@ sparc_expand_vector_init (rtx target, rtx vals) return; } - if (GET_MODE_SIZE (inner_mode) == GET_MODE_SIZE (mode)) - { - if (GET_MODE_SIZE (inner_mode) == 4) - { - emit_move_insn (gen_lowpart (SImode, target), - gen_lowpart (SImode, XVECEXP (vals, 0, 0))); - return; - } - else if (GET_MODE_SIZE (inner_mode) == 8) - { - emit_move_insn (gen_lowpart (DImode, target), - gen_lowpart (DImode, XVECEXP (vals, 0, 0))); - return; - } - } - else if (GET_MODE_SIZE (inner_mode) == GET_MODE_SIZE (word_mode) - && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode)) - { - emit_move_insn (gen_highpart (word_mode, target), - gen_lowpart (word_mode, XVECEXP (vals, 0, 0))); - emit_move_insn (gen_lowpart (word_mode, target), - gen_lowpart (word_mode, XVECEXP (vals, 0, 1))); - return; - } + if (vector_init_move_words (target, vals, mode, inner_mode)) + return; - if (all_same && GET_MODE_SIZE (mode) == 8) - { - if (TARGET_VIS2) - { - vector_init_bshuffle (target, XVECEXP (vals, 0, 0), mode, inner_mode); - return; - } - if (mode == V8QImode) - { - vector_init_fpmerge (target, XVECEXP (vals, 0, 0), inner_mode); - return; - } - if (mode == V4HImode) - { - vector_init_faligndata (target, XVECEXP (vals, 0, 0), inner_mode); - return; - } - } + vector_init_prepare_elts (vals, n_elts, locs, mode, inner_mode); - mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0); - for (i = 0; i < n_elts; i++) - emit_move_insn (adjust_address_nv (mem, inner_mode, - i * GET_MODE_SIZE (inner_mode)), - XVECEXP (vals, 0, i)); - emit_move_insn (target, mem); + if (TARGET_VIS2) + sparc_expand_vector_init_vis2 (target, locs, n_elts, n_unique, + mode, inner_mode); + else + sparc_expand_vector_init_vis1 (target, locs, n_elts, n_unique, mode); } static reg_class_t diff --git 
a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md index d4827bd..7452f96 100644 --- a/gcc/config/sparc/sparc.md +++ b/gcc/config/sparc/sparc.md @@ -92,6 +92,7 @@ (UNSPEC_MUL8 86) (UNSPEC_MUL8SU 87) (UNSPEC_MULDSU 88) + (UNSPEC_SHORT_LOAD 89) ]) (define_constants @@ -7830,6 +7831,48 @@ DONE; }) +(define_expand "zero_extend_v8qi_vis" + [(set (match_operand:V8QI 0 "register_operand" "") + (unspec:V8QI [(match_operand:QI 1 "memory_operand" "")] + UNSPEC_SHORT_LOAD))] + "TARGET_VIS" +{ + if (! REG_P (XEXP (operands[1], 0))) + { + rtx addr = force_reg (Pmode, XEXP (operands[1], 0)); + operands[1] = replace_equiv_address (operands[1], addr); + } +}) + +(define_expand "zero_extend_v4hi_vis" + [(set (match_operand:V4HI 0 "register_operand" "") + (unspec:V4HI [(match_operand:HI 1 "memory_operand" "")] + UNSPEC_SHORT_LOAD))] + "TARGET_VIS" +{ + if (! REG_P (XEXP (operands[1], 0))) + { + rtx addr = force_reg (Pmode, XEXP (operands[1], 0)); + operands[1] = replace_equiv_address (operands[1], addr); + } +}) + +(define_insn "*zero_extend_v8qi__insn" + [(set (match_operand:V8QI 0 "register_operand" "=e") + (unspec:V8QI [(mem:QI + (match_operand:P 1 "register_operand" "r"))] + UNSPEC_SHORT_LOAD))] + "TARGET_VIS" + "ldda\t[%1] 0xd0, %0") + +(define_insn "*zero_extend_v4hi__insn" + [(set (match_operand:V4HI 0 "register_operand" "=e") + (unspec:V4HI [(mem:HI + (match_operand:P 1 "register_operand" "r"))] + UNSPEC_SHORT_LOAD))] + "TARGET_VIS" + "ldda\t[%1] 0xd2, %0") + (define_expand "vec_init" [(match_operand:VMALL 0 "register_operand" "") (match_operand:VMALL 1 "" "")] diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 8091789..b84dcf0 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,21 @@ +2011-11-05 David S. Miller + + * lib/test-supports.exp + (check_effective_target_ultrasparc_vis2_hw): New proc. + (check_effective_target_ultrasparc_vis3_hw): New proc. + * gcc.target/sparc/vec-init-1.inc: New vector init common code. 
+ * gcc.target/sparc/vec-init-2.inc: Likewise. + * gcc.target/sparc/vec-init-3.inc: Likewise. + * gcc.target/sparc/vec-init-1-vis1.c: New test. + * gcc.target/sparc/vec-init-1-vis2.c: New test. + * gcc.target/sparc/vec-init-1-vis3.c: New test. + * gcc.target/sparc/vec-init-2-vis1.c: New test. + * gcc.target/sparc/vec-init-2-vis2.c: New test. + * gcc.target/sparc/vec-init-2-vis3.c: New test. + * gcc.target/sparc/vec-init-3-vis1.c: New test. + * gcc.target/sparc/vec-init-3-vis2.c: New test. + * gcc.target/sparc/vec-init-3-vis3.c: New test. + 2011-11-05 Joern Rennecke * gcc.c-torture/execute/ieee/mul-subnormal-single-1.x: diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c new file mode 100644 index 0000000..4202bfa --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ultrasparc_hw } */ +/* { dg-options "-mcpu=ultrasparc -mvis -O2" } */ + +#include "vec-init-1.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c new file mode 100644 index 0000000..a5c2132 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ultrasparc_vis2_hw } */ +/* { dg-options "-mcpu=ultrasparc3 -O2" } */ + +#include "vec-init-1.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c new file mode 100644 index 0000000..ab916e0 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ultrasparc_vis3_hw } */ +/* { dg-options "-mcpu=niagara3 -O2" } */ + +#include "vec-init-1.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-1.inc b/gcc/testsuite/gcc.target/sparc/vec-init-1.inc new file mode 100644 index 0000000..e27bb6e --- /dev/null +++ 
b/gcc/testsuite/gcc.target/sparc/vec-init-1.inc @@ -0,0 +1,85 @@ +typedef int __v1si __attribute__ ((__vector_size__ (4))); +typedef int __v2si __attribute__ ((__vector_size__ (8))); +typedef short __v2hi __attribute__ ((__vector_size__ (4))); +typedef short __v4hi __attribute__ ((__vector_size__ (8))); +typedef unsigned char __v4qi __attribute__ ((__vector_size__ (4))); +typedef unsigned char __v8qi __attribute__ ((__vector_size__ (8))); + +extern void abort (void); + +static void +compare64 (void *p, unsigned long long val) +{ + if (*(unsigned long long *)p != val) + abort(); +} + +static void +compare32 (void *p, unsigned int val) +{ + if (*(unsigned int *)p != val) + abort(); +} + +static void +test_v8qi (unsigned char x) +{ + __v8qi v = { x, x, x, x, x, x, x, x }; + + compare64(&v, 0x4444444444444444ULL); +} + +static void +test_v4qi (unsigned char x) +{ + __v4qi v = { x, x, x, x }; + + compare32(&v, 0x44444444); +} + +static void +test_v4hi (unsigned short x) +{ + __v4hi v = { x, x, x, x, }; + + compare64(&v, 0x3344334433443344ULL); +} + +static void +test_v2hi (unsigned short x) +{ + __v2hi v = { x, x, }; + + compare32(&v, 0x33443344); +} + +static void +test_v2si (unsigned int x) +{ + __v2si v = { x, x, }; + + compare64(&v, 0x1122334411223344ULL); +} + +static void +test_v1si (unsigned int x) +{ + __v1si v = { x }; + + compare32(&v, 0x11223344); +} + +unsigned char x8 = 0x44; +unsigned short x16 = 0x3344; +unsigned int x32 = 0x11223344; + +int main(void) +{ + test_v8qi (x8); + test_v4qi (x8); + test_v4hi (x16); + test_v2hi (x16); + test_v2si (x32); + test_v1si (x32); + return 0; +} diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c new file mode 100644 index 0000000..efa08fa --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ultrasparc_hw } */ +/* { dg-options "-mcpu=ultrasparc -mvis -O2" } */ + +#include 
"vec-init-2.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c new file mode 100644 index 0000000..3aa0f51 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ultrasparc_vis2_hw } */ +/* { dg-options "-mcpu=ultrasparc3 -O2" } */ + +#include "vec-init-2.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c new file mode 100644 index 0000000..5f0c658 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ultrasparc_vis3_hw } */ +/* { dg-options "-mcpu=niagara3 -O2" } */ + +#include "vec-init-2.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-2.inc b/gcc/testsuite/gcc.target/sparc/vec-init-2.inc new file mode 100644 index 0000000..13685a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-2.inc @@ -0,0 +1,94 @@ +typedef short __v2hi __attribute__ ((__vector_size__ (4))); +typedef short __v4hi __attribute__ ((__vector_size__ (8))); + +extern void abort (void); + +static void +compare64 (int n, void *p, unsigned long long val) +{ + unsigned long long *x = (unsigned long long *) p; + + if (*x != val) + abort(); +} + +static void +compare32 (int n, void *p, unsigned int val) +{ + unsigned int *x = (unsigned int *) p; + if (*x != val) + abort(); +} + +#define V2HI_TEST(N, elt0, elt1) \ +static void \ +test_v2hi_##N (unsigned short x, unsigned short y) \ +{ \ + __v2hi v = { (elt0), (elt1) }; \ + compare32(N, &v, ((int)(elt0) << 16) | (elt1)); \ +} + +V2HI_TEST(1, x, y) +V2HI_TEST(2, y, x) +V2HI_TEST(3, x, x) +V2HI_TEST(4, x, 0) +V2HI_TEST(5, 0, x) +V2HI_TEST(6, y, 1) +V2HI_TEST(7, 1, y) +V2HI_TEST(8, 2, 3) +V2HI_TEST(9, 0x400, x) +V2HI_TEST(10, y, 0x8000) + +#define V4HI_TEST(N, elt0, elt1, elt2, elt3) \ +static void \ +test_v4hi_##N (unsigned short a, 
unsigned short b, unsigned short c, unsigned short d) \ +{ \ + __v4hi v = { (elt0), (elt1), (elt2), (elt3) }; \ + compare64(N, &v, \ + ((long long)(elt0) << 48) | \ + ((long long)(elt1) << 32) | \ + ((long long)(elt2) << 16) | \ + ((long long)(elt3))); \ +} + +V4HI_TEST(1, a, a, a, a) +V4HI_TEST(2, a, b, c, d) +V4HI_TEST(3, a, a, b, b) +V4HI_TEST(4, d, c, b, a) +V4HI_TEST(5, a, 0, 0, 0) +V4HI_TEST(6, a, 0, b, 0) +V4HI_TEST(7, c, 5, 5, 5) +V4HI_TEST(8, d, 6, a, 6) +V4HI_TEST(9, 0x200, 0x300, 0x500, 0x8800) +V4HI_TEST(10, 0x600, a, a, a) + +unsigned short a16 = 0x3344; +unsigned short b16 = 0x5566; +unsigned short c16 = 0x7788; +unsigned short d16 = 0x9911; + +int main(void) +{ + test_v2hi_1 (a16, b16); + test_v2hi_2 (a16, b16); + test_v2hi_3 (a16, b16); + test_v2hi_4 (a16, b16); + test_v2hi_5 (a16, b16); + test_v2hi_6 (a16, b16); + test_v2hi_7 (a16, b16); + test_v2hi_8 (a16, b16); + test_v2hi_9 (a16, b16); + test_v2hi_10 (a16, b16); + + test_v4hi_1 (a16, b16, c16, d16); + test_v4hi_2 (a16, b16, c16, d16); + test_v4hi_3 (a16, b16, c16, d16); + test_v4hi_4 (a16, b16, c16, d16); + test_v4hi_5 (a16, b16, c16, d16); + test_v4hi_6 (a16, b16, c16, d16); + test_v4hi_7 (a16, b16, c16, d16); + test_v4hi_8 (a16, b16, c16, d16); + test_v4hi_9 (a16, b16, c16, d16); + test_v4hi_10 (a16, b16, c16, d16); + return 0; +} diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c new file mode 100644 index 0000000..6c82610 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ultrasparc_hw } */ +/* { dg-options "-mcpu=ultrasparc -mvis -O2" } */ + +#include "vec-init-3.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c new file mode 100644 index 0000000..6424e2f --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { 
dg-require-effective-target ultrasparc_vis2_hw } */ +/* { dg-options "-mcpu=ultrasparc3 -O2" } */ + +#include "vec-init-3.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c new file mode 100644 index 0000000..226c108 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ultrasparc_vis3_hw } */ +/* { dg-options "-mcpu=niagara3 -O2" } */ + +#include "vec-init-3.inc" diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-3.inc b/gcc/testsuite/gcc.target/sparc/vec-init-3.inc new file mode 100644 index 0000000..8a3db26 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/vec-init-3.inc @@ -0,0 +1,105 @@ +typedef unsigned char __v4qi __attribute__ ((__vector_size__ (4))); +typedef unsigned char __v8qi __attribute__ ((__vector_size__ (8))); + +extern void abort (void); + +static void +compare64 (int n, void *p, unsigned long long val) +{ + unsigned long long *x = (unsigned long long *) p; + + if (*x != val) + abort(); +} + +static void +compare32 (int n, void *p, unsigned int val) +{ + unsigned int *x = (unsigned int *) p; + if (*x != val) + abort(); +} + +#define V4QI_TEST(N, elt0, elt1, elt2, elt3) \ +static void \ +test_v4qi_##N (unsigned char a, unsigned char b, unsigned char c, unsigned char d) \ +{ \ + __v4qi v = { (elt0), (elt1), (elt2), (elt3) }; \ + compare32(N, &v, ((int)(elt0) << 24) | \ + ((int)(elt1) << 16) | \ + ((int)(elt2) << 8) | ((int)(elt3))); \ +} + +V4QI_TEST(1, a, a, a, a) +V4QI_TEST(2, b, b, b, b) +V4QI_TEST(3, a, b, c, d) +V4QI_TEST(4, d, c, b, a) +V4QI_TEST(5, a, 0, 0, 0) +V4QI_TEST(6, b, 1, 1, b) +V4QI_TEST(7, c, 5, d, 5) +V4QI_TEST(8, 0x20, 0x30, b, a) +V4QI_TEST(9, 0x40, 0x50, 0x60, 0x70) +V4QI_TEST(10, 0x40, 0x50, 0x60, c) + +#define V8QI_TEST(N, elt0, elt1, elt2, elt3, elt4, elt5, elt6, elt7) \ +static void \ +test_v8qi_##N (unsigned char a, unsigned char b, unsigned char c, unsigned char d, \ + 
unsigned char e, unsigned char f, unsigned char g, unsigned char h) \ +{ \ + __v8qi v = { (elt0), (elt1), (elt2), (elt3), \ + (elt4), (elt5), (elt6), (elt7) }; \ + compare64(N, &v, ((long long)(elt0) << 56) | \ + ((long long)(elt1) << 48) | \ + ((long long)(elt2) << 40) | \ + ((long long)(elt3) << 32) | \ + ((long long)(elt4) << 24) | \ + ((long long)(elt5) << 16) | \ + ((long long)(elt6) << 8) | \ + ((long long)(elt7) << 0)); \ +} + +V8QI_TEST(1, a, a, a, a, a, a, a, a) +V8QI_TEST(2, a, b, c, d, e, f, g, h) +V8QI_TEST(3, h, g, f, e, d, c, b, a) +V8QI_TEST(4, a, b, a, b, a, b, a, b) +V8QI_TEST(5, c, b, c, b, c, b, c, a) +V8QI_TEST(6, a, 0, 0, 0, 0, 0, 0, 0) +V8QI_TEST(7, b, 1, b, 1, b, 1, b, 1) +V8QI_TEST(8, c, d, 0x20, a, 0x21, b, 0x23, c) +V8QI_TEST(9, 1, 2, 3, 4, 5, 6, 7, 8) +V8QI_TEST(10, a, a, b, b, c, c, d, d) + +unsigned char a8 = 0x33; +unsigned char b8 = 0x55; +unsigned char c8 = 0x77; +unsigned char d8 = 0x99; +unsigned char e8 = 0x11; +unsigned char f8 = 0x22; +unsigned char g8 = 0x44; +unsigned char h8 = 0x66; + +int main(void) +{ + test_v4qi_1 (a8, b8, c8, d8); + test_v4qi_2 (a8, b8, c8, d8); + test_v4qi_3 (a8, b8, c8, d8); + test_v4qi_4 (a8, b8, c8, d8); + test_v4qi_5 (a8, b8, c8, d8); + test_v4qi_6 (a8, b8, c8, d8); + test_v4qi_7 (a8, b8, c8, d8); + test_v4qi_8 (a8, b8, c8, d8); + test_v4qi_9 (a8, b8, c8, d8); + test_v4qi_10 (a8, b8, c8, d8); + + test_v8qi_1 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_2 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_3 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_4 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_5 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_6 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_7 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_8 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_9 (a8, b8, c8, d8, e8, f8, g8, h8); + test_v8qi_10 (a8, b8, c8, d8, e8, f8, g8, h8); + return 0; +} diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index f19c3c5..1ba71f0 100644 
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2449,6 +2449,24 @@ proc check_effective_target_ultrasparc_hw { } {
     } "-mcpu=ultrasparc"]
 }
 
+# Return 1 if the test environment supports executing UltraSPARC VIS2
+# instructions. We check this by attempting: "bmask %g0, %g0, %g0"
+
+proc check_effective_target_ultrasparc_vis2_hw { } {
+    return [check_runtime ultrasparc_hw {
+	int main() { __asm__(".word 0x81b00320"); return 0; }
+    } "-mcpu=ultrasparc3"]
+}
+
+# Return 1 if the test environment supports executing UltraSPARC VIS3
+# instructions. We check this by attempting: "addxc %g0, %g0, %g0"
+
+proc check_effective_target_ultrasparc_vis3_hw { } {
+    return [check_runtime ultrasparc_hw {
+	int main() { __asm__(".word 0x81b00220"); return 0; }
+    } "-mcpu=niagara3"]
+}
+
 # Return 1 if the target supports hardware vector shift operation.
 
 proc check_effective_target_vect_shift { } {