From patchwork Mon Oct 24 03:53:21 2011
From: David Miller
X-Patchwork-Submitter: David Miller
X-Patchwork-Id: 121269
Date: Sun, 23 Oct 2011 23:53:21 -0400 (EDT)
Message-Id: <20111023.235321.1404597026148966447.davem@davemloft.net>
To: gcc-patches@gcc.gnu.org
CC: ebotcazou@adacore.com, rth@redhat.com
Subject: [PATCH] Add support for sparc VIS3 fp<-->int moves.

The non-trivial aspects (and what took the most time for me) of these
changes are:

1) Getting the register move costs and class preferencing right so
   that the VIS3 moves do get used effectively for incoming
   float/vector argument passing on 32-bit, yet IRA and reload don't
   go nuts allocating integer registers to float/vector mode values
   and vice versa.

   Non-optimized compiles are particularly sensitive to this because
   there are simply a lot of moves that don't get cleaned up.  We
   might have 6 moves, 3 on each side of a single real calculation,
   so in the IRA costs the register classes of the moves dominate.
   (A small illustration follows this list.)

2) Making sure we don't merge a VIS3 move into a restore instruction.

3) Dealing with the restriction that we can't operate on 32-bit
   pieces of values contained in the upper 32 v9 float registers.

   We deal with this in two ways.  First, we indicate a FP_REGS or
   GENERAL_OR_FP_REGS preferred reload class when we see reload try
   to load an integer register into class EXTRA_FP_REGS or
   GENERAL_OR_EXTRA_FP_REGS.  Second, we teach reload that if it
   tries to move between float and integer regs, and a register class
   involving EXTRA_FP_REGS is involved, an intermediate FP_REGS class
   register will possibly be needed to complete the reload.
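To make point 1 concrete (my own illustration, not part of the patch,
and the exact move counts vary by function): at -O0 nothing cleans up
the argument and return-value shuffling, so even a one-operation
function is dominated by moves:

    /* A single real calculation.  At -O0 the incoming argument is
       copied out of %o0, the add happens, and the result is copied
       toward %f0, with nothing optimizing away the intermediate
       moves -- so the register classes chosen for those moves, not
       for the fadds itself, drive the IRA cost calculation.  */
    float
    scale (float a)
    {
      return a + a;
    }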
The rest is mostly mechanical work of splitting the existing
v9/64-bit move patterns into non-vis3 and vis3 variants.

Because of how float arguments are passed on 32-bit, these
instructions help a lot.  This is evident in even the simplest
examples.  This C code:

float fnegs (float a)
{
  return -a;
}

double fnegd (double a)
{
  return -a;
}

would generate:

fnegs:
	add	%sp, -104, %sp
	st	%o0, [%sp+100]
	ld	[%sp+100], %f8
	sub	%sp, -104, %sp
	jmp	%o7+8
	 fnegs	%f8, %f0

fnegd:
	add	%sp, -104, %sp
	std	%o0, [%sp+96]
	ldd	[%sp+96], %f8
	sub	%sp, -104, %sp
	jmp	%o7+8
	 fnegd	%f8, %f0

but with VIS3 moves we get:

fnegs:
	movwtos	%o0, %f8
	jmp	%o7+8
	 fnegs	%f8, %f0

fnegd:
	movwtos	%o0, %f8
	movwtos	%o1, %f9
	jmp	%o7+8
	 fnegd	%f8, %f0

And with our good friend pdist.c we get the following code for
function 'foo' with VIS3 moves:

foo:
	fzero	%f8
	movwtos	%o0, %f10
	movwtos	%o1, %f11
	movwtos	%o2, %f12
	movwtos	%o3, %f13
	pdist	%f10, %f12, %f8
	movstouw	%f8, %o0
	jmp	%o7+8
	 movstouw	%f9, %o1

Another good example of significantly improved code generation can be
found by looking at the output of libgcc2.c:_mulsc3().
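For reference, a simplified sketch of what _mulsc3 computes (the real
libgcc2.c routine also carries recovery code for NaN and infinity
results, which this sketch omits):

    /* Complex float multiply, the basic shape of libgcc2.c:_mulsc3.
       On 32-bit all four inputs arrive in integer registers, so
       without VIS3 every one of them has to round-trip through
       memory before the FPU can touch it.  */
    typedef _Complex float SCtype;

    SCtype
    mulsc3_sketch (float a, float b, float c, float d)
    {
      SCtype res;

      __real__ res = a * c - b * d;
      __imag__ res = a * d + b * c;
      return res;
    }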
Of course, sometimes we generate spurious secondary reloads because
the use of the EXTRA_FP_REGS (and GENERAL_OR_EXTRA_FP_REGS) register
class doesn't necessarily result in using one of the upper 32 v9
float registers.  Maybe if we used segregated register classes for
the lower and upper float regs we could attack this issue
effectively.

These VIS3 patterns can also be used in the future for more crafty
constant and non-constant vec_init sequences.

This was regstrapped both with the compiler defaulting to vis3, and
without.  Committed to trunk.

gcc/

	* config/sparc/sparc.h (SECONDARY_MEMORY_NEEDED): We can move
	between float and non-float regs when VIS3.
	* config/sparc/sparc.c (eligible_for_restore_insn): We can't
	use a restore when the source is a float register.
	(sparc_split_regreg_legitimate): When VIS3 allow moves between
	float and integer regs.
	(sparc_register_move_cost): Adjust to account for VIS3 moves.
	(sparc_preferred_reload_class): On 32-bit with VIS3 when moving
	an integer reg to a class containing EXTRA_FP_REGS, constrain
	to FP_REGS.
	(sparc_secondary_reload): On 32-bit with VIS3 when moving
	between float and integer regs we sometimes need a FP_REGS
	class intermediate move to satisfy the reload.  When this
	happens specify an extra cost of 2.
	(*movsi_insn): Rename to have "_novis3" suffix and add !VIS3
	guard.
	(*movdi_insn_sp32_v9): Likewise.
	(*movdi_insn_sp64): Likewise.
	(*movsf_insn): Likewise.
	(*movdf_insn_sp32_v9): Likewise.
	(*movdf_insn_sp64): Likewise.
	(*zero_extendsidi2_insn_sp64): Likewise.
	(*sign_extendsidi2_insn): Likewise.
	(*movsi_insn_vis3): New insn.
	(*movdi_insn_sp32_v9_vis3): New insn.
	(*movdi_insn_sp64_vis3): New insn.
	(*movsf_insn_vis3): New insn.
	(*movdf_insn_sp32_v9_vis3): New insn.
	(*movdf_insn_sp64_vis3): New insn.
	(*zero_extendsidi2_insn_sp64_vis3): New insn.
	(*sign_extendsidi2_insn_vis3): New insn.
	(TFmode reg/reg split): Make sure both REG operands are float.
	(*mov<VM32:mode>_insn): Add "_novis3" suffix and !VIS3 guard.
	Remove easy constant to integer reg alternatives.
	(*mov<VM64:mode>_insn_sp64): Likewise.
	(*mov<VM64:mode>_insn_sp32_novis3): Likewise.
	(*mov<VM32:mode>_insn_vis3): New insn.
	(*mov<VM64:mode>_insn_sp64_vis3): New insn.
	(*mov<VM64:mode>_insn_sp32_vis3): New insn.
	(VM64 reg<-->reg split): New splitter for 32-bit.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180360 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog             |   41 +++++
 gcc/config/sparc/sparc.c  |   85 ++++++++++-
 gcc/config/sparc/sparc.h  |    9 +-
 gcc/config/sparc/sparc.md |  375 +++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 469 insertions(+), 41 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dfa4caf..1842402 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,46 @@
 2011-10-23  David S. Miller  <davem@davemloft.net>
 
+	* config/sparc/sparc.h (SECONDARY_MEMORY_NEEDED): We can move
+	between float and non-float regs when VIS3.
+	* config/sparc/sparc.c (eligible_for_restore_insn): We can't
+	use a restore when the source is a float register.
+	(sparc_split_regreg_legitimate): When VIS3 allow moves between
+	float and integer regs.
+	(sparc_register_move_cost): Adjust to account for VIS3 moves.
+	(sparc_preferred_reload_class): On 32-bit with VIS3 when moving
+	an integer reg to a class containing EXTRA_FP_REGS, constrain
+	to FP_REGS.
+	(sparc_secondary_reload): On 32-bit with VIS3 when moving
+	between float and integer regs we sometimes need a FP_REGS
+	class intermediate move to satisfy the reload.  When this
+	happens specify an extra cost of 2.
+	(*movsi_insn): Rename to have "_novis3" suffix and add !VIS3
+	guard.
+	(*movdi_insn_sp32_v9): Likewise.
+	(*movdi_insn_sp64): Likewise.
+	(*movsf_insn): Likewise.
+	(*movdf_insn_sp32_v9): Likewise.
+	(*movdf_insn_sp64): Likewise.
+	(*zero_extendsidi2_insn_sp64): Likewise.
+	(*sign_extendsidi2_insn): Likewise.
+	(*movsi_insn_vis3): New insn.
+	(*movdi_insn_sp32_v9_vis3): New insn.
+	(*movdi_insn_sp64_vis3): New insn.
+	(*movsf_insn_vis3): New insn.
+	(*movdf_insn_sp32_v9_vis3): New insn.
+	(*movdf_insn_sp64_vis3): New insn.
+	(*zero_extendsidi2_insn_sp64_vis3): New insn.
+	(*sign_extendsidi2_insn_vis3): New insn.
+	(TFmode reg/reg split): Make sure both REG operands are float.
+	(*mov<VM32:mode>_insn): Add "_novis3" suffix and !VIS3 guard.
+	Remove easy constant to integer reg alternatives.
+	(*mov<VM64:mode>_insn_sp64): Likewise.
+	(*mov<VM64:mode>_insn_sp32_novis3): Likewise.
+	(*mov<VM32:mode>_insn_vis3): New insn.
+	(*mov<VM64:mode>_insn_sp64_vis3): New insn.
+	(*mov<VM64:mode>_insn_sp32_vis3): New insn.
+	(VM64 reg<-->reg split): New splitter for 32-bit.
+
 	* config/sparc/sparc.c (sparc_split_regreg_legitimate): New
 	function.
 	* config/sparc/sparc-protos.h (sparc_split_regreg_legitimate):
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 29d2847..79bb821 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -2996,10 +2996,23 @@ eligible_for_restore_insn (rtx trial, bool return_p)
 {
   rtx pat = PATTERN (trial);
   rtx src = SET_SRC (pat);
+  bool src_is_freg = false;
+  rtx src_reg;
+
+  /* Since we now can do moves between float and integer registers when
+     VIS3 is enabled, we have to catch this case.  We can allow such
+     moves when doing a 'return' however.  */
+  src_reg = src;
+  if (GET_CODE (src_reg) == SUBREG)
+    src_reg = SUBREG_REG (src_reg);
+  if (GET_CODE (src_reg) == REG
+      && SPARC_FP_REG_P (REGNO (src_reg)))
+    src_is_freg = true;
 
   /* The 'restore src,%g0,dest' pattern for word mode and below.  */
   if (GET_MODE_CLASS (GET_MODE (src)) != MODE_FLOAT
-      && arith_operand (src, GET_MODE (src)))
+      && arith_operand (src, GET_MODE (src))
+      && ! src_is_freg)
     {
       if (TARGET_ARCH64)
         return GET_MODE_SIZE (GET_MODE (src)) <= GET_MODE_SIZE (DImode);
@@ -3009,7 +3022,8 @@ eligible_for_restore_insn (rtx trial, bool return_p)
 
   /* The 'restore src,%g0,dest' pattern for double-word mode.  */
   else if (GET_MODE_CLASS (GET_MODE (src)) != MODE_FLOAT
-	   && arith_double_operand (src, GET_MODE (src)))
+	   && arith_double_operand (src, GET_MODE (src))
+	   && ! src_is_freg)
     return GET_MODE_SIZE (GET_MODE (src)) <= GET_MODE_SIZE (DImode);
 
   /* The 'restore src,%g0,dest' pattern for float if no FPU.  */
@@ -7784,6 +7798,13 @@ sparc_split_regreg_legitimate (rtx reg1, rtx reg2)
   if (SPARC_INT_REG_P (regno1) && SPARC_INT_REG_P (regno2))
     return 1;
 
+  if (TARGET_VIS3)
+    {
+      if ((SPARC_INT_REG_P (regno1) && SPARC_FP_REG_P (regno2))
+	  || (SPARC_FP_REG_P (regno1) && SPARC_INT_REG_P (regno2)))
+	return 1;
+    }
+
   return 0;
 }
@@ -10302,10 +10323,28 @@ static int
 sparc_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
			   reg_class_t from, reg_class_t to)
 {
-  if ((FP_REG_CLASS_P (from) && general_or_i64_p (to))
-      || (general_or_i64_p (from) && FP_REG_CLASS_P (to))
-      || from == FPCC_REGS
-      || to == FPCC_REGS)
+  bool need_memory = false;
+
+  if (from == FPCC_REGS || to == FPCC_REGS)
+    need_memory = true;
+  else if ((FP_REG_CLASS_P (from) && general_or_i64_p (to))
+	   || (general_or_i64_p (from) && FP_REG_CLASS_P (to)))
+    {
+      if (TARGET_VIS3)
+	{
+	  int size = GET_MODE_SIZE (mode);
+	  if (size == 8 || size == 4)
+	    {
+	      if (! TARGET_ARCH32 || size == 4)
+		return 4;
+	      else
+		return 6;
+	    }
+	}
+      need_memory = true;
+    }
+
+  if (need_memory)
     {
       if (sparc_cpu == PROCESSOR_ULTRASPARC
	   || sparc_cpu == PROCESSOR_ULTRASPARC3
@@ -11163,6 +11202,18 @@ sparc_preferred_reload_class (rtx x, reg_class_t rclass)
	 }
     }
 
+  if (TARGET_VIS3
+      && ! TARGET_ARCH64
+      && (rclass == EXTRA_FP_REGS
+	  || rclass == GENERAL_OR_EXTRA_FP_REGS))
+    {
+      int regno = true_regnum (x);
+
+      if (SPARC_INT_REG_P (regno))
+	return (rclass == EXTRA_FP_REGS
+		? FP_REGS : GENERAL_OR_FP_REGS);
+    }
+
   return rclass;
 }
@@ -11275,6 +11326,9 @@ sparc_secondary_reload (bool in_p, rtx x, reg_class_t rclass_i,
 {
   enum reg_class rclass = (enum reg_class) rclass_i;
 
+  sri->icode = CODE_FOR_nothing;
+  sri->extra_cost = 0;
+
   /* We need a temporary when loading/storing a HImode/QImode value
      between memory and the FPU registers.  This can happen when combine puts
      a paradoxical subreg in a float/fix conversion insn.  */
@@ -11307,6 +11361,25 @@ sparc_secondary_reload (bool in_p, rtx x, reg_class_t rclass_i,
       return NO_REGS;
     }
 
+  if (TARGET_VIS3 && TARGET_ARCH32)
+    {
+      int regno = true_regnum (x);
+
+      /* When using VIS3 fp<-->int register moves, on 32-bit we have
+	 to move 8-byte values in 4-byte pieces.  This only works via
+	 FP_REGS, and not via EXTRA_FP_REGS.  Therefore if we try to
+	 move between EXTRA_FP_REGS and GENERAL_REGS, we will need
+	 an FP_REGS intermediate move.  */
+      if ((rclass == EXTRA_FP_REGS && SPARC_INT_REG_P (regno))
+	  || ((general_or_i64_p (rclass)
+	       || rclass == GENERAL_OR_FP_REGS)
+	      && SPARC_FP_REG_P (regno)))
+	{
+	  sri->extra_cost = 2;
+	  return FP_REGS;
+	}
+    }
+
   return NO_REGS;
 }
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 76240f0..aed18fc 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -1040,10 +1040,13 @@ extern char leaf_reg_remap[];
 #define SPARC_SETHI32_P(X) \
   (SPARC_SETHI_P ((unsigned HOST_WIDE_INT) (X) & GET_MODE_MASK (SImode)))
 
-/* On SPARC it is not possible to directly move data between
-   GENERAL_REGS and FP_REGS.  */
+/* On SPARC when not VIS3 it is not possible to directly move data
+   between GENERAL_REGS and FP_REGS.  */
 #define SECONDARY_MEMORY_NEEDED(CLASS1, CLASS2, MODE) \
-  (FP_REG_CLASS_P (CLASS1) != FP_REG_CLASS_P (CLASS2))
+  ((FP_REG_CLASS_P (CLASS1) != FP_REG_CLASS_P (CLASS2)) \
+   && (! TARGET_VIS3 \
+       || GET_MODE_SIZE (MODE) > 8 \
+       || GET_MODE_SIZE (MODE) < 4))
 
 /* Get_secondary_mem widens its argument to BITS_PER_WORD which loses on v9
    because the movsi and movsf patterns don't handle r/f moves.
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index b84699a..0f716d6 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -1312,11 +1312,12 @@
   DONE;
 })
 
-(define_insn "*movsi_insn"
+(define_insn "*movsi_insn_novis3"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m,!f,!f,!m,d,d")
	 (match_operand:SI 1 "input_operand"        "rI,K,m,rJ,f,m,f,J,P"))]
-  "(register_operand (operands[0], SImode)
-    || register_or_zero_or_all_ones_operand (operands[1], SImode))"
+  "(! TARGET_VIS3
+    && (register_operand (operands[0], SImode)
+        || register_or_zero_or_all_ones_operand (operands[1], SImode)))"
   "@
    mov\t%1, %0
    sethi\t%%hi(%a1), %0
@@ -1329,6 +1330,26 @@
    fones\t%0"
   [(set_attr "type" "*,*,load,store,fpmove,fpload,fpstore,fga,fga")])
 
+(define_insn "*movsi_insn_vis3"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r, m, r,*f,*f,*f, m,d,d")
+	(match_operand:SI 1 "input_operand"        "rI,K,m,rJ,*f, r, f, m,*f,J,P"))]
+  "(TARGET_VIS3
+    && (register_operand (operands[0], SImode)
+        || register_or_zero_or_all_ones_operand (operands[1], SImode)))"
+  "@
+   mov\t%1, %0
+   sethi\t%%hi(%a1), %0
+   ld\t%1, %0
+   st\t%r1, %0
+   movstouw\t%1, %0
+   movwtos\t%1, %0
+   fmovs\t%1, %0
+   ld\t%1, %0
+   st\t%1, %0
+   fzeros\t%0
+   fones\t%0"
  [(set_attr "type" "*,*,load,store,*,*,fpmove,fpload,fpstore,fga,fga")])
+
 (define_insn "*movsi_lo_sum"
   [(set (match_operand:SI 0 "register_operand" "=r")
	 (lo_sum:SI (match_operand:SI 1 "register_operand" "r")
@@ -1486,13 +1507,14 @@
   [(set_attr "type" "store,store,load,*,*,*,*,fpstore,fpload,*,*,*")
    (set_attr "length" "2,*,*,2,2,2,2,*,*,2,2,2")])
 
-(define_insn "*movdi_insn_sp32_v9"
+(define_insn "*movdi_insn_sp32_v9_novis3"
   [(set (match_operand:DI 0 "nonimmediate_operand"
			   "=T,o,T,U,o,r,r,r,?T,?f,?f,?o,?e,?e,?W,b,b")
	 (match_operand:DI 1 "input_operand"
			   " J,J,U,T,r,o,i,r, f, T, o, f, e, W, e,J,P"))]
   "! TARGET_ARCH64
    && TARGET_V9
+   && ! TARGET_VIS3
    && (register_operand (operands[0], DImode)
	|| register_or_zero_operand (operands[1], DImode))"
   "@
@@ -1517,10 +1539,45 @@
    (set_attr "length" "*,2,*,*,2,2,2,2,*,*,2,2,*,*,*,*,*")
    (set_attr "fptype" "*,*,*,*,*,*,*,*,*,*,*,*,double,*,*,double,double")])
 
-(define_insn "*movdi_insn_sp64"
+(define_insn "*movdi_insn_sp32_v9_vis3"
+  [(set (match_operand:DI 0 "nonimmediate_operand"
+			   "=T,o,T,U,o,r,r,r,?T,?*f,?*f,?o,?*e, r,?*f,?*e,?W,b,b")
+	(match_operand:DI 1 "input_operand"
+			   " J,J,U,T,r,o,i,r,*f, T, o,*f, *e,?*f, r, W,*e,J,P"))]
+  "! TARGET_ARCH64
+   && TARGET_V9
+   && TARGET_VIS3
+   && (register_operand (operands[0], DImode)
+       || register_or_zero_operand (operands[1], DImode))"
+  "@
+   stx\t%%g0, %0
+   #
+   std\t%1, %0
+   ldd\t%1, %0
+   #
+   #
+   #
+   #
+   std\t%1, %0
+   ldd\t%1, %0
+   #
+   #
+   fmovd\t%1, %0
+   #
+   #
+   ldd\t%1, %0
+   std\t%1, %0
+   fzero\t%0
+   fone\t%0"
+  [(set_attr "type" "store,store,store,load,*,*,*,*,fpstore,fpload,*,*,fpmove,*,*,fpload,fpstore,fga,fga")
+   (set_attr "length" "*,2,*,*,2,2,2,2,*,*,2,2,*,2,2,*,*,*,*")
+   (set_attr "fptype" "*,*,*,*,*,*,*,*,*,*,*,*,double,*,*,*,*,double,double")])
+
+(define_insn "*movdi_insn_sp64_novis3"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,m,?e,?e,?W,b,b")
	 (match_operand:DI 1 "input_operand"        "rI,N,m,rJ,e,W,e,J,P"))]
   "TARGET_ARCH64
+   && ! TARGET_VIS3
    && (register_operand (operands[0], DImode)
	|| register_or_zero_or_all_ones_operand (operands[1], DImode))"
   "@
@@ -1536,6 +1593,28 @@
   [(set_attr "type" "*,*,load,store,fpmove,fpload,fpstore,fga,fga")
    (set_attr "fptype" "*,*,*,*,double,*,*,double,double")])
 
+(define_insn "*movdi_insn_sp64_vis3"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, r,*e,?*e,?*e,?W,b,b")
+	(match_operand:DI 1 "input_operand"        "rI,N,m,rJ,*e, r, *e, W,*e,J,P"))]
+  "TARGET_ARCH64
+   && TARGET_VIS3
+   && (register_operand (operands[0], DImode)
+       || register_or_zero_or_all_ones_operand (operands[1], DImode))"
+  "@
+   mov\t%1, %0
+   sethi\t%%hi(%a1), %0
+   ldx\t%1, %0
+   stx\t%r1, %0
+   movdtox\t%1, %0
+   movxtod\t%1, %0
+   fmovd\t%1, %0
+   ldd\t%1, %0
+   std\t%1, %0
+   fzero\t%0
+   fone\t%0"
+  [(set_attr "type" "*,*,load,store,*,*,fpmove,fpload,fpstore,fga,fga")
+   (set_attr "fptype" "*,*,*,*,*,*,double,*,*,double,double")])
+
 (define_expand "movdi_pic_label_ref"
   [(set (match_dup 3)
	 (high:DI (unspec:DI [(match_operand:DI 1 "label_ref_operand" "")
@@ -1933,10 +2012,11 @@
   DONE;
 })
 
-(define_insn "*movsf_insn"
+(define_insn "*movsf_insn_novis3"
   [(set (match_operand:SF 0 "nonimmediate_operand" "=d, d,f, *r,*r,*r,f,*r,m,   m")
	 (match_operand:SF 1 "input_operand"         "GY,ZC,f,*rRY, Q, S,m, m,f,*rGY"))]
   "TARGET_FPU
+   && ! TARGET_VIS3
    && (register_operand (operands[0], SFmode)
	|| register_or_zero_or_all_ones_operand (operands[1], SFmode))"
 {
@@ -1979,6 +2059,57 @@
 }
   [(set_attr "type" "fga,fga,fpmove,*,*,*,fpload,load,fpstore,store")])
 
+(define_insn "*movsf_insn_vis3"
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=d, d,f, *r,*r,*r,*r, f, f,*r, m,   m")
+	(match_operand:SF 1 "input_operand"         "GY,ZC,f,*rRY, Q, S, f,*r, m, m, f,*rGY"))]
+  "TARGET_FPU
+   && TARGET_VIS3
+   && (register_operand (operands[0], SFmode)
+       || register_or_zero_or_all_ones_operand (operands[1], SFmode))"
+{
+  if (GET_CODE (operands[1]) == CONST_DOUBLE
+      && (which_alternative == 3
+          || which_alternative == 4
+          || which_alternative == 5))
+    {
+      REAL_VALUE_TYPE r;
+      long i;
+
+      REAL_VALUE_FROM_CONST_DOUBLE (r, operands[1]);
+      REAL_VALUE_TO_TARGET_SINGLE (r, i);
+      operands[1] = GEN_INT (i);
+    }
+
+  switch (which_alternative)
+    {
+    case 0:
+      return "fzeros\t%0";
+    case 1:
+      return "fones\t%0";
+    case 2:
+      return "fmovs\t%1, %0";
+    case 3:
+      return "mov\t%1, %0";
+    case 4:
+      return "sethi\t%%hi(%a1), %0";
+    case 5:
+      return "#";
+    case 6:
+      return "movstouw\t%1, %0";
+    case 7:
+      return "movwtos\t%1, %0";
+    case 8:
+    case 9:
+      return "ld\t%1, %0";
+    case 10:
+    case 11:
+      return "st\t%r1, %0";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "fga,fga,fpmove,*,*,*,*,*,fpload,load,fpstore,store")])
+
 ;; Exactly the same as above, except that all `f' cases are deleted.
 ;; This is necessary to prevent reload from ever trying to use a `f' reg
 ;; when -mno-fpu.
@@ -2107,11 +2238,12 @@
    (set_attr "length" "*,*,2,2,2")])
 
 ;; We have available v9 double floats but not 64-bit integer registers.
-(define_insn "*movdf_insn_sp32_v9"
+(define_insn "*movdf_insn_sp32_v9_novis3"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=b, b,e,   e, T,W,U,T,  f,     *r,    o")
	 (match_operand:DF 1 "input_operand"         "GY,ZC,e, W#F,GY,e,T,U,o#F,*roGYDF,*rGYf"))]
   "TARGET_FPU
    && TARGET_V9
+   && ! TARGET_VIS3
    && ! TARGET_ARCH64
    && (register_operand (operands[0], DFmode)
	|| register_or_zero_or_all_ones_operand (operands[1], DFmode))"
@@ -2131,6 +2263,33 @@
    (set_attr "length" "*,*,*,*,*,*,*,*,2,2,2")
    (set_attr "fptype" "double,double,double,*,*,*,*,*,*,*,*")])
 
+(define_insn "*movdf_insn_sp32_v9_vis3"
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=b, b,e,*r, f,  e, T,W,U,T,  f,     *r,    o")
+	(match_operand:DF 1 "input_operand"         "GY,ZC,e, f,*r,W#F,GY,e,T,U,o#F,*roGYDF,*rGYf"))]
+  "TARGET_FPU
+   && TARGET_V9
+   && TARGET_VIS3
+   && ! TARGET_ARCH64
+   && (register_operand (operands[0], DFmode)
+       || register_or_zero_or_all_ones_operand (operands[1], DFmode))"
+  "@
+   fzero\t%0
+   fone\t%0
+   fmovd\t%1, %0
+   #
+   #
+   ldd\t%1, %0
+   stx\t%r1, %0
+   std\t%1, %0
+   ldd\t%1, %0
+   std\t%1, %0
+   #
+   #
+   #"
+  [(set_attr "type" "fga,fga,fpmove,*,*,load,store,store,load,store,*,*,*")
+   (set_attr "length" "*,*,*,2,2,*,*,*,*,*,2,2,2")
+   (set_attr "fptype" "double,double,double,*,*,*,*,*,*,*,*,*,*")])
+
 (define_insn "*movdf_insn_sp32_v9_no_fpu"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=U,T,T,r,o")
	 (match_operand:DF 1 "input_operand" "T,U,G,ro,rG"))]
@@ -2149,10 +2308,11 @@
    (set_attr "length" "*,*,*,2,2")])
 
 ;; We have available both v9 double floats and 64-bit integer registers.
-(define_insn "*movdf_insn_sp64"
+(define_insn "*movdf_insn_sp64_novis3"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=b, b,e,   e,W,  *r,*r,   m,*r")
	 (match_operand:DF 1 "input_operand"         "GY,ZC,e, W#F,e,*rGY, m,*rGY,DF"))]
   "TARGET_FPU
+   && ! TARGET_VIS3
    && TARGET_ARCH64
    && (register_operand (operands[0], DFmode)
	|| register_or_zero_or_all_ones_operand (operands[1], DFmode))"
   "@
@@ -2170,6 +2330,30 @@
    (set_attr "length" "*,*,*,*,*,*,*,*,2")
    (set_attr "fptype" "double,double,double,*,*,*,*,*,*")])
 
+(define_insn "*movdf_insn_sp64_vis3"
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=b, b,e,*r, e,  e,W,  *r,*r,   m,*r")
+	(match_operand:DF 1 "input_operand"         "GY,ZC,e, e,*r,W#F,e,*rGY, m,*rGY,DF"))]
+  "TARGET_FPU
+   && TARGET_ARCH64
+   && TARGET_VIS3
+   && (register_operand (operands[0], DFmode)
+       || register_or_zero_or_all_ones_operand (operands[1], DFmode))"
+  "@
+   fzero\t%0
+   fone\t%0
+   fmovd\t%1, %0
+   movdtox\t%1, %0
+   movxtod\t%1, %0
+   ldd\t%1, %0
+   std\t%1, %0
+   mov\t%r1, %0
+   ldx\t%1, %0
+   stx\t%r1, %0
+   #"
+  [(set_attr "type" "fga,fga,fpmove,*,*,load,store,*,load,store,*")
+   (set_attr "length" "*,*,*,*,*,*,*,*,*,*,2")
+   (set_attr "fptype" "double,double,double,double,double,*,*,*,*,*,*")])
+
 (define_insn "*movdf_insn_sp64_no_fpu"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=r,r,m")
	 (match_operand:DF 1 "input_operand" "r,m,rG"))]
@@ -2444,7 +2628,8 @@
    && (! TARGET_ARCH64
        || (TARGET_FPU
	   && ! TARGET_HARD_QUAD)
-       || ! fp_register_operand (operands[0], TFmode))"
+       || (! fp_register_operand (operands[0], TFmode)
+	   && ! fp_register_operand (operands[1], TFmode)))"
   [(clobber (const_int 0))]
 {
   rtx set_dest = operands[0];
@@ -2944,15 +3129,29 @@
   ""
   "")
 
-(define_insn "*zero_extendsidi2_insn_sp64"
+(define_insn "*zero_extendsidi2_insn_sp64_novis3"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
	 (zero_extend:DI (match_operand:SI 1 "input_operand" "r,m")))]
-  "TARGET_ARCH64 && GET_CODE (operands[1]) != CONST_INT"
+  "TARGET_ARCH64
+   && ! TARGET_VIS3
+   && GET_CODE (operands[1]) != CONST_INT"
   "@
    srl\t%1, 0, %0
    lduw\t%1, %0"
   [(set_attr "type" "shift,load")])
 
+(define_insn "*zero_extendsidi2_insn_sp64_vis3"
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+	(zero_extend:DI (match_operand:SI 1 "input_operand" "r,m,*f")))]
+  "TARGET_ARCH64
+   && TARGET_VIS3
+   && GET_CODE (operands[1]) != CONST_INT"
+  "@
+   srl\t%1, 0, %0
+   lduw\t%1, %0
+   movstouw\t%1, %0"
+  [(set_attr "type" "shift,load,*")])
+
 (define_insn_and_split "*zero_extendsidi2_insn_sp32"
   [(set (match_operand:DI 0 "register_operand" "=r")
	 (zero_extend:DI (match_operand:SI 1 "register_operand" "r")))]
@@ -3276,16 +3475,27 @@
   "TARGET_ARCH64"
   "")
 
-(define_insn "*sign_extendsidi2_insn"
+(define_insn "*sign_extendsidi2_insn_novis3"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
	 (sign_extend:DI (match_operand:SI 1 "input_operand" "r,m")))]
-  "TARGET_ARCH64"
+  "TARGET_ARCH64 && ! TARGET_VIS3"
   "@
    sra\t%1, 0, %0
    ldsw\t%1, %0"
   [(set_attr "type" "shift,sload")
    (set_attr "us3load_type" "*,3cycle")])
 
+(define_insn "*sign_extendsidi2_insn_vis3"
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+	(sign_extend:DI (match_operand:SI 1 "input_operand" "r,m,*f")))]
+  "TARGET_ARCH64 && TARGET_VIS3"
+  "@
+   sra\t%1, 0, %0
+   ldsw\t%1, %0
+   movstosw\t%1, %0"
+  [(set_attr "type" "shift,sload,*")
+   (set_attr "us3load_type" "*,3cycle,*")])
+
 ;; Special pattern for optimizing bit-field compares.  This is needed
 ;; because combine uses this as a canonical form.
@@ -7769,10 +7979,11 @@
   DONE;
 })
 
-(define_insn "*mov<VM32:mode>_insn"
-  [(set (match_operand:VM32 0 "nonimmediate_operand" "=f, f,f,f,m, m,r,m, r, r")
-	(match_operand:VM32 1 "input_operand"         "GY,ZC,f,m,f,GY,m,r,GY,ZC"))]
+(define_insn "*mov<VM32:mode>_insn_novis3"
+  [(set (match_operand:VM32 0 "nonimmediate_operand" "=f, f,f,f,m, m,r,m,*r")
+	(match_operand:VM32 1 "input_operand"         "GY,ZC,f,m,f,GY,m,r,*r"))]
   "TARGET_VIS
+   && ! TARGET_VIS3
    && (register_operand (operands[0], <VM32:MODE>mode)
       || register_or_zero_or_all_ones_operand (operands[1], <VM32:MODE>mode))"
   "@
@@ -7784,14 +7995,35 @@
    st\t%r1, %0
    ld\t%1, %0
    st\t%1, %0
-   mov\t0, %0
-   mov\t-1, %0"
-  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*")])
+   mov\t%1, %0"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*")])
 
-(define_insn "*mov<VM64:mode>_insn_sp64"
-  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,r,m, r, r")
-	(match_operand:VM64 1 "input_operand"         "GY,ZC,e,m,e,GY,m,r,GY,ZC"))]
+(define_insn "*mov<VM32:mode>_insn_vis3"
+  [(set (match_operand:VM32 0 "nonimmediate_operand" "=f, f,f,f,m, m,*r, m,*r,*r, f")
+	(match_operand:VM32 1 "input_operand"         "GY,ZC,f,m,f,GY, m,*r,*r, f,*r"))]
   "TARGET_VIS
+   && TARGET_VIS3
+   && (register_operand (operands[0], <VM32:MODE>mode)
+       || register_or_zero_or_all_ones_operand (operands[1], <VM32:MODE>mode))"
+  "@
+   fzeros\t%0
+   fones\t%0
+   fsrc1s\t%1, %0
+   ld\t%1, %0
+   st\t%1, %0
+   st\t%r1, %0
+   ld\t%1, %0
+   st\t%1, %0
+   mov\t%1, %0
+   movstouw\t%1, %0
+   movwtos\t%1, %0"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*,*")])
+
+(define_insn "*mov<VM64:mode>_insn_sp64_novis3"
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,r,m,*r")
+	(match_operand:VM64 1 "input_operand"         "GY,ZC,e,m,e,GY,m,r,*r"))]
+  "TARGET_VIS
+   && ! TARGET_VIS3
    && TARGET_ARCH64
    && (register_operand (operands[0], <VM64:MODE>mode)
       || register_or_zero_or_all_ones_operand (operands[1], <VM64:MODE>mode))"
   "@
@@ -7804,14 +8036,36 @@
    stx\t%r1, %0
    ldx\t%1, %0
    stx\t%1, %0
-   mov\t0, %0
-   mov\t-1, %0"
-  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*")])
+   mov\t%1, %0"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*")])
 
-(define_insn "*mov<VM64:mode>_insn_sp32"
-  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,U,T,o, r, r")
-	(match_operand:VM64 1 "input_operand"         "GY,ZC,e,m,e,GY,T,U,r,GY,ZC"))]
+(define_insn "*mov<VM64:mode>_insn_sp64_vis3"
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,*r, m,*r, f,*r")
+	(match_operand:VM64 1 "input_operand"         "GY,ZC,e,m,e,GY, m,*r, f,*r,*r"))]
   "TARGET_VIS
+   && TARGET_VIS3
+   && TARGET_ARCH64
+   && (register_operand (operands[0], <VM64:MODE>mode)
+       || register_or_zero_or_all_ones_operand (operands[1], <VM64:MODE>mode))"
+  "@
+   fzero\t%0
+   fone\t%0
+   fsrc1\t%1, %0
+   ldd\t%1, %0
+   std\t%1, %0
+   stx\t%r1, %0
+   ldx\t%1, %0
+   stx\t%1, %0
+   movdtox\t%1, %0
+   movxtod\t%1, %0
+   mov\t%1, %0"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*,*")])
+
+(define_insn "*mov<VM64:mode>_insn_sp32_novis3"
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,U,T,o,*r")
+	(match_operand:VM64 1 "input_operand"         "GY,ZC,e,m,e,GY,T,U,r,*r"))]
+  "TARGET_VIS
+   && ! TARGET_VIS3
    && ! TARGET_ARCH64
    && (register_operand (operands[0], <VM64:MODE>mode)
       || register_or_zero_or_all_ones_operand (operands[1], <VM64:MODE>mode))"
   "@
@@ -7825,10 +8079,33 @@
    ldd\t%1, %0
    std\t%1, %0
    #
-   mov 0, %L0; mov 0, %H0
-   mov -1, %L0; mov -1, %H0"
-  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*,*")
-   (set_attr "length" "*,*,*,*,*,*,*,*,2,2,2")])
+   #"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*")
+   (set_attr "length" "*,*,*,*,*,*,*,*,2,2")])
+
+(define_insn "*mov<VM64:mode>_insn_sp32_vis3"
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,*r, f,e,m, m,U,T, o,*r")
+	(match_operand:VM64 1 "input_operand"         "GY,ZC,e, f,*r,m,e,GY,T,U,*r,*r"))]
+  "TARGET_VIS
+   && TARGET_VIS3
+   && ! TARGET_ARCH64
+   && (register_operand (operands[0], <VM64:MODE>mode)
+       || register_or_zero_or_all_ones_operand (operands[1], <VM64:MODE>mode))"
+  "@
+   fzero\t%0
+   fone\t%0
+   fsrc1\t%1, %0
+   #
+   #
+   ldd\t%1, %0
+   std\t%1, %0
+   stx\t%r1, %0
+   ldd\t%1, %0
+   std\t%1, %0
+   #
+   #"
+  [(set_attr "type" "fga,fga,fga,*,*,fpload,fpstore,store,load,store,*,*")
+   (set_attr "length" "*,*,*,2,2,*,*,*,*,*,2,2")])
 
 (define_split
   [(set (match_operand:VM64 0 "memory_operand" "")
@@ -7851,6 +8128,40 @@
   DONE;
 })
 
+(define_split
+  [(set (match_operand:VM64 0 "register_operand" "")
+	(match_operand:VM64 1 "register_operand" ""))]
+  "reload_completed
+   && TARGET_VIS
+   && ! TARGET_ARCH64
+   && sparc_split_regreg_legitimate (operands[0], operands[1])"
+  [(clobber (const_int 0))]
+{
+  rtx set_dest = operands[0];
+  rtx set_src = operands[1];
+  rtx dest1, dest2;
+  rtx src1, src2;
+
+  dest1 = gen_highpart (SImode, set_dest);
+  dest2 = gen_lowpart (SImode, set_dest);
+  src1 = gen_highpart (SImode, set_src);
+  src2 = gen_lowpart (SImode, set_src);
+
+  /* Now emit using the real source and destination we found, swapping
+     the order if we detect overlap.  */
+  if (reg_overlap_mentioned_p (dest1, src2))
+    {
+      emit_insn (gen_movsi (dest2, src2));
+      emit_insn (gen_movsi (dest1, src1));
+    }
+  else
+    {
+      emit_insn (gen_movsi (dest1, src1));
+      emit_insn (gen_movsi (dest2, src2));
+    }
+  DONE;
+})
+
 (define_expand "vec_init<mode>"
   [(match_operand:VMALL 0 "register_operand" "")
    (match_operand:VMALL 1 "" "")]