From patchwork Fri Feb 1 20:15:05 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 217573
Date: Fri, 1 Feb 2013 15:15:05 -0500
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: [PATCH, RFC] GCC 4.9, powerpc, allow TImode in VSX registers
Message-ID: <20130201201505.GA30409@ibm-tiger.the-meissners.org>

When I did the initial power7 port, I punted on allowing TImode in the VSX registers because I couldn't get it to work. I am now revisiting it, and these patches are my current effort, and I was wondering if people had comments on them.

In terms of performance, two benchmarks in the Spec 2006 suite have minor regressions (perlbench and gamess), and three have minor improvements (hmmer, h264ref, and gromacs), so overall it looks like a wash. I do want to look at the regressions, and see if there is something simple to tweak.

Some issues I ran into include:

I needed to set CANNOT_CHANGE_MODE_CLASS so that TImode won't overlap with smaller data types, because VSX keeps the scalar portion of the register in the upper 64 bits.

I limited the address formats available for TImode to REG+REG, which is what the VSX instructions need.

I discovered that setjmp/longjmp and exception handling needed to create TImode values with the STACK_SAVEAREA_MODE macro. However, the implementation of this needs REG+OFFSET addressing. So, I added a new mode, PTImode, which is only used for STACK_SAVEAREA_MODE and is limited to the GPRs.

If I enable logical operations in TImode (and, xor, etc.), the compiler will convert DImode logical operations to TImode for 32-bit programs. In the future, I think I will tune this and/or provide insn splitters for DImode logical operations. For now, I just disallow logical operations on TImode if 32-bit.

I added a debug switch (-mvsx-timode) that controls whether TImode is put into VSX registers; it defaults to on for power7/power8.

2013-01-31  Michael Meissner

	* config/rs6000/vector.md (mul3): Use the combined macro
	VECTOR_UNIT_ALTIVEC_OR_VSX_P instead of separate calls to
	VECTOR_UNIT_ALTIVEC_P and VECTOR_UNIT_VSX_P.
	(vcond): Likewise.
	(vcondu): Likewise.
	(vector_gtu): Likewise.
	(vector_geu): Likewise.
	(xor3): Don't allow logical operations on TImode in 32-bit to
	prevent the compiler from converting DImode operations to TImode.
	(ior3): Likewise.
	(and3): Likewise.
	(one_cmpl2): Likewise.
	(nor3): Likewise.
	(andc3): Likewise.

	* config/rs6000/constraints.md (wt constraint): New constraint
	that returns VSX_REGS if TImode is allowed in VSX registers.

	* config/rs6000/predicates.md (easy_fp_constant): 0.0f is an easy
	constant under VSX.

	* config/rs6000/rs6000-modes.def (PTImode): Define.  PTImode is
	similar to TImode, but it is restricted to being in the GPRs.

	* config/rs6000/rs6000.opt (-mvsx-timode): New switch to allow
	TImode to occupy a single VSX register.

	* config/rs6000/rs6000-cpus.def (ISA_2_6_MASKS_SERVER): Default to
	-mvsx-timode for power7/power8.
	(power7 cpu): Likewise.
	(power8 cpu): Likewise.

	* config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Make
	sure that TFmode/TDmode take up two registers if they are ever
	allowed in the upper VSX registers.
	(rs6000_hard_regno_mode_ok): If -mvsx-timode, allow TImode in VSX
	registers.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_debug_reg_global): Add debugging for PTImode and wt
	constraint.  Print if LRA is turned on.
	(rs6000_option_override_internal): Give an error if -mvsx-timode
	and VSX is not enabled.
	(invalid_e500_subreg): Handle PTImode, restricting it to GPRs.  If
	-mvsx-timode, restrict TImode to reg+reg addressing, and PTImode
	to reg+offset addressing.  Use PTImode when checking offset
	addresses for validity.
	(reg_offset_addressing_ok_p): Likewise.
	(rs6000_legitimate_offset_address_p): Likewise.
	(rs6000_legitimize_address): Likewise.
	(rs6000_legitimize_reload_address): Likewise.
	(rs6000_legitimate_address_p): Likewise.
	(rs6000_eliminate_indexed_memrefs): Likewise.
	(rs6000_emit_move): Likewise.
	(rs6000_secondary_reload): Likewise.
	(rs6000_secondary_reload_inner): Handle PTImode.  Allow 64-bit
	reloads to fpr registers to continue to use reg+offset addressing,
	but 64-bit reloads to altivec registers need reg+reg addressing.
	Drop test for PRE_MODIFY, since VSX loads/stores no longer support
	it.  Treat LO_SUM like a PLUS operation.
	(rs6000_secondary_reload_class): If type is 64-bit, prefer to use
	FLOAT_REGS instead of VSX_REGS to allow use of reg+offset
	addressing.
	(rs6000_cannot_change_mode_class): Do not allow TImode in VSX
	registers to share a register with a smaller sized type, since VSX
	puts scalars in the upper 64-bits.
	(print_operand): Add support for PTImode.
	(rs6000_register_move_cost): Use VECTOR_MEM_VSX_P instead of
	VECTOR_UNIT_VSX_P to catch types that can be loaded in VSX
	registers, but don't have arithmetic support.
	(rs6000_memory_move_cost): Add test for VSX.
	(rs6000_opt_masks): Add -mvsx-timode.

	* config/rs6000/vsx.md (VSm): Change to use 64-bit aligned moves
	for TImode.
	(VSs): Likewise.
	(VSr): Use wt constraint for TImode.
	(VSv): Drop TImode support.
	(vsx_movti): Delete, replace with versions for 32-bit and 64-bit.
	(vsx_movti_64bit): Likewise.
	(vsx_movti_32bit): Likewise.
	(vec_store_): Use VSX iterator instead of vector iterator.
	(vsx_and3): Delete use of '?' constraint on inputs, just put one
	'?' on the appropriate output constraint.  Do not allow TImode
	logical operations on 32-bit systems.
	(vsx_ior3): Likewise.
	(vsx_xor3): Likewise.
	(vsx_one_cmpl2): Likewise.
	(vsx_nor3): Likewise.
	(vsx_andc3): Likewise.
	(vsx_concat_): Likewise.
	(vsx_xxpermdi_): Fix thinko for non V2DF/V2DI modes.

	* config/rs6000/rs6000.h (MASK_VSX_TIMODE): Map from
	OPTION_MASK_VSX_TIMODE.
	(enum rs6000_reg_class_enum): Add RS6000_CONSTRAINT_wt.
	(STACK_SAVEAREA_MODE): Use PTImode instead of TImode.

	* config/rs6000/rs6000.md (INT mode attribute): Add PTImode.
	(TI2 iterator): New iterator for TImode, PTImode.
	(wd mode attribute): Add values for vector types.
	(movti_string): Replace TI move operations with operations for
	TImode and PTImode.  Add support for TImode being allowed in VSX
	registers.
	(mov_string, TImode/PTImode): Likewise.
	(movti_ppc64): Likewise.
	(mov_ppc64, TImode/PTImode): Likewise.
	(TI mode splitters): Likewise.

	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document wt
	constraint.

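For readers who want the register-sharing restriction spelled out, here is a minimal standalone sketch of the size checks this patch adds to rs6000_cannot_change_mode_class for register classes that overlap the FP/VSX registers. It is only a model, not the GCC code: the function name and its parameters are hypothetical, and the byte sizes and register counts are passed in directly where the real code derives them from the machine modes (for example via hard_regno_nregs).

```c
#include <stdbool.h>

/* Hypothetical model of the new size check.  Sizes are in bytes; the
   *_nregs arguments say how many FP/VSX registers the mode occupies.
   ieee_quad_tf stands in for "TARGET_IEEEQUAD and one mode is TFmode".
   Returns true when the mode change must be rejected.  */
bool
vsx_cannot_change_size (unsigned from_size, unsigned from_nregs,
                        unsigned to_size, unsigned to_nregs,
                        bool ieee_quad_tf)
{
  /* The check in the patch is only reached when the sizes differ.  */
  if (from_size == to_size)
    return false;

  /* 128-bit IEEE floating point can't overlap.  */
  if (ieee_quad_tf)
    return true;

  /* Neither can values smaller than 64 bits.  */
  if (from_size < 8 || to_size < 8)
    return true;

  /* A 64-bit value may only overlap a larger type that is built out of
     64-bit registers (e.g. two registers for 16 bytes).  A 128-bit type
     held in one VSX register keeps its scalar part in the upper 64 bits,
     so the overlap is rejected.  */
  if (from_size == 8 && (8 * to_nregs) != to_size)
    return true;

  if (to_size == 8 && (8 * from_nregs) != from_size)
    return true;

  return false;
}
```

Under this model an 8-byte value in one register cannot be viewed as a 16-byte value that occupies a single VSX register (the new TImode case), while a 16-byte type that takes two registers, such as TFmode/TDmode, can still overlap with it.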
Index: gcc/config/rs6000/vector.md =================================================================== --- gcc/config/rs6000/vector.md (revision 195592) +++ gcc/config/rs6000/vector.md (working copy) @@ -249,7 +249,7 @@ (define_expand "mul3" [(set (match_operand:VEC_F 0 "vfloat_operand" "") (mult:VEC_F (match_operand:VEC_F 1 "vfloat_operand" "") (match_operand:VEC_F 2 "vfloat_operand" "")))] - "VECTOR_UNIT_VSX_P (mode) || VECTOR_UNIT_ALTIVEC_P (mode)" + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)" { if (mode == V4SFmode && VECTOR_UNIT_ALTIVEC_P (mode)) { @@ -395,7 +395,7 @@ (define_expand "vcond" (match_operand:VEC_I 5 "vint_operand" "")]) (match_operand:VEC_I 1 "vint_operand" "") (match_operand:VEC_I 2 "vint_operand" "")))] - "VECTOR_UNIT_ALTIVEC_P (mode)" + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)" " { if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2], @@ -451,7 +451,7 @@ (define_expand "vcondu" (match_operand:VEC_I 5 "vint_operand" "")]) (match_operand:VEC_I 1 "vint_operand" "") (match_operand:VEC_I 2 "vint_operand" "")))] - "VECTOR_UNIT_ALTIVEC_P (mode)" + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)" " { if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2], @@ -505,14 +505,14 @@ (define_expand "vector_gtu" [(set (match_operand:VEC_I 0 "vint_operand" "") (gtu:VEC_I (match_operand:VEC_I 1 "vint_operand" "") (match_operand:VEC_I 2 "vint_operand" "")))] - "VECTOR_UNIT_ALTIVEC_P (mode)" + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)" "") (define_expand "vector_geu" [(set (match_operand:VEC_I 0 "vint_operand" "") (geu:VEC_I (match_operand:VEC_I 1 "vint_operand" "") (match_operand:VEC_I 2 "vint_operand" "")))] - "VECTOR_UNIT_ALTIVEC_P (mode)" + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)" "") (define_insn_and_split "*vector_uneq" @@ -709,45 +709,55 @@ (define_expand "cr6_test_for_lt_reverse" ;; Vector logical instructions +;; Do not support TImode logical instructions on 32-bit at present, because the +;; compiler will see that we have a TImode and when it 
wanted DImode, and +;; convert the DImode to TImode, store it on the stack, and load it in a VSX +;; register. (define_expand "xor3" [(set (match_operand:VEC_L 0 "vlogical_operand" "") (xor:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "") (match_operand:VEC_L 2 "vlogical_operand" "")))] - "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)" + "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "") (define_expand "ior3" [(set (match_operand:VEC_L 0 "vlogical_operand" "") (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "") (match_operand:VEC_L 2 "vlogical_operand" "")))] - "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)" + "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "") (define_expand "and3" [(set (match_operand:VEC_L 0 "vlogical_operand" "") (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "") (match_operand:VEC_L 2 "vlogical_operand" "")))] - "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)" + "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "") (define_expand "one_cmpl2" [(set (match_operand:VEC_L 0 "vlogical_operand" "") (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")))] - "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)" + "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "") (define_expand "nor3" [(set (match_operand:VEC_L 0 "vlogical_operand" "") (not:VEC_L (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "") (match_operand:VEC_L 2 "vlogical_operand" ""))))] - "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)" + "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "") (define_expand "andc3" [(set (match_operand:VEC_L 0 "vlogical_operand" "") (and:VEC_L (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" "")) (match_operand:VEC_L 1 "vlogical_operand" "")))] - "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)" + "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "") ;; Same size conversions Index: gcc/config/rs6000/constraints.md 
=================================================================== --- gcc/config/rs6000/constraints.md (revision 195592) +++ gcc/config/rs6000/constraints.md (working copy) @@ -64,6 +64,10 @@ (define_register_constraint "wf" "rs6000 (define_register_constraint "ws" "rs6000_constraints[RS6000_CONSTRAINT_ws]" "@internal") +;; TImode in VSX registers +(define_register_constraint "wt" "rs6000_constraints[RS6000_CONSTRAINT_wt]" + "@internal") + ;; any VSX register (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]" "@internal") Index: gcc/config/rs6000/predicates.md =================================================================== --- gcc/config/rs6000/predicates.md (revision 195592) +++ gcc/config/rs6000/predicates.md (working copy) @@ -329,6 +329,11 @@ (define_predicate "easy_fp_constant" && mode != DImode) return 1; + /* The constant 0.0 is easy under VSX. */ + if ((mode == SFmode || mode == DFmode || mode == SDmode || mode == DDmode) + && VECTOR_UNIT_VSX_P (DFmode) && op == CONST0_RTX (mode)) + return 1; + if (DECIMAL_FLOAT_MODE_P (mode)) return 0; Index: gcc/config/rs6000/rs6000-modes.def =================================================================== --- gcc/config/rs6000/rs6000-modes.def (revision 195592) +++ gcc/config/rs6000/rs6000-modes.def (working copy) @@ -41,3 +41,6 @@ VECTOR_MODE (INT, DI, 1); VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */ VECTOR_MODES (FLOAT, 32); /* V16HF V8SF V4DF */ + +/* Replacement for TImode that only is allowed in GPRs. 
*/ +PARTIAL_INT_MODE (TI); Index: gcc/config/rs6000/rs6000-cpus.def =================================================================== --- gcc/config/rs6000/rs6000-cpus.def (revision 195592) +++ gcc/config/rs6000/rs6000-cpus.def (working copy) @@ -42,7 +42,8 @@ #define ISA_2_6_MASKS_SERVER (ISA_2_5_MASKS_SERVER \ | OPTION_MASK_POPCNTD \ | OPTION_MASK_ALTIVEC \ - | OPTION_MASK_VSX) + | OPTION_MASK_VSX \ + | OPTION_MASK_VSX_TIMODE) #define POWERPC_7400_MASK (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC) @@ -76,7 +77,8 @@ | OPTION_MASK_RECIP_PRECISION \ | OPTION_MASK_SOFT_FLOAT \ | OPTION_MASK_STRICT_ALIGN_OPTIONAL \ - | OPTION_MASK_VSX) + | OPTION_MASK_VSX \ + | OPTION_MASK_VSX_TIMODE) #endif @@ -165,11 +167,11 @@ RS6000_CPU ("power6x", PROCESSOR_POWER6, RS6000_CPU ("power7", PROCESSOR_POWER7, /* Don't add MASK_ISEL by default */ POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD - | MASK_VSX | MASK_RECIP_PRECISION) + | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE) RS6000_CPU ("power8", PROCESSOR_POWER7, /* Don't add MASK_ISEL by default */ POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD - | MASK_VSX | MASK_RECIP_PRECISION) + | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE) RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0) RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, MASK_PPC_GFXOPT | MASK_POWERPC64) RS6000_CPU ("rs64", PROCESSOR_RS64A, MASK_PPC_GFXOPT | MASK_POWERPC64) Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 195592) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -514,3 +514,7 @@ Use/do not use r11 to hold the static li msave-toc-indirect Target Report Var(TARGET_SAVE_TOC_INDIRECT) Save Control whether we save the TOC in the prologue for indirect calls or generate the save inline + 
+mvsx-timode +Target Undocumented Mask(VSX_TIMODE) Var(rs6000_isa_flags) +; Allow/disallow TImode in VSX registers Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 195625) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1516,8 +1516,9 @@ rs6000_hard_regno_nregs_internal (int re { unsigned HOST_WIDE_INT reg_size; + /* TF/TD modes are special in that they always take 2 registers. */ if (FP_REGNO_P (regno)) - reg_size = (VECTOR_MEM_VSX_P (mode) + reg_size = ((VECTOR_MEM_VSX_P (mode) && mode != TDmode && mode != TFmode) ? UNITS_PER_VSX_WORD : UNITS_PER_FP_WORD); @@ -1561,14 +1562,18 @@ rs6000_hard_regno_mode_ok (int regno, en return ALTIVEC_REGNO_P (last_regno); } + /* Allow TImode in all VSX registers if the user asked for it. Note, PTImode + can only go in GPRs. */ + if (mode == TImode && TARGET_VSX_TIMODE && VSX_REGNO_P (regno)) + return 1; + /* The GPRs can hold any mode, but values bigger than one register cannot go past R31. */ if (INT_REGNO_P (regno)) return INT_REGNO_P (last_regno); /* The float registers (except for VSX vector modes) can only hold floating - modes and DImode. This excludes the 32-bit decimal float mode for - now. */ + modes and DImode. */ if (FP_REGNO_P (regno)) { if (SCALAR_FLOAT_MODE_P (mode) @@ -1602,9 +1607,8 @@ rs6000_hard_regno_mode_ok (int regno, en if (SPE_SIMD_REGNO_P (regno) && TARGET_SPE && SPE_VECTOR_MODE (mode)) return 1; - /* We cannot put TImode anywhere except general register and it must be able - to fit within the register set. In the future, allow TImode in the - Altivec or VSX registers. */ + /* We cannot put non-VSX TImode or PTImode anywhere except general register + and it must be able to fit within the register set. 
*/ return GET_MODE_SIZE (mode) <= UNITS_PER_WORD; } @@ -1721,6 +1725,7 @@ rs6000_debug_reg_global (void) SImode, DImode, TImode, + PTImode, SFmode, DFmode, TFmode, @@ -1801,6 +1806,7 @@ rs6000_debug_reg_global (void) "wg reg_class = %s\n" "wl reg_class = %s\n" "ws reg_class = %s\n" + "wt reg_class = %s\n" "wx reg_class = %s\n" "wz reg_class = %s\n" "\n", @@ -1813,6 +1819,7 @@ rs6000_debug_reg_global (void) reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wg]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wl]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_ws]], + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wt]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wz]]); @@ -2043,6 +2050,9 @@ rs6000_debug_reg_global (void) if (TARGET_LINK_STACK) fprintf (stderr, DEBUG_FMT_S, "link_stack", "true"); + if (targetm.lra_p ()) + fprintf (stderr, DEBUG_FMT_S, "lra", "true"); + fprintf (stderr, DEBUG_FMT_S, "plt-format", TARGET_SECURE_PLT ? "secure" : "bss"); fprintf (stderr, DEBUG_FMT_S, "struct-return", @@ -2188,6 +2198,13 @@ rs6000_init_hard_regno_mode_ok (bool glo rs6000_vector_align[DFmode] = align64; } + /* Allow TImode in VSX register and set the VSX memory macros. */ + if (TARGET_VSX && TARGET_VSX_TIMODE) + { + rs6000_vector_mem[TImode] = VECTOR_VSX; + rs6000_vector_align[TImode] = align64; + } + /* TODO add SPE and paired floating point vector support. */ /* Register class constraints for the constraints that depend on compile @@ -2211,6 +2228,8 @@ rs6000_init_hard_regno_mode_ok (bool glo rs6000_constraints[RS6000_CONSTRAINT_ws] = (TARGET_VSX_SCALAR_MEMORY ? 
VSX_REGS : FLOAT_REGS); + if (TARGET_VSX_TIMODE) + rs6000_constraints[RS6000_CONSTRAINT_wt] = VSX_REGS; } /* Add conditional constraints based on various options, to allow us to @@ -2254,6 +2273,11 @@ rs6000_init_hard_regno_mode_ok (bool glo rs6000_vector_reload[DDmode][0] = CODE_FOR_reload_dd_di_store; rs6000_vector_reload[DDmode][1] = CODE_FOR_reload_dd_di_load; } + if (TARGET_VSX_TIMODE) + { + rs6000_vector_reload[TImode][0] = CODE_FOR_reload_ti_di_store; + rs6000_vector_reload[TImode][1] = CODE_FOR_reload_ti_di_load; + } } else { @@ -2276,6 +2300,11 @@ rs6000_init_hard_regno_mode_ok (bool glo rs6000_vector_reload[DDmode][0] = CODE_FOR_reload_dd_si_store; rs6000_vector_reload[DDmode][1] = CODE_FOR_reload_dd_si_load; } + if (TARGET_VSX_TIMODE) + { + rs6000_vector_reload[TImode][0] = CODE_FOR_reload_ti_si_store; + rs6000_vector_reload[TImode][1] = CODE_FOR_reload_ti_si_load; + } } } @@ -2784,6 +2813,13 @@ rs6000_option_override_internal (bool gl else if (TARGET_ALTIVEC) rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~rs6000_isa_flags_explicit); + if (TARGET_VSX_TIMODE && !TARGET_VSX) + { + if (rs6000_isa_flags_explicit & OPTION_MASK_VSX_TIMODE) + error ("-mvsx-timode requires -mvsx"); + rs6000_isa_flags &= ~OPTION_MASK_VSX_TIMODE; + } + if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET) rs6000_print_isa_options (stderr, 0, "after defaults", rs6000_isa_flags); @@ -5062,7 +5098,7 @@ invalid_e500_subreg (rtx op, enum machin purpose. 
*/ if (GET_CODE (op) == SUBREG && (mode == SImode || mode == DImode || mode == TImode - || mode == DDmode || mode == TDmode) + || mode == DDmode || mode == TDmode || mode == PTImode) && REG_P (SUBREG_REG (op)) && (GET_MODE (SUBREG_REG (op)) == DFmode || GET_MODE (SUBREG_REG (op)) == TFmode)) @@ -5075,6 +5111,7 @@ invalid_e500_subreg (rtx op, enum machin && REG_P (SUBREG_REG (op)) && (GET_MODE (SUBREG_REG (op)) == DImode || GET_MODE (SUBREG_REG (op)) == TImode + || GET_MODE (SUBREG_REG (op)) == PTImode || GET_MODE (SUBREG_REG (op)) == DDmode || GET_MODE (SUBREG_REG (op)) == TDmode)) return true; @@ -5297,7 +5334,11 @@ reg_offset_addressing_ok_p (enum machine case V4SImode: case V2DFmode: case V2DImode: - /* AltiVec/VSX vector modes. Only reg+reg addressing is valid. */ + case TImode: + /* AltiVec/VSX vector modes. Only reg+reg addressing is valid. While + TImode is not a vector mode, if we want to use the VSX registers to + move it around, we need to restrict ourselves to reg+reg + addressing. */ if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)) return false; break; @@ -5550,7 +5591,7 @@ rs6000_legitimate_offset_address_p (enum /* If we are using VSX scalar loads, restrict ourselves to reg+reg addressing. */ - if (mode == DFmode && VECTOR_MEM_VSX_P (DFmode)) + if (VECTOR_MEM_VSX_P (mode)) return false; if (!worst_case) @@ -5564,6 +5605,7 @@ rs6000_legitimate_offset_address_p (enum case TFmode: case TDmode: case TImode: + case PTImode: if (TARGET_E500_DOUBLE) return (SPE_CONST_OFFSET_OK (offset) && SPE_CONST_OFFSET_OK (offset + 8)); @@ -5737,11 +5779,12 @@ rs6000_legitimize_address (rtx x, rtx ol case TFmode: case TDmode: case TImode: + case PTImode: /* As in legitimate_offset_address_p we do not assume worst-case. The mode here is just a hint as to the registers used. A TImode is usually in gprs, but may actually be in fprs. Leave worst-case scenario for reload to handle via - insn constraints. */ + insn constraints. PTImode is only GPRs. 
*/ extra = 8; break; default: @@ -6472,7 +6515,7 @@ rs6000_legitimize_reload_address (rtx x, && !(TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode || mode == DDmode || mode == TDmode || mode == DImode)) - && VECTOR_MEM_NONE_P (mode)) + && (!VECTOR_MODE_P (mode) || VECTOR_MEM_NONE_P (mode))) { HOST_WIDE_INT val = INTVAL (XEXP (x, 1)); HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000; @@ -6503,7 +6546,7 @@ rs6000_legitimize_reload_address (rtx x, if (GET_CODE (x) == SYMBOL_REF && reg_offset_p - && VECTOR_MEM_NONE_P (mode) + && (!VECTOR_MODE_P (mode) || VECTOR_MEM_NONE_P (mode)) && !SPE_VECTOR_MODE (mode) #if TARGET_MACHO && DEFAULT_ABI == ABI_DARWIN @@ -6529,6 +6572,8 @@ rs6000_legitimize_reload_address (rtx x, mem is sufficiently aligned. */ && mode != TFmode && mode != TDmode + && (mode != TImode || !TARGET_VSX_TIMODE) + && mode != PTImode && (mode != DImode || TARGET_POWERPC64) && ((mode != DFmode && mode != DDmode) || TARGET_POWERPC64 || (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT))) @@ -6650,10 +6695,12 @@ rs6000_legitimate_address_p (enum machin if (legitimate_indirect_address_p (x, reg_ok_strict)) return 1; if ((GET_CODE (x) == PRE_INC || GET_CODE (x) == PRE_DEC) - && !VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) + && !ALTIVEC_OR_VSX_VECTOR_MODE (mode) && !SPE_VECTOR_MODE (mode) && mode != TFmode && mode != TDmode + && mode != TImode + && mode != PTImode /* Restrict addressing for DI because of our SUBREG hackery. 
*/ && !(TARGET_E500_DOUBLE && (mode == DFmode || mode == DDmode || mode == DImode)) @@ -6678,26 +6725,28 @@ rs6000_legitimate_address_p (enum machin return 1; if (rs6000_legitimate_offset_address_p (mode, x, reg_ok_strict, false)) return 1; - if (mode != TImode - && mode != TFmode + if (mode != TFmode && mode != TDmode && ((TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT) || TARGET_POWERPC64 || (mode != DFmode && mode != DDmode) || (TARGET_E500_DOUBLE && mode != DDmode)) && (TARGET_POWERPC64 || mode != DImode) + && (mode != TImode || VECTOR_MEM_VSX_P (TImode)) + && mode != PTImode && !avoiding_indexed_address_p (mode) && legitimate_indexed_address_p (x, reg_ok_strict)) return 1; if (GET_CODE (x) == PRE_MODIFY && mode != TImode + && mode != PTImode && mode != TFmode && mode != TDmode && ((TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT) || TARGET_POWERPC64 || ((mode != DFmode && mode != DDmode) || TARGET_E500_DOUBLE)) && (TARGET_POWERPC64 || mode != DImode) - && !VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) + && !ALTIVEC_OR_VSX_VECTOR_MODE (mode) && !SPE_VECTOR_MODE (mode) /* Restrict addressing for DI because of our SUBREG hackery. */ && !(TARGET_E500_DOUBLE @@ -7140,7 +7189,7 @@ rs6000_emit_set_long_const (rtx dest, HO } /* Helper for the following. Get rid of [r+r] memory refs - in cases where it won't work (TImode, TFmode, TDmode). */ + in cases where it won't work (TImode, TFmode, TDmode, PTImode). 
*/ static void rs6000_eliminate_indexed_memrefs (rtx operands[2]) @@ -7524,6 +7573,11 @@ rs6000_emit_move (rtx dest, rtx source, break; case TImode: + if (!VECTOR_MEM_VSX_P (TImode)) + rs6000_eliminate_indexed_memrefs (operands); + break; + + case PTImode: rs6000_eliminate_indexed_memrefs (operands); break; @@ -13893,7 +13947,7 @@ rs6000_secondary_reload (bool in_p, if (rclass == GENERAL_REGS || rclass == BASE_REGS) { if (!legitimate_indirect_address_p (addr, false) - && !rs6000_legitimate_offset_address_p (TImode, addr, + && !rs6000_legitimate_offset_address_p (PTImode, addr, false, true)) { sri->icode = icode; @@ -13903,8 +13957,20 @@ rs6000_secondary_reload (bool in_p, + ((GET_CODE (addr) == AND) ? 1 : 0)); } } - /* Loads to and stores from vector registers can only do reg+reg - addressing. Altivec registers can also do (reg+reg)&(-16). */ + /* Allow scalar loads to/from the traditional floating point + registers, even if VSX memory is set. */ + else if ((rclass == FLOAT_REGS || rclass == NO_REGS) + && (GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8) + && (legitimate_indirect_address_p (addr, false) + || legitimate_indirect_address_p (XEXP (addr, 0), false) + || rs6000_legitimate_offset_address_p (mode, addr, + false, true))) + + ; + /* Loads to and stores from vector registers can only do reg+reg + addressing. Altivec registers can also do (reg+reg)&(-16). Allow + scalar modes loading up the traditional floating point registers + to use offset addresses. 
*/ else if (rclass == VSX_REGS || rclass == ALTIVEC_REGS || rclass == FLOAT_REGS || rclass == NO_REGS) { @@ -14130,7 +14196,7 @@ rs6000_secondary_reload_inner (rtx reg, if (GET_CODE (addr) == PLUS && (and_op2 != NULL_RTX - || !rs6000_legitimate_offset_address_p (TImode, addr, + || !rs6000_legitimate_offset_address_p (PTImode, addr, false, true))) { addr_op1 = XEXP (addr, 0); @@ -14164,7 +14230,7 @@ rs6000_secondary_reload_inner (rtx reg, scratch_or_premodify = scratch; } else if (!legitimate_indirect_address_p (addr, false) - && !rs6000_legitimate_offset_address_p (TImode, addr, + && !rs6000_legitimate_offset_address_p (PTImode, addr, false, true)) { if (TARGET_DEBUG_ADDR) @@ -14180,9 +14246,21 @@ rs6000_secondary_reload_inner (rtx reg, } break; - /* Float/Altivec registers can only handle reg+reg addressing. Move - other addresses into a scratch register. */ + /* Float registers can do offset+reg addressing for scalar types. */ case FLOAT_REGS: + if (legitimate_indirect_address_p (addr, false) /* reg */ + || legitimate_indexed_address_p (addr, false) /* reg+reg */ + || ((GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8) + && and_op2 == NULL_RTX + && scratch_or_premodify == scratch + && rs6000_legitimate_offset_address_p (mode, addr, false, false))) + break; + + /* If this isn't a legacy floating point load/store, fall through to the + VSX defaults. */ + + /* VSX/Altivec registers can only handle reg+reg addressing. Move other + addresses into a scratch register. */ case VSX_REGS: case ALTIVEC_REGS: @@ -14202,7 +14280,9 @@ rs6000_secondary_reload_inner (rtx reg, /* If we aren't using a VSX load, save the PRE_MODIFY register and use it as the address later. 
*/ if (GET_CODE (addr) == PRE_MODIFY - && (!VECTOR_MEM_VSX_P (mode) + && ((ALTIVEC_OR_VSX_VECTOR_MODE (mode) + && (rclass != FLOAT_REGS + || (GET_MODE_SIZE (mode) != 4 && GET_MODE_SIZE (mode) != 8))) || and_op2 != NULL_RTX || !legitimate_indexed_address_p (XEXP (addr, 1), false))) { @@ -14218,16 +14298,12 @@ rs6000_secondary_reload_inner (rtx reg, if (legitimate_indirect_address_p (addr, false) /* reg */ || legitimate_indexed_address_p (addr, false) /* reg+reg */ - || GET_CODE (addr) == PRE_MODIFY /* VSX pre-modify */ || (GET_CODE (addr) == AND /* Altivec memory */ + && rclass == ALTIVEC_REGS && GET_CODE (XEXP (addr, 1)) == CONST_INT && INTVAL (XEXP (addr, 1)) == -16 - && VECTOR_MEM_ALTIVEC_P (mode)) - || (rclass == FLOAT_REGS /* legacy float mem */ - && GET_MODE_SIZE (mode) == 8 - && and_op2 == NULL_RTX - && scratch_or_premodify == scratch - && rs6000_legitimate_offset_address_p (mode, addr, false, false))) + && (legitimate_indirect_address_p (XEXP (addr, 0), false) + || legitimate_indexed_address_p (XEXP (addr, 0), false)))) ; else if (GET_CODE (addr) == PLUS) @@ -14254,7 +14330,8 @@ rs6000_secondary_reload_inner (rtx reg, } else if (GET_CODE (addr) == SYMBOL_REF || GET_CODE (addr) == CONST - || GET_CODE (addr) == CONST_INT || REG_P (addr)) + || GET_CODE (addr) == CONST_INT || GET_CODE (addr) == LO_SUM + || REG_P (addr)) { if (TARGET_DEBUG_ADDR) { @@ -14639,11 +14716,17 @@ rs6000_secondary_reload_class (enum reg_ return (mode != SDmode) ? NO_REGS : GENERAL_REGS; /* Memory, and FP/altivec registers can go into fp/altivec registers under - VSX. */ + VSX. However, for scalar variables, use the traditional floating point + registers so that we can use offset+register addressing. */ if (TARGET_VSX && (regno == -1 || VSX_REGNO_P (regno)) && VSX_REG_CLASS_P (rclass)) - return NO_REGS; + { + if (GET_MODE_SIZE (mode) < 16) + return FLOAT_REGS; + + return NO_REGS; + } /* Memory, and AltiVec registers can go into AltiVec registers. 
*/ if ((regno == -1 || ALTIVEC_REGNO_P (regno)) @@ -14688,8 +14771,35 @@ rs6000_cannot_change_mode_class (enum ma if (from_size != to_size) { enum reg_class xclass = (TARGET_VSX) ? VSX_REGS : FLOAT_REGS; - return ((from_size < 8 || to_size < 8 || TARGET_IEEEQUAD) - && reg_classes_intersect_p (xclass, rclass)); + + if (reg_classes_intersect_p (xclass, rclass)) + { + unsigned to_nregs = hard_regno_nregs[FIRST_FPR_REGNO][to]; + unsigned from_nregs = hard_regno_nregs[FIRST_FPR_REGNO][from]; + + /* Don't allow 64-bit types to overlap with 128-bit types that take a + single register under VSX because the scalar part of the register + is in the upper 64-bits, and not the lower 64-bits. Types like + TFmode/TDmode that take 2 scalar registers can overlap. 128-bit + IEEE floating point can't overlap, and neither can small + values. */ + + if (TARGET_IEEEQUAD && (to == TFmode || from == TFmode)) + return true; + + if (from_size < 8 || to_size < 8) + return true; + + if (from_size == 8 && (8 * to_nregs) != to_size) + return true; + + if (to_size == 8 && (8 * from_nregs) != from_size) + return true; + + return false; + } + else + return false; } if (TARGET_E500_DOUBLE @@ -14703,9 +14813,18 @@ rs6000_cannot_change_mode_class (enum ma /* Since the VSX register set includes traditional floating point registers and altivec registers, just check for the size being different instead of trying to check whether the modes are vector modes. Otherwise it won't - allow say DF and DI to change classes. */ + allow say DF and DI to change classes. For types like TFmode and TDmode + that take 2 64-bit registers, rather than a single 128-bit register, don't + allow subregs of those types to other 128 bit types.
*/ if (TARGET_VSX && VSX_REG_CLASS_P (rclass)) - return (from_size != 8 && from_size != 16); + { + unsigned num_regs = (from_size + 15) / 16; + if (hard_regno_nregs[FIRST_FPR_REGNO][to] > num_regs + || hard_regno_nregs[FIRST_FPR_REGNO][from] > num_regs) + return true; + + return (from_size != 8 && from_size != 16); + } if (TARGET_ALTIVEC && rclass == ALTIVEC_REGS && (ALTIVEC_VECTOR_MODE (from) + ALTIVEC_VECTOR_MODE (to)) == 1) @@ -15440,7 +15559,7 @@ print_operand (FILE *file, rtx x, int co return; case 'Y': - /* Like 'L', for third word of TImode */ + /* Like 'L', for third word of TImode/PTImode */ if (REG_P (x)) fputs (reg_names[REGNO (x) + 2], file); else if (MEM_P (x)) @@ -15490,7 +15609,7 @@ print_operand (FILE *file, rtx x, int co return; case 'Z': - /* Like 'L', for last word of TImode. */ + /* Like 'L', for last word of TImode/PTImode. */ if (REG_P (x)) fputs (reg_names[REGNO (x) + 3], file); else if (MEM_P (x)) @@ -15521,7 +15640,8 @@ print_operand (FILE *file, rtx x, int co if ((TARGET_SPE || TARGET_E500_DOUBLE) && (GET_MODE_SIZE (GET_MODE (x)) == 8 || GET_MODE (x) == TFmode - || GET_MODE (x) == TImode)) + || GET_MODE (x) == TImode + || GET_MODE (x) == PTImode)) { /* Handle [reg]. */ if (REG_P (tmp)) @@ -26574,7 +26694,7 @@ rs6000_register_move_cost (enum machine_ } /* If we have VSX, we can easily move between FPR or Altivec registers. 
*/ - else if (VECTOR_UNIT_VSX_P (mode) + else if (VECTOR_MEM_VSX_P (mode) && reg_classes_intersect_p (to, VSX_REGS) && reg_classes_intersect_p (from, VSX_REGS)) ret = 2 * hard_regno_nregs[32][mode]; @@ -26615,7 +26735,8 @@ rs6000_memory_move_cost (enum machine_mo if (reg_classes_intersect_p (rclass, GENERAL_REGS)) ret = 4 * hard_regno_nregs[0][mode]; - else if (reg_classes_intersect_p (rclass, FLOAT_REGS)) + else if ((reg_classes_intersect_p (rclass, FLOAT_REGS) + || reg_classes_intersect_p (rclass, VSX_REGS))) ret = 4 * hard_regno_nregs[32][mode]; else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS)) ret = 4 * hard_regno_nregs[FIRST_ALTIVEC_REGNO][mode]; @@ -27829,6 +27950,7 @@ static struct rs6000_opt_mask const rs60 { "recip-precision", OPTION_MASK_RECIP_PRECISION, false, true }, { "string", OPTION_MASK_STRING, false, true }, { "vsx", OPTION_MASK_VSX, false, true }, + { "vsx-timode", OPTION_MASK_VSX_TIMODE, false, true }, #ifdef OPTION_MASK_64BIT #if TARGET_AIX_OS { "aix64", OPTION_MASK_64BIT, false, false }, Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (revision 195592) +++ gcc/config/rs6000/vsx.md (working copy) @@ -48,7 +48,7 @@ (define_mode_attr VSm [(V16QI "vw4") (V2DF "vd2") (V2DI "vd2") (DF "d") - (TI "vw4")]) + (TI "vd2")]) ;; Map into the appropriate suffix based on the type (define_mode_attr VSs [(V16QI "sp") @@ -59,7 +59,7 @@ (define_mode_attr VSs [(V16QI "sp") (V2DI "dp") (DF "dp") (SF "sp") - (TI "sp")]) + (TI "dp")]) ;; Map the register class used (define_mode_attr VSr [(V16QI "v") @@ -70,7 +70,7 @@ (define_mode_attr VSr [(V16QI "v") (V2DF "wd") (DF "ws") (SF "d") - (TI "wd")]) + (TI "wt")]) ;; Map the register class used for float<->int conversions (define_mode_attr VSr2 [(V2DF "wd") @@ -115,7 +115,6 @@ (define_mode_attr VSv [(V16QI "v") (V4SF "v") (V2DI "v") (V2DF "v") - (TI "v") (DF "s")]) ;; Appropriate type for add ops (and other simple FP ops) @@ -268,12 
+267,13 @@ (define_insn "*vsx_mov" } [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,*,*,*,vecsimple,vecsimple,*,vecstore,vecload")]) -;; Unlike other VSX moves, allow the GPRs, since a normal use of TImode is for -;; unions. However for plain data movement, slightly favor the vector loads -(define_insn "*vsx_movti" - [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,?Y,?r,?r,wa,v,v,wZ") - (match_operand:TI 1 "input_operand" "wa,Z,wa,r,Y,r,j,W,wZ,v"))] - "VECTOR_MEM_VSX_P (TImode) +;; Unlike other VSX moves, allow the GPRs even for reloading, since a normal +;; use of TImode is for unions. However for plain data movement, slightly +;; favor the vector loads +(define_insn "*vsx_movti_64bit" + [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,?Y,?r,?r") + (match_operand:TI 1 "input_operand" "wa, Z,wa, j,W,wZ, v, r, Y, r"))] + "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode) && (register_operand (operands[0], TImode) || register_operand (operands[1], TImode))" { @@ -289,27 +289,87 @@ (define_insn "*vsx_movti" return "xxlor %x0,%x1,%x1"; case 3: + return "xxlxor %x0,%x0,%x0"; + case 4: + return output_vec_const_move (operands); + case 5: - return "#"; + return "stvx %1,%y0"; case 6: - return "xxlxor %x0,%x0,%x0"; + return "lvx %0,%y1"; case 7: + case 8: + case 9: + return "#"; + + default: + gcc_unreachable (); + } +} + [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,*,*,*") + (set_attr "length" " 4, 4, 4, 4, 8, 4, 4,8,8,8")]) + +(define_insn "*vsx_movti_32bit" + [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,Q,Y,????r,????r,????r,r") + (match_operand:TI 1 "input_operand" "wa, Z,wa, j,W,wZ, v,r,r, Q, Y, r,n"))] + "! 
TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode) + && (register_operand (operands[0], TImode) + || register_operand (operands[1], TImode))" +{ + switch (which_alternative) + { + case 0: + return "stxvd2x %x1,%y0"; + + case 1: + return "lxvd2x %x0,%y1"; + + case 2: + return "xxlor %x0,%x1,%x1"; + + case 3: + return "xxlxor %x0,%x0,%x0"; + + case 4: return output_vec_const_move (operands); - case 8: + case 5: return "stvx %1,%y0"; - case 9: + case 6: return "lvx %0,%y1"; + case 7: + if (TARGET_STRING) + return \"stswi %1,%P0,16\"; + + case 8: + return \"#\"; + + case 9: + /* If the address is not used in the output, we can use lsi. Otherwise, + fall through to generating four loads. */ + if (TARGET_STRING + && ! reg_overlap_mentioned_p (operands[0], operands[1])) + return \"lswi %0,%P1,16\"; + /* ... fall through ... */ + + case 10: + case 11: + case 12: + return \"#\"; default: gcc_unreachable (); } } - [(set_attr "type" "vecstore,vecload,vecsimple,*,*,*,vecsimple,*,vecstore,vecload")]) + [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,store_ux,store_ux,load_ux,load_ux, *, *") + (set_attr "length" " 4, 4, 4, 4, 8, 4, 4, 16, 16, 16, 16,16,16") + (set (attr "cell_micro") (if_then_else (match_test "TARGET_STRING") + (const_string "always") + (const_string "conditional")))]) ;; Explicit load/store expanders for the builtin functions (define_expand "vsx_load_" @@ -319,8 +379,8 @@ (define_expand "vsx_load_" "") (define_expand "vsx_store_" - [(set (match_operand:VEC_M 0 "memory_operand" "") - (match_operand:VEC_M 1 "vsx_register_operand" ""))] + [(set (match_operand:VSX_M 0 "memory_operand" "") + (match_operand:VSX_M 1 "vsx_register_operand" ""))] "VECTOR_MEM_VSX_P (mode)" "") @@ -1026,38 +1086,46 @@ (define_insn "*vsx_float_fix_2" (set_attr "fp_type" "")]) -;; Logical and permute operations +;; Logical operations +;; Do not support TImode logical instructions on 32-bit at present, because the +;; compiler will see that we have a TImode and 
when it wanted DImode, and +;; convert the DImode to TImode, store it on the stack, and load it in a VSX +;; register. (define_insn "*vsx_and3" [(set (match_operand:VSX_L 0 "vsx_register_operand" "=,?wa") (and:VSX_L - (match_operand:VSX_L 1 "vsx_register_operand" ",?wa") - (match_operand:VSX_L 2 "vsx_register_operand" ",?wa")))] - "VECTOR_MEM_VSX_P (mode)" + (match_operand:VSX_L 1 "vsx_register_operand" ",wa") + (match_operand:VSX_L 2 "vsx_register_operand" ",wa")))] + "VECTOR_MEM_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "xxland %x0,%x1,%x2" [(set_attr "type" "vecsimple")]) (define_insn "*vsx_ior3" [(set (match_operand:VSX_L 0 "vsx_register_operand" "=,?wa") - (ior:VSX_L (match_operand:VSX_L 1 "vsx_register_operand" ",?wa") - (match_operand:VSX_L 2 "vsx_register_operand" ",?wa")))] - "VECTOR_MEM_VSX_P (mode)" + (ior:VSX_L (match_operand:VSX_L 1 "vsx_register_operand" ",wa") + (match_operand:VSX_L 2 "vsx_register_operand" ",wa")))] + "VECTOR_MEM_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "xxlor %x0,%x1,%x2" [(set_attr "type" "vecsimple")]) (define_insn "*vsx_xor3" [(set (match_operand:VSX_L 0 "vsx_register_operand" "=,?wa") (xor:VSX_L - (match_operand:VSX_L 1 "vsx_register_operand" ",?wa") - (match_operand:VSX_L 2 "vsx_register_operand" ",?wa")))] - "VECTOR_MEM_VSX_P (mode)" + (match_operand:VSX_L 1 "vsx_register_operand" ",wa") + (match_operand:VSX_L 2 "vsx_register_operand" ",wa")))] + "VECTOR_MEM_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "xxlxor %x0,%x1,%x2" [(set_attr "type" "vecsimple")]) (define_insn "*vsx_one_cmpl2" [(set (match_operand:VSX_L 0 "vsx_register_operand" "=,?wa") (not:VSX_L - (match_operand:VSX_L 1 "vsx_register_operand" ",?wa")))] - "VECTOR_MEM_VSX_P (mode)" + (match_operand:VSX_L 1 "vsx_register_operand" ",wa")))] + "VECTOR_MEM_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "xxlnor %x0,%x1,%x1" [(set_attr "type" "vecsimple")]) @@ -1067,7 +1135,8 @@ (define_insn "*vsx_nor3" (ior:VSX_L 
(match_operand:VSX_L 1 "vsx_register_operand" ",?wa") (match_operand:VSX_L 2 "vsx_register_operand" ",?wa"))))] - "VECTOR_MEM_VSX_P (mode)" + "VECTOR_MEM_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "xxlnor %x0,%x1,%x2" [(set_attr "type" "vecsimple")]) @@ -1077,7 +1146,8 @@ (define_insn "*vsx_andc3" (not:VSX_L (match_operand:VSX_L 2 "vsx_register_operand" ",?wa")) (match_operand:VSX_L 1 "vsx_register_operand" ",?wa")))] - "VECTOR_MEM_VSX_P (mode)" + "VECTOR_MEM_VSX_P (mode) + && (mode != TImode || TARGET_POWERPC64)" "xxlandc %x0,%x1,%x2" [(set_attr "type" "vecsimple")]) @@ -1086,11 +1156,10 @@ (define_insn "*vsx_andc3" ;; Build a V2DF/V2DI vector from two scalars (define_insn "vsx_concat_" - [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa") - (unspec:VSX_D - [(match_operand: 1 "vsx_register_operand" "ws,wa") - (match_operand: 2 "vsx_register_operand" "ws,wa")] - UNSPEC_VSX_CONCAT))] + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=,?wa") + (vec_concat:VSX_D + (match_operand: 1 "vsx_register_operand" "ws,wa") + (match_operand: 2 "vsx_register_operand" "ws,wa")))] "VECTOR_MEM_VSX_P (mode)" "xxpermdi %x0,%x1,%x2,0" [(set_attr "type" "vecperm")]) @@ -1212,8 +1281,8 @@ (define_expand "vsx_xxpermdi_" if (mode != V2DImode) { target = gen_lowpart (V2DImode, target); - op0 = gen_lowpart (V2DImode, target); - op1 = gen_lowpart (V2DImode, target); + op0 = gen_lowpart (V2DImode, op0); + op1 = gen_lowpart (V2DImode, op1); } } emit_insn (gen (target, op0, op1, perm0, perm1)); Index: gcc/config/rs6000/rs6000.h =================================================================== --- gcc/config/rs6000/rs6000.h (revision 195592) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -504,6 +504,7 @@ extern int rs6000_vector_align[]; #define MASK_STRING OPTION_MASK_STRING #define MASK_UPDATE OPTION_MASK_UPDATE #define MASK_VSX OPTION_MASK_VSX +#define MASK_VSX_TIMODE OPTION_MASK_VSX_TIMODE #ifndef IN_LIBGCC2 #define MASK_POWERPC64 OPTION_MASK_POWERPC64 @@ 
-1328,6 +1329,7 @@ enum r6000_reg_class_enum { RS6000_CONSTRAINT_wf, /* VSX register for V4SF */ RS6000_CONSTRAINT_wl, /* FPR register for LFIWAX */ RS6000_CONSTRAINT_ws, /* VSX register for DF */ + RS6000_CONSTRAINT_wt, /* VSX register for TImode */ RS6000_CONSTRAINT_wx, /* FPR register for STFIWX */ RS6000_CONSTRAINT_wz, /* FPR register for LFIWZX */ RS6000_CONSTRAINT_MAX @@ -1514,7 +1516,7 @@ extern enum reg_class rs6000_constraints NONLOCAL needs twice Pmode to maintain both backchain and SP. */ #define STACK_SAVEAREA_MODE(LEVEL) \ (LEVEL == SAVE_FUNCTION ? VOIDmode \ - : LEVEL == SAVE_NONLOCAL ? (TARGET_32BIT ? DImode : TImode) : Pmode) + : LEVEL == SAVE_NONLOCAL ? (TARGET_32BIT ? DImode : PTImode) : Pmode) /* Minimum and maximum general purpose registers used to hold arguments. */ #define GP_ARG_MIN_REG 3 Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 195625) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -219,7 +219,7 @@ (define_attr "cell_micro" "not,condition (define_mode_iterator GPR [SI (DI "TARGET_POWERPC64")]) ; Any supported integer mode. -(define_mode_iterator INT [QI HI SI DI TI]) +(define_mode_iterator INT [QI HI SI DI TI PTI]) ; Any supported integer mode that fits in one register. (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")]) @@ -234,6 +234,10 @@ (define_mode_iterator SDI [SI DI]) ; (one with a '.') will compare; and the size used for arithmetic carries. (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")]) +; Iterator to add PTImode along with TImode (TImode can go in VSX registers, +; PTImode is GPR only) +(define_mode_iterator TI2 [TI PTI]) + ; Any hardware-supported floating-point mode (define_mode_iterator FP [ (SF "TARGET_HARD_FLOAT @@ -304,7 +308,14 @@ (define_code_attr return_str [(return "" ; Various instructions that come in SI and DI forms. ; A generic w/d attribute, for things like cmpw/cmpd. 
-(define_mode_attr wd [(QI "b") (HI "h") (SI "w") (DI "d")]) +(define_mode_attr wd [(QI "b") + (HI "h") + (SI "w") + (DI "d") + (V16QI "b") + (V8HI "h") + (V4SI "w") + (V2DI "d")]) ; DImode bits (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")]) @@ -8635,14 +8646,16 @@ (define_split FAIL; }") -;; TImode is similar, except that we usually want to compute the address into -;; a register and use lsi/stsi (the exception is during reload). +;; TImode/PTImode is similar, except that we usually want to compute the +;; address into a register and use lsi/stsi (the exception is during reload). -(define_insn "*movti_string" - [(set (match_operand:TI 0 "reg_or_mem_operand" "=Q,Y,????r,????r,????r,r") - (match_operand:TI 1 "input_operand" "r,r,Q,Y,r,n"))] +(define_insn "*mov_string" + [(set (match_operand:TI2 0 "reg_or_mem_operand" "=Q,Y,????r,????r,????r,r") + (match_operand:TI2 1 "input_operand" "r,r,Q,Y,r,n"))] "! TARGET_POWERPC64 - && (gpc_reg_operand (operands[0], TImode) || gpc_reg_operand (operands[1], TImode))" + && (mode != TImode || VECTOR_MEM_NONE_P (TImode)) + && (gpc_reg_operand (operands[0], mode) + || gpc_reg_operand (operands[1], mode))" "* { switch (which_alternative) @@ -8672,27 +8685,28 @@ (define_insn "*movti_string" (const_string "always") (const_string "conditional")))]) -(define_insn "*movti_ppc64" - [(set (match_operand:TI 0 "nonimmediate_operand" "=Y,r,r") - (match_operand:TI 1 "input_operand" "r,Y,r"))] - "(TARGET_POWERPC64 && (gpc_reg_operand (operands[0], TImode) - || gpc_reg_operand (operands[1], TImode))) - && VECTOR_MEM_NONE_P (TImode)" +(define_insn "*mov_ppc64" + [(set (match_operand:TI2 0 "nonimmediate_operand" "=Y,r,r") + (match_operand:TI2 1 "input_operand" "r,Y,r"))] + "(TARGET_POWERPC64 + && (mode != TImode || VECTOR_MEM_NONE_P (TImode)) + && (gpc_reg_operand (operands[0], mode) + || gpc_reg_operand (operands[1], mode)))" "#" [(set_attr "type" "store,load,*")]) (define_split - [(set (match_operand:TI 0 "gpc_reg_operand" "") - 
(match_operand:TI 1 "const_double_operand" ""))] - "TARGET_POWERPC64 && VECTOR_MEM_NONE_P (TImode)" + [(set (match_operand:TI2 0 "gpc_reg_operand" "") + (match_operand:TI2 1 "const_double_operand" ""))] + "TARGET_POWERPC64" [(set (match_dup 2) (match_dup 4)) (set (match_dup 3) (match_dup 5))] " { operands[2] = operand_subword_force (operands[0], WORDS_BIG_ENDIAN == 0, - TImode); + mode); operands[3] = operand_subword_force (operands[0], WORDS_BIG_ENDIAN != 0, - TImode); + mode); if (GET_CODE (operands[1]) == CONST_DOUBLE) { operands[4] = GEN_INT (CONST_DOUBLE_HIGH (operands[1])); @@ -8708,9 +8722,9 @@ (define_split }") (define_split - [(set (match_operand:TI 0 "nonimmediate_operand" "") - (match_operand:TI 1 "input_operand" ""))] - "reload_completed && VECTOR_MEM_NONE_P (TImode) + [(set (match_operand:TI2 0 "nonimmediate_operand" "") + (match_operand:TI2 1 "input_operand" ""))] + "reload_completed && gpr_or_gpr_p (operands[0], operands[1])" [(pc)] { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }) Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi (revision 195592) +++ gcc/doc/md.texi (working copy) @@ -2084,6 +2084,9 @@ If the LFIWAX instruction is enabled, a @item ws VSX vector register to hold scalar float data +@item wt +VSX vector register to hold 128 bit integer + @item wx If the STFIWX instruction is enabled, a floating point register