Message ID | 20150729200428.GA30347@ibm-tiger.the-meissners.org |
---|---|
State | New |
On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> +;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
> +
> +(define_expand "altivec_high_bit"
> +  [(set (match_dup 1)
> +        (vec_duplicate:V16QI (const_int 7)))
> +   (set (match_dup 2)
> +        (ashift:V16QI (match_dup 1)
> +                      (match_dup 1)))
> +   (set (match_dup 3)
> +        (match_dup 4))
> +   (set (match_operand:V16QI 0 "register_operand" "")
> +        (unspec:V16QI [(match_dup 2)
> +                       (match_dup 3)
> +                       (const_int 15)] UNSPEC_VSLDOI))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (can_create_pseudo_p ())
> +    {
> +      operands[1] = gen_reg_rtx (V16QImode);
> +      operands[2] = gen_reg_rtx (V16QImode);
> +      operands[3] = gen_reg_rtx (V16QImode);
> +    }
> +  else
> +    operands[1] = operands[2] = operands[3] = operands[0];

This won't work (in the pattern you write to op 3 before reading from op 2).
Do you ever call this expander late, anyway?

Segher
On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > +;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
> > +
> > +(define_expand "altivec_high_bit"
> > +  [(set (match_dup 1)
> > +        (vec_duplicate:V16QI (const_int 7)))
> > +   (set (match_dup 2)
> > +        (ashift:V16QI (match_dup 1)
> > +                      (match_dup 1)))
> > +   (set (match_dup 3)
> > +        (match_dup 4))
> > +   (set (match_operand:V16QI 0 "register_operand" "")
> > +        (unspec:V16QI [(match_dup 2)
> > +                       (match_dup 3)
> > +                       (const_int 15)] UNSPEC_VSLDOI))]
> > +  "TARGET_ALTIVEC"
> > +{
> > +  if (can_create_pseudo_p ())
> > +    {
> > +      operands[1] = gen_reg_rtx (V16QImode);
> > +      operands[2] = gen_reg_rtx (V16QImode);
> > +      operands[3] = gen_reg_rtx (V16QImode);
> > +    }
> > +  else
> > +    operands[1] = operands[2] = operands[3] = operands[0];
>
> This won't work (in the pattern you write to op 3 before reading from op 2).
> Do you ever call this expander late, anyway?

I'm not sure I follow you. Without the leading '+' patch markers, the insns
are as follows (I put in blank lines to separate the insns):

(define_expand "altivec_high_bit"
  [(set (match_dup 1)
        (vec_duplicate:V16QI (const_int 7)))

   (set (match_dup 2)
        (ashift:V16QI (match_dup 1)
                      (match_dup 1)))

   (set (match_dup 3)
        (match_dup 4))

   (set (match_operand:V16QI 0 "register_operand" "")
        (unspec:V16QI [(match_dup 2)
                       (match_dup 3)
                       (const_int 15)] UNSPEC_VSLDOI))]
  "TARGET_ALTIVEC"
{
  if (can_create_pseudo_p ())
    {
      operands[1] = gen_reg_rtx (V16QImode);
      operands[2] = gen_reg_rtx (V16QImode);
      operands[3] = gen_reg_rtx (V16QImode);
    }
  else
    operands[1] = operands[2] = operands[3] = operands[0];

  operands[4] = CONST0_RTX (V16QImode);
})

The first insn sets operands[1] to be 0x07070707070707070707070707070707LL.

The second insn sets operands[2] to be operands[1] << operands[1], i.e.
0x80808080808080808080808080808080LL.

The third insn sets operands[3] to be 0.

The fourth does a double vector shift left by 15 bytes, filling in 0's in the
bottom bits, which leaves the following in the register:
0x80000000000000000000000000000000LL

This is -0.0 in IEEE 128-bit, and XORing a value with it flips the sign bit.

The code is used for negate and absolute value (which is done during rtl
expansion). Here is the negate use case.

(define_insn_and_split "ieee_128bit_vsx_neg<mode>2"
  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
        (neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
   (clobber (match_scratch:V16QI 2 "=v"))]
  "TARGET_FLOAT128 && FLOAT128_IEEE_P (<MODE>mode)"
  "#"
  ""
  [(parallel [(set (match_dup 0)
                   (neg:TFIFKF (match_dup 1)))
              (use (match_dup 2))])]
{
  if (GET_CODE (operands[2]) == SCRATCH)
    operands[2] = gen_reg_rtx (V16QImode);

  operands[3] = gen_reg_rtx (V16QImode);
  emit_insn (gen_altivec_high_bit (operands[2]));
}
  [(set_attr "length" "8")
   (set_attr "type" "vecsimple")])

(define_insn "*ieee_128bit_vsx_neg<mode>2_internal"
  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
        (neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
   (use (match_operand:V16QI 2 "register_operand" "=v"))]
  "TARGET_FLOAT128"
  "xxlxor %x0,%x1,%x2"
  [(set_attr "length" "4")
   (set_attr "type" "vecsimple")])
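The four steps described above can be checked with a small host-side model.
This is only an illustrative sketch in plain C, not GCC code: the type vec128
and the helpers splat_byte, shift_left_bytes, shift_left_double, high_bit and
negate are names made up for this example, and byte 0 is assumed to be the
most significant byte of the register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register: 16 bytes, with
   byte 0 assumed to be the most significant byte.  */
typedef struct { uint8_t b[16]; } vec128;

/* Model of vec_duplicate: copy one byte value into all 16 lanes.  */
static vec128 splat_byte (uint8_t value)
{
  vec128 r;
  memset (r.b, value, sizeof r.b);
  return r;
}

/* Model of the per-byte ashift: each lane is shifted left by the low
   three bits of the matching lane of COUNT.  */
static vec128 shift_left_bytes (vec128 a, vec128 count)
{
  vec128 r;
  for (int i = 0; i < 16; i++)
    r.b[i] = (uint8_t) (a.b[i] << (count.b[i] & 7));
  return r;
}

/* Model of the double vector shift: concatenate A and B, then take
   the 16 bytes starting at byte offset SH (0 <= SH <= 15).  */
static vec128 shift_left_double (vec128 a, vec128 b, unsigned sh)
{
  uint8_t cat[32];
  vec128 r;
  memcpy (cat, a.b, 16);
  memcpy (cat + 16, b.b, 16);
  memcpy (r.b, cat + sh, 16);
  return r;
}

/* The four steps of the expander, each result in its own temporary.  */
static vec128 high_bit (void)
{
  vec128 sevens = splat_byte (7);                          /* 0x0707...07 */
  vec128 eighties = shift_left_bytes (sevens, sevens);     /* 0x8080...80 */
  vec128 zero = splat_byte (0);                            /* 0x0000...00 */
  return shift_left_double (eighties, zero, 15);           /* 0x8000...00 */
}

/* Negation as in the xxlxor pattern: XOR with the mask flips only
   the top bit of byte 0.  */
static vec128 negate (vec128 x)
{
  vec128 mask = high_bit ();
  for (int i = 0; i < 16; i++)
    x.b[i] ^= mask.b[i];
  return x;
}
```

Calling high_bit () in this model yields 0x80 in byte 0 and zero in the other
15 bytes, and negate () changes only the top bit of byte 0, which matches the
description of the xxlxor-based negate pattern.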
On Wed, Jul 29, 2015 at 06:38:45PM -0400, Michael Meissner wrote:
> On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> > On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > > +;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
> > > +
> > > +(define_expand "altivec_high_bit"
> > > +  [(set (match_dup 1)
> > > +        (vec_duplicate:V16QI (const_int 7)))
> > > +   (set (match_dup 2)
> > > +        (ashift:V16QI (match_dup 1)
> > > +                      (match_dup 1)))
> > > +   (set (match_dup 3)
> > > +        (match_dup 4))
> > > +   (set (match_operand:V16QI 0 "register_operand" "")
> > > +        (unspec:V16QI [(match_dup 2)
> > > +                       (match_dup 3)
> > > +                       (const_int 15)] UNSPEC_VSLDOI))]
> > > +  "TARGET_ALTIVEC"
> > > +{
> > > +  if (can_create_pseudo_p ())
> > > +    {
> > > +      operands[1] = gen_reg_rtx (V16QImode);
> > > +      operands[2] = gen_reg_rtx (V16QImode);
> > > +      operands[3] = gen_reg_rtx (V16QImode);
> > > +    }
> > > +  else
> > > +    operands[1] = operands[2] = operands[3] = operands[0];
> >
> > This won't work (in the pattern you write to op 3 before reading from op 2).
> > Do you ever call this expander late, anyway?
>
> I'm not sure I follow you.

I'm sorry, I meant that very last line I quoted, the !can_create_pseudo_p ()
one. If that is executed operands[2] will be the same reg as operands[3],
and things fall apart.

Segher
On Wed, Jul 29, 2015 at 05:46:42PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 29, 2015 at 06:38:45PM -0400, Michael Meissner wrote:
> > On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> > > On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > > > +;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
> > > > +
> > > > +(define_expand "altivec_high_bit"
> > > > +  [(set (match_dup 1)
> > > > +        (vec_duplicate:V16QI (const_int 7)))
> > > > +   (set (match_dup 2)
> > > > +        (ashift:V16QI (match_dup 1)
> > > > +                      (match_dup 1)))
> > > > +   (set (match_dup 3)
> > > > +        (match_dup 4))
> > > > +   (set (match_operand:V16QI 0 "register_operand" "")
> > > > +        (unspec:V16QI [(match_dup 2)
> > > > +                       (match_dup 3)
> > > > +                       (const_int 15)] UNSPEC_VSLDOI))]
> > > > +  "TARGET_ALTIVEC"
> > > > +{
> > > > +  if (can_create_pseudo_p ())
> > > > +    {
> > > > +      operands[1] = gen_reg_rtx (V16QImode);
> > > > +      operands[2] = gen_reg_rtx (V16QImode);
> > > > +      operands[3] = gen_reg_rtx (V16QImode);
> > > > +    }
> > > > +  else
> > > > +    operands[1] = operands[2] = operands[3] = operands[0];
> > >
> > > This won't work (in the pattern you write to op 3 before reading from op 2).
> > > Do you ever call this expander late, anyway?
> >
> > I'm not sure I follow you.
>
> I'm sorry, I meant that very last line I quoted, the !can_create_pseudo_p ()
> one. If that is executed operands[2] will be the same reg as operands[3],
> and things fall apart.

Yes, you are right. I'll put an abort in there if we can't allocate pseudos.
But since it is called during RTL expansion of abskf2, negkf2, etc. we won't
run into it.

Thanks.
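To make the aliasing problem concrete, here is a minimal host-side sketch in
plain C (not GCC code). The type vec128 and the functions expand_high_bit and
is_high_bit are hypothetical names for this illustration. The model runs the
same four sets on caller-supplied "registers", so the !can_create_pseudo_p ()
case can be imitated by passing the same register for every operand.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register; byte 0 is assumed
   to be the most significant byte.  */
typedef struct { uint8_t b[16]; } vec128;

/* Run the four sets of the expander on explicit registers.  Every step
   reads its inputs completely before writing its output, so the only
   hazard left is the one between steps.  */
static void expand_high_bit (vec128 *op0, vec128 *op1,
                             vec128 *op2, vec128 *op3)
{
  vec128 in_a, in_b, out;
  uint8_t cat[32];

  /* Step 1: op1 = splat of 7.  */
  memset (op1->b, 7, 16);

  /* Step 2: op2 = op1 << op1, per byte.  */
  in_a = *op1;
  for (int i = 0; i < 16; i++)
    out.b[i] = (uint8_t) (in_a.b[i] << (in_a.b[i] & 7));
  *op2 = out;

  /* Step 3: op3 = 0.  If op3 aliases op2, this destroys the 0x80 splat.  */
  memset (op3->b, 0, 16);

  /* Step 4: op0 = bytes 15..30 of the concatenation op2:op3.  */
  in_a = *op2;
  in_b = *op3;
  memcpy (cat, in_a.b, 16);
  memcpy (cat + 16, in_b.b, 16);
  memcpy (out.b, cat + 15, 16);
  *op0 = out;
}

/* Return 1 if V is 0x80 followed by 15 zero bytes, else 0.  */
static int is_high_bit (const vec128 *v)
{
  if (v->b[0] != 0x80)
    return 0;
  for (int i = 1; i < 16; i++)
    if (v->b[i] != 0)
      return 0;
  return 1;
}
```

With four distinct registers the result is the expected mask. With all four
pointers naming the same register, step 3 zeroes the value produced by step 2,
and the final result is all zeros. That is the failure described above, and
why an abort (or distinct temporaries) is needed when pseudos cannot be
created.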
On Wed, 29 Jul 2015, Michael Meissner wrote:

> #6 Add support for using different names for the 64/128-bit integer
>    conversion to IBM extended double, to allow a future version to
>    switch the default for what long double is. It is not expected that GCC
>    6.x will make this switch, but we would like to eventually use the
>    standard TF names for the library when the default change is made. If
>    this isn't clear, the following names use 'tf' in them, when they use
>    IBM extended double:

That would be for a completely separate ABI, incompatible with all
existing objects both static and shared, since the existing ABI defines
these names to have their existing meanings?
On Fri, Jul 31, 2015 at 12:43:20AM +0000, Joseph Myers wrote:
> On Wed, 29 Jul 2015, Michael Meissner wrote:
>
> > #6 Add support for using different names for the 64/128-bit integer
> >    conversion to IBM extended double, to allow a future version to
> >    switch the default for what long double is. It is not expected that GCC
> >    6.x will make this switch, but we would like to eventually use the
> >    standard TF names for the library when the default change is made. If
> >    this isn't clear, the following names use 'tf' in them, when they use
> >    IBM extended double:
>
> That would be for a completely separate ABI, incompatible with all
> existing objects both static and shared, since the existing ABI defines
> these names to have their existing meanings?

The OpenPower 1.1 ABI for little endian PowerPC 64-bit does not mention the
names at all. This ABI leaves it open whether long double is IBM extended
double or IEEE 128-bit floating point.

One of the goals of these patches is to change the default someday in the
future. A lot of work, particularly in the library space, needs to be done to
change the default. Until we can make the switch, users that want IEEE 128-bit
support will need to use __float128.

The intention of these changes (currently unwritten) is to change the existing
problematical names that use TF in their name to be something else, and provide
via a weak reference an alias for the old name. So if, for example, we change
the default in GCC 7.0, code compiled by GCC 6.0 would work because it uses,
say, __gcc_ltoq to convert a 64-bit integer to IBM extended double instead of
__floatditf. Older code that refers to __floatditf would still work fine. Then
sometime later (such as GCC 8.0) we could make __floatditf be a weak reference
to __floatdikf.

If you have any ideas of how to do this seamlessly, please let me know. Steve
Monroe and David Edelsohn requested that I explore whether, at some date in the
future, we will be able to use the standard names.

I tend to be skeptical that we can do it without running into some
incompatibility, and I feel that we just have to live with the existing TF
names, and not use TF for IEEE 128-bit. Currently, I'm using KF for the IEEE
128-bit functions, even if long double is mapped to IEEE 128-bit instead of IBM
extended double.

Another wrinkle is that the 32-bit RTEMS port actually uses IEEE 128-bit
floating point with the standard names, because it never used the IBM extended
double.
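For readers unfamiliar with the mechanism, the "new name plus weak alias for
the old name" idea can be sketched in C as follows. This is only an
illustration, not the proposed libgcc change: the names example_ltoq and
example_floatditf are hypothetical stand-ins, plain long double stands in for
IBM extended double, and the weak and alias attributes are GNU extensions that
are assumed to be available (GCC or Clang on an ELF target).

```c
#include <stdint.h>

/* Hypothetical new, unambiguous entry point: convert a 64-bit integer
   to the extended type (plain long double in this sketch).  */
long double example_ltoq (int64_t value)
{
  return (long double) value;
}

/* Hypothetical old name, kept only as a weak alias of the new one so
   that objects that still reference it continue to link.  The alias
   target must be defined in the same translation unit.  */
extern long double example_floatditf (int64_t value)
  __attribute__ ((weak, alias ("example_ltoq")));
```

Callers of either name reach the same code. Because the old symbol is weak, a
later release could provide a strong definition with a different meaning,
which is exactly the step that raises the compatibility questions discussed in
this thread.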
On Mon, 3 Aug 2015, Michael Meissner wrote:

> The intention of these changes (currently unwritten) is to change the existing
> problematical names that use TF in their name to be something else, and provide
> via a weak reference an alias for the old name. So if for example, we change
> the default in GCC 7.0, code compiled by GCC 6.0 would work because it uses say
> __gcc_ltoq to convert a 64-bit integer to IBM extended double instead of
> __floatditf. Older code that refers to __floatditf would still work fine.

But as those names are in the implementation namespace and aren't
user-visible, I don't see the point in such a change (other than as part
of a complete new incompatible ABI which gets its own copies of libgcc,
libstdc++, libc, libm and other affected libraries).

Of course it *can* be done via symbol versioning to keep working with
existing binaries / shared libraries using the existing symbols from
libgcc, but the libgcc build system doesn't make that sort of
target-specific symbol versioning particularly convenient. And as usual
with symbol versioning, you'd break compatibility with existing static
libraries / .o files.
There are 3 patches left in the basic IEEE 128-bit floating point support for
the compiler. I will submit these at the same time. They are split to make the
review process simpler. Patches #5 and #6 are independent of each other and can
be applied in either order. Patch #7 assumes that patches 1-6 have been applied.

Patch #5 adds the following:

   * Support for the reload handlers that will be enabled in patch #7.
   * Adds IFmode/KFmode to other iterators as appropriate.
   * Adds the basic negate, absolute value, and negative absolute value
     support.
   * Adds the insns for the 128-bit pack/unpack routines.

Patch #6 adds the following:

   * Adds support for comparisons.
   * Updates the cannot change mode support.

Patch #7 finishes up the initial basic support.

   * It defines macros for IEEE 128-bit floating point users.
   * It defines the basic move support.
   * It sets up the calling sequence.
   * It registers the __float128 and __ibm128 keywords.
   * It sets up the various handler functions.
   * It adds 'q' and 'Q' as the suffix for IEEE 128-bit floating point.
   * It adds target attribute/pragma support for the IEEE 128-bit options.
   * It treats IEEE 128-bit in VSX register modes as vector.
   * It uses a unique mangling for IEEE 128-bit in VSX registers.
   * It moves vector modes tieable above scalar floating point.
   * It adds a simple minded test to make sure IEEE args are passed as
     vectors.

Things to be done:

   * Work with GDB to add debug support.
   * Work with GLIBC to add basic software emulation support.
   * Work with GLIBC on other IEEE 128-bit support.
   * Look into Complex support.
   * Look into libquadmath support.
   * Enable -mfloat128-software if -mvsx.
   * Add more tests.
   * Fix bugs that show up if -mabi=ieeelongdouble is used.

Each patch bootstraps without error and has no regressions. Are they ok to
install in the trunk?

This is patch #6:

2015-08-13  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (rs6000_expand_float128_convert):
	Add declaration.

	* config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Fix a
	comment.
	(rs6000_cannot_change_mode_class): Add support for IEEE 128-bit
	floating point in VSX registers.
	(rs6000_output_move_128bit): Always print out the set insn if we
	can't generate an appropriate 128-bit move.
	(rs6000_generate_compare): Add support for IEEE 128-bit floating
	point in VSX registers comparisons.
	(rs6000_expand_float128_convert): Likewise.

	* config/rs6000/rs6000.md (extenddftf2): Add support for IEEE
	128-bit floating point in VSX registers.
	(extenddftf2_internal): Likewise.
	(trunctfdf2): Likewise.
	(trunctfdf2_internal2): Likewise.
	(fix_trunc_helper): Likewise.
	(fix_trunctfdi2): Likewise.
	(floatditf2): Likewise.
	(floatuns<mode>tf2): Likewise.
	(extend<FLOAT128_SFDFTF:mode><IFKF:mode>2): Likewise.
	(trunc<IFKF:mode><FLOAT128_SFDFTF:mode>2): Likewise.
	(fix_trunc<IFKF:mode><SDI:mode>2): Likewise.
	(fixuns_trunc<IFKF:mode><SDI:mode>2): Likewise.
	(float<SDI:mode><IFKF:mode>2): Likewise.
	(floatuns<SDI:mode><IFKF:mode>2): Likewise.
On Fri, Aug 14, 2015 at 11:47 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

> This is patch #6:
>
> 2015-08-13  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000-protos.h (rs6000_expand_float128_convert):
>         Add declaration.
>
>         * config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Fix a
>         comment.
>         (rs6000_cannot_change_mode_class): Add support for IEEE 128-bit
>         floating point in VSX registers.
>         (rs6000_output_move_128bit): Always print out the set insn if we
>         can't generate an appropriate 128-bit move.
>         (rs6000_generate_compare): Add support for IEEE 128-bit floating
>         point in VSX registers comparisons.
>         (rs6000_expand_float128_convert): Likewise.
>
>         * config/rs6000/rs6000.md (extenddftf2): Add support for IEEE
>         128-bit floating point in VSX registers.
>         (extenddftf2_internal): Likewise.
>         (trunctfdf2): Likewise.
>         (trunctfdf2_internal2): Likewise.
>         (fix_trunc_helper): Likewise.
>         (fix_trunctfdi2): Likewise.
>         (floatditf2): Likewise.
>         (floatuns<mode>tf2): Likewise.
>         (extend<FLOAT128_SFDFTF:mode><IFKF:mode>2): Likewise.
>         (trunc<IFKF:mode><FLOAT128_SFDFTF:mode>2): Likewise.
>         (fix_trunc<IFKF:mode><SDI:mode>2): Likewise.
>         (fixuns_trunc<IFKF:mode><SDI:mode>2): Likewise.
>         (float<SDI:mode><IFKF:mode>2): Likewise.
>         (floatuns<SDI:mode><IFKF:mode>2): Likewise.

This patch is okay.

Thanks, David
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md (revision 226275)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -36,13 +36,14 @@ (define_mode_iterator VEC_A [V16QI V8HI
 (define_mode_iterator VEC_K [V16QI V8HI V4SI V4SF])

 ;; Vector logical modes
-(define_mode_iterator VEC_L [V16QI V8HI V4SI V2DI V4SF V2DF V1TI TI])
+(define_mode_iterator VEC_L [V16QI V8HI V4SI V2DI V4SF V2DF V1TI TI KF TF])

-;; Vector modes for moves. Don't do TImode here.
-(define_mode_iterator VEC_M [V16QI V8HI V4SI V2DI V4SF V2DF V1TI])
+;; Vector modes for moves. Don't do TImode or TFmode here, since their
+;; moves are handled elsewhere.
+(define_mode_iterator VEC_M [V16QI V8HI V4SI V2DI V4SF V2DF V1TI KF])

 ;; Vector modes for types that don't need a realignment under VSX
-(define_mode_iterator VEC_N [V4SI V4SF V2DI V2DF V1TI])
+(define_mode_iterator VEC_N [V4SI V4SF V2DI V2DF V1TI KF TF])

 ;; Vector comparison modes
 (define_mode_iterator VEC_C [V16QI V8HI V4SI V2DI V4SF V2DF])
@@ -95,12 +96,19 @@ (define_expand "mov<mode>"
 {
   if (can_create_pseudo_p ())
     {
-      if (CONSTANT_P (operands[1])
-          && !easy_vector_constant (operands[1], <MODE>mode))
-        operands[1] = force_const_mem (<MODE>mode, operands[1]);
+      if (CONSTANT_P (operands[1]))
+        {
+          if (FLOAT128_VECTOR_P (<MODE>mode))
+            {
+              if (!easy_fp_constant (operands[1], <MODE>mode))
+                operands[1] = force_const_mem (<MODE>mode, operands[1]);
+            }
+          else if (!easy_vector_constant (operands[1], <MODE>mode))
+            operands[1] = force_const_mem (<MODE>mode, operands[1]);
+        }

-      else if (!vlogical_operand (operands[0], <MODE>mode)
-               && !vlogical_operand (operands[1], <MODE>mode))
+      if (!vlogical_operand (operands[0], <MODE>mode)
+          && !vlogical_operand (operands[1], <MODE>mode))
         operands[1] = force_reg (<MODE>mode, operands[1]);
     }
   if (!BYTES_BIG_ENDIAN
Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md (revision 226275)
+++ gcc/config/rs6000/constraints.md (working copy)
@@ -56,12 +56,16 @@ (define_register_constraint "z" "CA_REGS
 (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "Any VSX register if the -mvsx option was used or NO_REGS.")

+;; wb is not currently used
+
 ;; NOTE: For compatibility, "wc" is reserved to represent individual CR bits.
 ;; It is currently used for that purpose in LLVM.

 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")

+;; we is not currently used
+
 (define_register_constraint "wf" "rs6000_constraints[RS6000_CONSTRAINT_wf]"
   "VSX vector register to hold vector float data or NO_REGS.")

@@ -93,6 +97,14 @@ (define_register_constraint "wm" "rs6000
 ;; There is a mode_attr that resolves to wm for SDmode and wn for SFmode
 (define_register_constraint "wn" "NO_REGS" "No register (NO_REGS).")

+;; wo is not currently used
+
+(define_register_constraint "wp" "rs6000_constraints[RS6000_CONSTRAINT_wp]"
+  "VSX register to use for IEEE 128-bit fp TFmode, or NO_REGS.")
+
+(define_register_constraint "wq" "rs6000_constraints[RS6000_CONSTRAINT_wq]"
+  "VSX register to use for IEEE 128-bit fp KFmode, or NO_REGS.")
+
 (define_register_constraint "wr" "rs6000_constraints[RS6000_CONSTRAINT_wr]"
   "General purpose register if 64-bit instructions are enabled or NO_REGS.")

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md (revision 226275)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -460,6 +460,8 @@ (define_predicate "easy_fp_constant"

   switch (mode)
     {
+    case KFmode:
+    case IFmode:
     case TFmode:
     case DFmode:
     case SFmode:
@@ -486,6 +488,12 @@ (define_predicate "easy_vector_constant"
   if (TARGET_PAIRED_FLOAT)
     return false;

+  /* Because IEEE 128-bit floating point is considered a vector type
+     in order to pass it in VSX registers, it might use this function
+     instead of easy_fp_constant. */
+  if (FLOAT128_VECTOR_P (mode))
+    return easy_fp_constant (op, mode);
+
   if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode))
     {
       if (zero_constant (op, mode))
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 226275)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -2167,6 +2167,8 @@ rs6000_debug_reg_global (void)
            "wk reg_class = %s\n"
            "wl reg_class = %s\n"
            "wm reg_class = %s\n"
+           "wp reg_class = %s\n"
+           "wq reg_class = %s\n"
            "wr reg_class = %s\n"
            "ws reg_class = %s\n"
            "wt reg_class = %s\n"
@@ -2190,6 +2192,8 @@ rs6000_debug_reg_global (void)
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wk]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wl]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wm]],
+           reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wp]],
+           reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wq]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_ws]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wt]],
@@ -2856,6 +2860,13 @@ rs6000_init_hard_regno_mode_ok (bool glo
   if (TARGET_LFIWZX)
     rs6000_constraints[RS6000_CONSTRAINT_wz] = FLOAT_REGS; /* DImode */

+  if (TARGET_FLOAT128)
+    {
+      rs6000_constraints[RS6000_CONSTRAINT_wq] = VSX_REGS; /* KFmode */
+      if (rs6000_ieeequad)
+        rs6000_constraints[RS6000_CONSTRAINT_wp] = VSX_REGS; /* TFmode */
+    }
+
   /* Set up the reload helper and direct move functions. */
   if (TARGET_VSX || TARGET_ALTIVEC)
     {
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (revision 226275)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -31,6 +31,11 @@ (define_mode_iterator VSX_LE [V2DF
                              V1TI
                              (TI "VECTOR_MEM_VSX_P (TImode)")])

+;; Mode iterator to handle swapping words on little endian for the 128-bit
+;; types that goes in a single vector register.
+(define_mode_iterator VSX_LE_128 [(KF "FLOAT128_VECTOR_P (KFmode)")
+                                  (TF "FLOAT128_VECTOR_P (TFmode)")])
+
 ;; Iterator for the 2 32-bit vector types
 (define_mode_iterator VSX_W [V4SF V4SI])

@@ -41,11 +46,31 @@ (define_mode_iterator VSX_DF [V2DF DF])
 (define_mode_iterator VSX_F [V4SF V2DF])

 ;; Iterator for logical types supported by VSX
-(define_mode_iterator VSX_L [V16QI V8HI V4SI V2DI V4SF V2DF V1TI TI])
+;; Note, IFmode won't actually be used since it isn't a VSX type, but it simplifies
+;; the code by using 128-bit iterators for floating point.
+(define_mode_iterator VSX_L [V16QI
+                             V8HI
+                             V4SI
+                             V2DI
+                             V4SF
+                             V2DF
+                             V1TI
+                             TI
+                             (KF "FLOAT128_VECTOR_P (KFmode)")
+                             (TF "FLOAT128_VECTOR_P (TFmode)")
+                             (IF "FLOAT128_VECTOR_P (IFmode)")])

 ;; Iterator for memory move. Handle TImode specially to allow
 ;; it to use gprs as well as vsx registers.
-(define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF V1TI])
+(define_mode_iterator VSX_M [V16QI
+                             V8HI
+                             V4SI
+                             V2DI
+                             V4SF
+                             V2DF
+                             V1TI
+                             (KF "FLOAT128_VECTOR_P (KFmode)")
+                             (TF "FLOAT128_VECTOR_P (TFmode)")])

 (define_mode_iterator VSX_M2 [V16QI
                               V8HI
@@ -54,6 +79,8 @@ (define_mode_iterator VSX_M2 [V16QI
                               V4SF
                               V2DF
                               V1TI
+                              (KF "FLOAT128_VECTOR_P (KFmode)")
+                              (TF "FLOAT128_VECTOR_P (TFmode)")
                               (TI "TARGET_VSX_TIMODE")])

 ;; Map into the appropriate load/store name based on the type
@@ -64,6 +91,8 @@ (define_mode_attr VSm [(V16QI "vw4")
                        (V2DF "vd2")
                        (V2DI "vd2")
                        (DF "d")
+                       (TF "vd2")
+                       (KF "vd2")
                        (V1TI "vd2")
                        (TI "vd2")])

@@ -76,6 +105,8 @@ (define_mode_attr VSs [(V16QI "sp")
                        (V2DI "dp")
                        (DF "dp")
                        (SF "sp")
+                       (TF "dp")
+                       (KF "dp")
                        (V1TI "dp")
                        (TI "dp")])

@@ -89,6 +120,8 @@ (define_mode_attr VSr [(V16QI "v")
                        (DI "wi")
                        (DF "ws")
                        (SF "ww")
+                       (TF "wp")
+                       (KF "wq")
                        (V1TI "v")
                        (TI "wt")])

@@ -132,7 +165,9 @@ (define_mode_attr VSa [(V16QI "wa")
                        (DF "ws")
                        (SF "ww")
                        (V1TI "wa")
-                       (TI "wt")])
+                       (TI "wt")
+                       (TF "wp")
+                       (KF "wq")])

 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF "v4si")
@@ -157,7 +192,8 @@ (define_mode_attr VSv [(V16QI "v")
                        (V2DI "v")
                        (V2DF "v")
                        (V1TI "v")
-                       (DF "s")])
+                       (DF "s")
+                       (KF "v")])

 ;; Appropriate type for add ops (and other simple FP ops)
 (define_mode_attr VStype_simple [(V2DF "vecdouble")
@@ -623,6 +659,105 @@ (define_split
                      (const_int 6)
                      (const_int 7)])))]
   "")

+;; Little endian word swapping for 128-bit types that are either scalars or the
+;; special V1TI container class, which it is not appropriate to use vec_select
+;; for the type.
+(define_insn "*vsx_le_permute_<mode>"
+  [(set (match_operand:VSX_LE_128 0 "nonimmediate_operand" "=<VSa>,<VSa>,Z")
+        (rotate:VSX_LE_128
+         (match_operand:VSX_LE_128 1 "input_operand" "<VSa>,Z,<VSa>")
+         (const_int 64)))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "@
+   xxpermdi %x0,%x1,%x1,2
+   lxvd2x %x0,%y1
+   stxvd2x %x1,%y0"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecperm,vecload,vecstore")])
+
+(define_insn_and_split "*vsx_le_undo_permute_<mode>"
+  [(set (match_operand:VSX_LE_128 0 "vsx_register_operand" "=<VSa>,<VSa>")
+        (rotate:VSX_LE_128
+         (rotate:VSX_LE_128
+          (match_operand:VSX_LE_128 1 "vsx_register_operand" "0,<VSa>")
+          (const_int 64))
+         (const_int 64)))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "@
+   #
+   xxlor %x0,%x1"
+  ""
+  [(set (match_dup 0) (match_dup 1))]
+{
+  if (reload_completed && REGNO (operands[0]) == REGNO (operands[1]))
+    {
+      emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
+}
+  [(set_attr "length" "0,4")
+   (set_attr "type" "vecsimple")])
+
+(define_insn_and_split "*vsx_le_perm_load_<mode>"
+  [(set (match_operand:VSX_LE_128 0 "vsx_register_operand" "=<VSa>")
+        (match_operand:VSX_LE_128 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (rotate:VSX_LE_128 (match_dup 1)
+                           (const_int 64)))
+   (set (match_dup 0)
+        (rotate:VSX_LE_128 (match_dup 2)
+                           (const_int 64)))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn "*vsx_le_perm_store_<mode>"
+  [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z")
+        (match_operand:VSX_LE_128 1 "vsx_register_operand" "+<VSa>"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "12")])
+
+(define_split
+  [(set (match_operand:VSX_LE_128 0 "memory_operand" "")
+        (match_operand:VSX_LE_128 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  [(set (match_dup 2)
+        (rotate:VSX_LE_128 (match_dup 1)
+                           (const_int 64)))
+   (set (match_dup 0)
+        (rotate:VSX_LE_128 (match_dup 2)
+                           (const_int 64)))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+  [(set (match_operand:VSX_LE_128 0 "memory_operand" "")
+        (match_operand:VSX_LE_128 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  [(set (match_dup 1)
+        (rotate:VSX_LE_128 (match_dup 1)
+                           (const_int 64)))
+   (set (match_dup 0)
+        (rotate:VSX_LE_128 (match_dup 1)
+                           (const_int 64)))
+   (set (match_dup 1)
+        (rotate:VSX_LE_128 (match_dup 1)
+                           (const_int 64)))]
+  "")
 (define_insn "*vsx_mov<mode>"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?<VSa>,?<VSa>,wQ,?&r,??Y,??r,??r,<VSr>,?<VSa>,*r,v,wZ, v")
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h (revision 226275)
+++ gcc/config/rs6000/rs6000.h (working copy)
@@ -1496,6 +1496,8 @@ enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_wk, /* FPR/VSX register for DFmode direct moves. */
   RS6000_CONSTRAINT_wl, /* FPR register for LFIWAX */
   RS6000_CONSTRAINT_wm, /* VSX register for direct move */
+  RS6000_CONSTRAINT_wp, /* VSX reg for IEEE 128-bit fp TFmode. */
+  RS6000_CONSTRAINT_wq, /* VSX reg for IEEE 128-bit fp KFmode. */
   RS6000_CONSTRAINT_wr, /* GPR register if 64-bit */
   RS6000_CONSTRAINT_ws, /* VSX register for DF */
   RS6000_CONSTRAINT_wt, /* VSX register for TImode */
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md (revision 226275)
+++ gcc/config/rs6000/altivec.md (working copy)
@@ -167,10 +167,27 @@ (define_mode_iterator VF [V4SF])
 (define_mode_iterator V [V4SI V8HI V16QI V4SF])
 ;; Vec modes for move/logical/permute ops, include vector types for move not
 ;; otherwise handled by altivec (v2df, v2di, ti)
-(define_mode_iterator VM [V4SI V8HI V16QI V4SF V2DF V2DI V1TI TI])
+(define_mode_iterator VM [V4SI
+                          V8HI
+                          V16QI
+                          V4SF
+                          V2DF
+                          V2DI
+                          V1TI
+                          TI
+                          (KF "FLOAT128_VECTOR_P (KFmode)")
+                          (TF "FLOAT128_VECTOR_P (TFmode)")])

 ;; Like VM, except don't do TImode
-(define_mode_iterator VM2 [V4SI V8HI V16QI V4SF V2DF V2DI V1TI])
+(define_mode_iterator VM2 [V4SI
+                           V8HI
+                           V16QI
+                           V4SF
+                           V2DF
+                           V2DI
+                           V1TI
+                           (KF "FLOAT128_VECTOR_P (KFmode)")
+                           (TF "FLOAT128_VECTOR_P (TFmode)")])

 (define_mode_attr VI_char [(V2DI "d") (V4SI "w") (V8HI "h") (V16QI "b")])
 (define_mode_attr VI_scalar [(V2DI "DI") (V4SI "SI") (V8HI "HI") (V16QI "QI")])
@@ -3488,3 +3505,32 @@ (define_peephole2
                       (match_dup 3)]
                      UNSPEC_BCD_ADD_SUB)
                     (match_dup 4)))])])
+
+
+;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
+
+(define_expand "altivec_high_bit"
+  [(set (match_dup 1)
+        (vec_duplicate:V16QI (const_int 7)))
+   (set (match_dup 2)
+        (ashift:V16QI (match_dup 1)
+                      (match_dup 1)))
+   (set (match_dup 3)
+        (match_dup 4))
+   (set (match_operand:V16QI 0 "register_operand" "")
+        (unspec:V16QI [(match_dup 2)
+                       (match_dup 3)
+                       (const_int 15)] UNSPEC_VSLDOI))]
+  "TARGET_ALTIVEC"
+{
+  if (can_create_pseudo_p ())
+    {
+      operands[1] = gen_reg_rtx (V16QImode);
+      operands[2] = gen_reg_rtx (V16QImode);
+      operands[3] = gen_reg_rtx (V16QImode);
+    }
+  else
+    operands[1] = operands[2] = operands[3] = operands[0];
+
+  operands[4] = CONST0_RTX (V16QImode);
+})
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi (revision 226275)
+++ gcc/doc/md.texi (working copy)
@@ -3087,12 +3087,13 @@ Any VSX register if the -mvsx option was

 When using any of the register constraints (@code{wa}, @code{wd}, @code{wf},
 @code{wg}, @code{wh}, @code{wi}, @code{wj}, @code{wk},
-@code{wl}, @code{wm}, @code{ws}, @code{wt}, @code{wu}, @code{wv},
-@code{ww}, or @code{wy}) that take VSX registers, you must use
-@code{%x<n>} in the template so that the correct register is used.
-Otherwise the register number output in the assembly file will be
-incorrect if an Altivec register is an operand of a VSX instruction
-that expects VSX register numbering.
+@code{wl}, @code{wm}, @code{wp}, @code{wq}, @code{ws}, @code{wt},
+@code{wu}, @code{wv}, @code{ww}, or @code{wy})
+that take VSX registers, you must use @code{%x<n>} in the template so
+that the correct register is used. Otherwise the register number
+output in the assembly file will be incorrect if an Altivec register
+is an operand of a VSX instruction that expects VSX register
+numbering.

 @smallexample
 asm ("xvadddp %x0,%x1,%x2" : "=wa" (v1) : "wa" (v2), "wa" (v3));
@@ -3136,6 +3137,12 @@ VSX register if direct move instructions
 @item wn
 No register (NO_REGS).

+@item wp
+VSX register to use for IEEE 128-bit floating point TFmode, or NO_REGS.
+
+@item wq
+VSX register to use for IEEE 128-bit floating point, or NO_REGS.
+
 @item wr
 General purpose register if 64-bit instructions are enabled or NO_REGS.
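
As a reading aid for the VSX_LE_128 patterns in the vsx.md part of the patch:
the RTL expresses the little-endian doubleword swap as a rotate by 64 bits,
and every load or store is split into rotates that cancel out. The sketch
below models only that RTL-level arithmetic in plain C. The type u128 and the
functions rot64, le_load and le_store_post_reload are hypothetical names for
this illustration, and the model says nothing about actual memory byte order.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit doublewords.  */
typedef struct { uint64_t dw[2]; } u128;

/* Model of (rotate:VSX_LE_128 x (const_int 64)): swap the doublewords.  */
static u128 rot64 (u128 x)
{
  u128 r = { { x.dw[1], x.dw[0] } };
  return r;
}

/* Model of the load split: a rotating load into a temporary followed
   by a second rotate into the destination.  */
static u128 le_load (const u128 *mem)
{
  u128 tmp = rot64 (*mem);
  return rot64 (tmp);
}

/* Model of the post-reload store split, which has no temporary: the
   source register is rotated in place, stored with a rotate, and then
   rotated back in case it is still live.  */
static void le_store_post_reload (u128 *mem, u128 *reg)
{
  *reg = rot64 (*reg);
  *mem = rot64 (*reg);
  *reg = rot64 (*reg);
}
```

In this model two rotates by 64 are the identity, so le_load returns the
memory value unchanged, and le_store_post_reload leaves both the stored value
and the source register equal to the original register contents.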