
PowerPC IEEE 128-bit patch #4

Message ID 20150729200428.GA30347@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner July 29, 2015, 8:04 p.m. UTC
This is another intermediate patch to get IEEE 128-bit support on the PowerPC
into the GCC compiler. This patch adds much of the infrastructure needed to
hold IEEE 128-bit floating point values in VSX registers. Note that future
patches updating rs6000.c and rs6000.md will still be needed to enable the
basic IEEE 128-bit support.

This patch bootstraps and has no test suite regressions on a big endian Power7
system and a little endian system. Is it ok to install?

The expected future patches are:

  #5	Finish the enablement of the basic support (rs6000.c & rs6000.md
	changes);

  #6	Add support for using different names for the 64/128-bit integer
	conversion to IBM extended double, to allow a future version to
	switch the default for what long double is. It is not expected that GCC
	6.x will make this switch, but we would like to eventually use the
	standard TF names for the library when the default change is made. If
	this isn't clear, the following names use 'tf' in them, when they use
	IBM extended double:

		__dpd_extendddtf
		__dpd_extendsdtf
		__dpd_extendtftd
		__dpd_trunctdtf
		__dpd_trunctfdd
		__dpd_trunctfsd
		__fixtfdi
		__fixtfti
		__fixunstfti
		__floattitf
		__floatuntitf
		__powitf2
		__floatditf
		__floatunditf
		__fixunstfdi

  #7	Basic patches to enable libgcc support. These changes may be
	temporary, to allow the glibc team to make the soft-float emulator
	changes that are shared with libgcc (which they can't really start
	until the basic support is in place).

  #8	Enable IEEE 128-bit floating point in VSX registers by default for VSX
	systems, add tests for IEEE 128-bit support and correct calling
	sequence.

  #9...	Various fixes of things I haven't yet covered (complex, libquadmath,
	etc.).

2015-07-28  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vector.md (VEC_L): Add KFmode and TFmode.
	(VEC_M): Likewise.
	(VEC_N): Likewise.
	(mov<mode>, VEC_M iterator): Add support for IEEE 128-bit floating
	point in VSX registers.

	* config/rs6000/constraints.md (wb constraint): Document unused
	w<x> constraint.
	(we constraint): Likewise.
	(wo constraint): Likewise.
	(wp constraint): New constraint for IEEE 128-bit floating point in
	VSX registers.
	(wq constraint): Likewise.

	* config/rs6000/predicates.md (easy_fp_constant): Add support for
	IEEE 128-bit floating point in VSX registers.
	(easy_scalar_constant): Likewise.

	* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add new
	constraints (wp, wq) for IEEE 128-bit floating point in VSX
	registers.
	(rs6000_init_hard_regno_mode_ok): Likewise.

	* config/rs6000/vsx.md (VSX_LE_128): Add support for IEEE 128-bit
	floating point in VSX registers.
	(VSX_L): Likewise.
	(VSX_M): Likewise.
	(VSX_M2): Likewise.
	(VSm): Likewise.
	(VSs): Likewise.
	(VSr): Likewise.
	(VSa): Likewise.
	(VSv): Likewise.
	(vsx_le_permute_<mode>): Add support to properly swap bytes for
	IEEE 128-bit floating point in VSX registers on little endian.
	(vsx_le_undo_permute_<mode>): Likewise.
	(vsx_le_perm_load_<mode>): Likewise.
	(vsx_le_perm_store_<mode>): Likewise.
	(splitters for IEEE 128-bit fp moves): Likewise.

	* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add wp and
	wq constraints.

	* config/rs6000/altivec.md (VM): Add support for IEEE 128-bit
	floating point in VSX registers.
	(VM2): Likewise.
	(altivec_high_bit): New insn to set just the high bit in an
	altivec register.

	* doc/md.texi (Machine Constraints): Document wp and wq
	constraints on PowerPC.

Comments

Segher Boessenkool July 29, 2015, 9:59 p.m. UTC | #1
On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> +;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
> +
> +(define_expand "altivec_high_bit"
> +  [(set (match_dup 1)
> +	(vec_duplicate:V16QI (const_int 7)))
> +   (set (match_dup 2)
> +	(ashift:V16QI (match_dup 1)
> +		      (match_dup 1)))
> +   (set (match_dup 3)
> +	(match_dup 4))
> +   (set (match_operand:V16QI 0 "register_operand" "")
> +	(unspec:V16QI [(match_dup 2)
> +		       (match_dup 3)
> +		       (const_int 15)] UNSPEC_VSLDOI))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (can_create_pseudo_p ())
> +    {
> +      operands[1] = gen_reg_rtx (V16QImode);
> +      operands[2] = gen_reg_rtx (V16QImode);
> +      operands[3] = gen_reg_rtx (V16QImode);
> +    }
> +  else
> +    operands[1] = operands[2] = operands[3] = operands[0];

This won't work (in the pattern you write to op 3 before reading from op 2).
Do you ever call this expander late, anyway?


Segher
Michael Meissner July 29, 2015, 10:38 p.m. UTC | #2
On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > +;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
> > +
> > +(define_expand "altivec_high_bit"
> > +  [(set (match_dup 1)
> > +	(vec_duplicate:V16QI (const_int 7)))
> > +   (set (match_dup 2)
> > +	(ashift:V16QI (match_dup 1)
> > +		      (match_dup 1)))
> > +   (set (match_dup 3)
> > +	(match_dup 4))
> > +   (set (match_operand:V16QI 0 "register_operand" "")
> > +	(unspec:V16QI [(match_dup 2)
> > +		       (match_dup 3)
> > +		       (const_int 15)] UNSPEC_VSLDOI))]
> > +  "TARGET_ALTIVEC"
> > +{
> > +  if (can_create_pseudo_p ())
> > +    {
> > +      operands[1] = gen_reg_rtx (V16QImode);
> > +      operands[2] = gen_reg_rtx (V16QImode);
> > +      operands[3] = gen_reg_rtx (V16QImode);
> > +    }
> > +  else
> > +    operands[1] = operands[2] = operands[3] = operands[0];
> 
> This won't work (in the pattern you write to op 3 before reading from op 2).
> Do you ever call this expander late, anyway?

I'm not sure I follow you.  Without the '+' patch prefixes, the insns are as
follows (I put in blank lines to separate the insns):

(define_expand "altivec_high_bit"
  [(set (match_dup 1)
	(vec_duplicate:V16QI (const_int 7)))

   (set (match_dup 2)
	(ashift:V16QI (match_dup 1)
		      (match_dup 1)))

   (set (match_dup 3)
	(match_dup 4))

   (set (match_operand:V16QI 0 "register_operand" "")
	(unspec:V16QI [(match_dup 2)
		       (match_dup 3)
		       (const_int 15)] UNSPEC_VSLDOI))]
  "TARGET_ALTIVEC"
{
  if (can_create_pseudo_p ())
    {
      operands[1] = gen_reg_rtx (V16QImode);
      operands[2] = gen_reg_rtx (V16QImode);
      operands[3] = gen_reg_rtx (V16QImode);
    }
  else
    operands[1] = operands[2] = operands[3] = operands[0];

  operands[4] = CONST0_RTX (V16QImode);
})

The first insn sets operands[1] to be 0x07070707070707070707070707070707LL.

The second insn sets operands[2] to be operands[1] << operands[1], i.e.
0x80808080808080808080808080808080LL (each byte is 7 << 7 = 0x380, truncated
to 8 bits).

The third insn sets operands[3] to be 0.

The fourth does a double vector shift left by 15 bytes (vsldoi), filling in
0s from operands[3] in the low-order bytes, which leaves the following in the
register:
0x80000000000000000000000000000000LL

This is -0.0 in IEEE 128-bit (only the sign bit set), which is used as a mask
to flip the sign bit.

The constant is used for negate and absolute value (which are generated during
RTL expansion). Here is the negate use case.

(define_insn_and_split "ieee_128bit_vsx_neg<mode>2"
  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
	(neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
   (clobber (match_scratch:V16QI 2 "=v"))]
  "TARGET_FLOAT128 && FLOAT128_IEEE_P (<MODE>mode)"
  "#"
  ""
  [(parallel [(set (match_dup 0)
		   (neg:TFIFKF (match_dup 1)))
	      (use (match_dup 2))])]
{
  if (GET_CODE (operands[2]) == SCRATCH)
    operands[2] = gen_reg_rtx (V16QImode);

  operands[3] = gen_reg_rtx (V16QImode);
  emit_insn (gen_altivec_high_bit (operands[2]));
}
  [(set_attr "length" "8")
   (set_attr "type" "vecsimple")])

(define_insn "*ieee_128bit_vsx_neg<mode>2_internal"
  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
	(neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
   (use (match_operand:V16QI 2 "register_operand" "v"))]
  "TARGET_FLOAT128"
  "xxlxor %x0,%x1,%x2"
  [(set_attr "length" "4")
   (set_attr "type" "vecsimple")])
Segher Boessenkool July 29, 2015, 10:46 p.m. UTC | #3
On Wed, Jul 29, 2015 at 06:38:45PM -0400, Michael Meissner wrote:
> On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> > On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > > +;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
> > > +
> > > +(define_expand "altivec_high_bit"
> > > +  [(set (match_dup 1)
> > > +	(vec_duplicate:V16QI (const_int 7)))
> > > +   (set (match_dup 2)
> > > +	(ashift:V16QI (match_dup 1)
> > > +		      (match_dup 1)))
> > > +   (set (match_dup 3)
> > > +	(match_dup 4))
> > > +   (set (match_operand:V16QI 0 "register_operand" "")
> > > +	(unspec:V16QI [(match_dup 2)
> > > +		       (match_dup 3)
> > > +		       (const_int 15)] UNSPEC_VSLDOI))]
> > > +  "TARGET_ALTIVEC"
> > > +{
> > > +  if (can_create_pseudo_p ())
> > > +    {
> > > +      operands[1] = gen_reg_rtx (V16QImode);
> > > +      operands[2] = gen_reg_rtx (V16QImode);
> > > +      operands[3] = gen_reg_rtx (V16QImode);
> > > +    }
> > > +  else
> > > +    operands[1] = operands[2] = operands[3] = operands[0];
> > 
> > This won't work (in the pattern you write to op 3 before reading from op 2).
> > Do you ever call this expander late, anyway?
> 
> I'm not sure I follow you.

I'm sorry, I meant that very last line I quoted, the !can_create_pseudo_p ()
one.  If that is executed operands[2] will be the same reg as operands[3],
and things fall apart.


Segher
Michael Meissner July 29, 2015, 11:03 p.m. UTC | #4
On Wed, Jul 29, 2015 at 05:46:42PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 29, 2015 at 06:38:45PM -0400, Michael Meissner wrote:
> > On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> > > On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > > > +;; Return constant 0x80000000000000000000000000000000 in an Altivec register.
> > > > +
> > > > +(define_expand "altivec_high_bit"
> > > > +  [(set (match_dup 1)
> > > > +	(vec_duplicate:V16QI (const_int 7)))
> > > > +   (set (match_dup 2)
> > > > +	(ashift:V16QI (match_dup 1)
> > > > +		      (match_dup 1)))
> > > > +   (set (match_dup 3)
> > > > +	(match_dup 4))
> > > > +   (set (match_operand:V16QI 0 "register_operand" "")
> > > > +	(unspec:V16QI [(match_dup 2)
> > > > +		       (match_dup 3)
> > > > +		       (const_int 15)] UNSPEC_VSLDOI))]
> > > > +  "TARGET_ALTIVEC"
> > > > +{
> > > > +  if (can_create_pseudo_p ())
> > > > +    {
> > > > +      operands[1] = gen_reg_rtx (V16QImode);
> > > > +      operands[2] = gen_reg_rtx (V16QImode);
> > > > +      operands[3] = gen_reg_rtx (V16QImode);
> > > > +    }
> > > > +  else
> > > > +    operands[1] = operands[2] = operands[3] = operands[0];
> > > 
> > > This won't work (in the pattern you write to op 3 before reading from op 2).
> > > Do you ever call this expander late, anyway?
> > 
> > I'm not sure I follow you.
> 
> I'm sorry, I meant that very last line I quoted, the !can_create_pseudo_p ()
> one.  If that is executed operands[2] will be the same reg as operands[3],
> and things fall apart.

Yes, you are right. I'll put an abort in there if we can't allocate
pseudos. But since it is only called during RTL expansion of abskf2, negkf2,
etc., we won't run into it.  Thanks.
Joseph Myers July 31, 2015, 12:43 a.m. UTC | #5
On Wed, 29 Jul 2015, Michael Meissner wrote:

>   #6	Add support for using different names for the 64/128-bit integer
> 	conversion to IBM extended double, to allow a future version to
> 	switch the default for what long double is. It is not expected that GCC
> 	6.x will make this switch, but we would like to eventually use the
> 	standard TF names for the library when the default change is made. If
> 	this isn't clear, the following names use 'tf' in them, when they use
> 	IBM extended double:

That would be for a completely separate ABI, incompatible with all 
existing objects both static and shared, since the existing ABI defines 
these names to have their existing meanings?
Michael Meissner Aug. 3, 2015, 7:02 p.m. UTC | #6
On Fri, Jul 31, 2015 at 12:43:20AM +0000, Joseph Myers wrote:
> On Wed, 29 Jul 2015, Michael Meissner wrote:
> 
> >   #6	Add support for using different names for the 64/128-bit integer
> > 	conversion to IBM extended double, to allow a future version to
> > 	switch the default for what long double is. It is not expected that GCC
> > 	6.x will make this switch, but we would like to eventually use the
> > 	standard TF names for the library when the default change is made. If
> > 	this isn't clear, the following names use 'tf' in them, when they use
> > 	IBM extended double:
> 
> That would be for a completely separate ABI, incompatible with all 
> existing objects both static and shared, since the existing ABI defines 
> these names to have their existing meanings?

The OpenPower 1.1 ABI for little endian PowerPC 64-bit does not mention the
names at all.  This ABI leaves it open whether long double is IBM extended
double or IEEE 128-bit floating point. One of the goals of these patches is to
make it possible to change the default someday.

A lot of work, particularly in the library space, needs to be done to change
the default. Until we can make the switch, users that want IEEE 128-bit support
will need to use __float128.

The intention of these changes (currently unwritten) is to change the existing
problematical names that use TF in their name to be something else, and provide
via a weak reference an alias for the old name. So if, for example, we change
the default in GCC 7.0, code compiled by GCC 6.0 would work because it uses, say,
__gcc_ltoq to convert a 64-bit integer to IBM extended double instead of
__floatditf. Older code that refers to __floatditf would still work fine.

Then sometime later (such as GCC 8.0) we could make __floatditf be a weak
reference to __floatdikt.

If you have any ideas of how to do this seamlessly, please let me know.  Steve
Monroe and David Edelsohn requested that I explore whether, at some date in the
future, we will be able to use the standard names.

I tend to be skeptical that we can do it without running into some
incompatibility, and I feel that we just have to live with the existing TF
names, and not use TF for IEEE 128-bit.

Currently, I'm using KF for the IEEE 128-bit functions, even if long double is
mapped to IEEE 128-bit instead of IBM extended double.

Another wrinkle is the 32-bit RTEMS port actually uses IEEE 128-bit floating
point with the standard names, because they never used the IBM extended
double.
Joseph Myers Aug. 3, 2015, 9:18 p.m. UTC | #7
On Mon, 3 Aug 2015, Michael Meissner wrote:

> The intention of these changes (currently unwritten) is to change the existing
> problematical names that use TF in their name to be something else, and provide
> via a weak reference an alias for the old name. So if, for example, we change
> the default in GCC 7.0, code compiled by GCC 6.0 would work because it uses, say,
> __gcc_ltoq to convert a 64-bit integer to IBM extended double instead of
> __floatditf. Older code that refers to __floatditf would still work fine.

But as those names are in the implementation namespace and aren't 
user-visible, I don't see the point in such a change (other than as part 
of a complete new incompatible ABI which gets its own copies of libgcc, 
libstdc++, libc, libm and other affected libraries).  Of course it *can* 
be done via symbol versioning to keep working with existing binaries / 
shared libraries using the existing symbols from libgcc, but the libgcc 
build system doesn't make that sort of target-specific symbol versioning 
particularly convenient.  And as usual with symbol versioning, you'd break 
compatibility with existing static libraries / .o files.
Michael Meissner Aug. 14, 2015, 3:47 p.m. UTC | #8
There are 3 patches left in the basic IEEE 128-bit floating point support for
the compiler. I will submit these at the same time. They are split to make the
review process simpler.  Patches #5 and #6 are independent of each other and can
be applied in either order. Patch #7 assumes that patches 1-6 have been
applied.

Patch #5 adds the following:
 * Support for the reload handlers that will be enabled in patch #7.
 * Adds IFmode/KFmode to other iterators as appropriate.
 * Adds the basic negate, absolute value, and negative absolute value support.
 * Adds the insns for the 128-bit pack/unpack routines.

Patch #6 adds the following:
 * Adds support for comparisons.
 * Updates the cannot change mode support.

Patch #7 finishes up the initial basic support.
 * It defines macros for IEEE 128-bit floating point users.
 * It defines the basic move support.
 * It sets up the calling sequence.
 * It registers the __float128 and __ibm128 keywords.
 * It sets up the various handler functions.
 * It adds 'q' and 'Q' as the suffix for IEEE 128-bit floating point.
 * It adds target attribute/pragma support for the IEEE 128-bit options.
 * It treats IEEE 128-bit in VSX register modes as vector.
 * It uses a unique mangling for IEEE 128-bit in VSX registers.
 * It moves vector modes tieable above scalar floating point.
 * It adds a simple-minded test to make sure IEEE args are passed as vectors.

Things to be done:
 * Work with GDB to add debug support.
 * Work with GLIBC to add basic software emulation support.
 * Work with GLIBC on other IEEE 128-bit support.
 * Look into Complex support.
 * Look into libquadmath support.
 * Enable -mfloat128-software if -mvsx.
 * Add more tests.
 * Fix bugs that show up if -mabi=ieeelongdouble is used.

Each patch bootstraps without error and has no regressions. Are they ok to
install in the trunk?

This is patch #6:

2015-08-13  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (rs6000_expand_float128_convert):
	Add declaration.

	* config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Fix a
	comment.
	(rs6000_cannot_change_mode_class): Add support for IEEE 128-bit
	floating point in VSX registers.
	(rs6000_output_move_128bit): Always print out the set insn if we
	can't generate an appropriate 128-bit move.
	(rs6000_generate_compare): Add support for IEEE 128-bit floating
	point in VSX registers comparisons.
	(rs6000_expand_float128_convert): Likewise.

	* config/rs6000/rs6000.md (extenddftf2): Add support for IEEE
	128-bit floating point in VSX registers.
	(extenddftf2_internal): Likewise.
	(trunctfdf2): Likewise.
	(trunctfdf2_internal2): Likewise.
	(fix_trunc_helper): Likewise.
	(fix_trunctfdi2): Likewise.
	(floatditf2): Likewise.
	(floatuns<mode>tf2): Likewise.
	(extend<FLOAT128_SFDFTF:mode><IFKF:mode>2): Likewise.
	(trunc<IFKF:mode><FLOAT128_SFDFTF:mode>2): Likewise.
	(fix_trunc<IFKF:mode><SDI:mode>2): Likewise.
	(fixuns_trunc<IFKF:mode><SDI:mode>2): Likewise.
	(float<SDI:mode><IFKF:mode>2): Likewise.
	(floatuns<SDI:mode><IFKF:mode>2): Likewise.
David Edelsohn Aug. 26, 2015, 1:51 p.m. UTC | #9
On Fri, Aug 14, 2015 at 11:47 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

> This is patch #6:
>
> 2015-08-13  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000-protos.h (rs6000_expand_float128_convert):
>         Add declaration.
>
>         * config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Fix a
>         comment.
>         (rs6000_cannot_change_mode_class): Add support for IEEE 128-bit
>         floating point in VSX registers.
>         (rs6000_output_move_128bit): Always print out the set insn if we
>         can't generate an appropriate 128-bit move.
>         (rs6000_generate_compare): Add support for IEEE 128-bit floating
>         point in VSX registers comparisons.
>         (rs6000_expand_float128_convert): Likewise.
>
>         * config/rs6000/rs6000.md (extenddftf2): Add support for IEEE
>         128-bit floating point in VSX registers.
>         (extenddftf2_internal): Likewise.
>         (trunctfdf2): Likewise.
>         (trunctfdf2_internal2): Likewise.
>         (fix_trunc_helper): Likewise.
>         (fix_trunctfdi2): Likewise.
>         (floatditf2): Likewise.
>         (floatuns<mode>tf2): Likewise.
>         (extend<FLOAT128_SFDFTF:mode><IFKF:mode>2): Likewise.
>         (trunc<IFKF:mode><FLOAT128_SFDFTF:mode>2): Likewise.
>         (fix_trunc<IFKF:mode><SDI:mode>2): Likewise.
>         (fixuns_trunc<IFKF:mode><SDI:mode>2): Likewise.
>         (float<SDI:mode><IFKF:mode>2): Likewise.
>         (floatuns<SDI:mode><IFKF:mode>2): Likewise.

This patch is okay.

Thanks, David

Patch

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 226275)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -36,13 +36,14 @@  (define_mode_iterator VEC_A [V16QI V8HI 
 (define_mode_iterator VEC_K [V16QI V8HI V4SI V4SF])
 
 ;; Vector logical modes
-(define_mode_iterator VEC_L [V16QI V8HI V4SI V2DI V4SF V2DF V1TI TI])
+(define_mode_iterator VEC_L [V16QI V8HI V4SI V2DI V4SF V2DF V1TI TI KF TF])
 
-;; Vector modes for moves.  Don't do TImode here.
-(define_mode_iterator VEC_M [V16QI V8HI V4SI V2DI V4SF V2DF V1TI])
+;; Vector modes for moves.  Don't do TImode or TFmode here, since their
+;; moves are handled elsewhere.
+(define_mode_iterator VEC_M [V16QI V8HI V4SI V2DI V4SF V2DF V1TI KF])
 
 ;; Vector modes for types that don't need a realignment under VSX
-(define_mode_iterator VEC_N [V4SI V4SF V2DI V2DF V1TI])
+(define_mode_iterator VEC_N [V4SI V4SF V2DI V2DF V1TI KF TF])
 
 ;; Vector comparison modes
 (define_mode_iterator VEC_C [V16QI V8HI V4SI V2DI V4SF V2DF])
@@ -95,12 +96,19 @@  (define_expand "mov<mode>"
 {
   if (can_create_pseudo_p ())
     {
-      if (CONSTANT_P (operands[1])
-	  && !easy_vector_constant (operands[1], <MODE>mode))
-	operands[1] = force_const_mem (<MODE>mode, operands[1]);
+      if (CONSTANT_P (operands[1]))
+	{
+	  if (FLOAT128_VECTOR_P (<MODE>mode))
+	    {
+	      if (!easy_fp_constant (operands[1], <MODE>mode))
+		operands[1] = force_const_mem (<MODE>mode, operands[1]);
+	    }
+	  else if (!easy_vector_constant (operands[1], <MODE>mode))
+	    operands[1] = force_const_mem (<MODE>mode, operands[1]);
+	}
 
-      else if (!vlogical_operand (operands[0], <MODE>mode)
-	       && !vlogical_operand (operands[1], <MODE>mode))
+      if (!vlogical_operand (operands[0], <MODE>mode)
+	  && !vlogical_operand (operands[1], <MODE>mode))
 	operands[1] = force_reg (<MODE>mode, operands[1]);
     }
   if (!BYTES_BIG_ENDIAN
Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 226275)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -56,12 +56,16 @@  (define_register_constraint "z" "CA_REGS
 (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "Any VSX register if the -mvsx option was used or NO_REGS.")
 
+;; wb is not currently used
+
 ;; NOTE: For compatibility, "wc" is reserved to represent individual CR bits.
 ;; It is currently used for that purpose in LLVM.
 
 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")
 
+;; we is not currently used
+
 (define_register_constraint "wf" "rs6000_constraints[RS6000_CONSTRAINT_wf]"
   "VSX vector register to hold vector float data or NO_REGS.")
 
@@ -93,6 +97,14 @@  (define_register_constraint "wm" "rs6000
 ;; There is a mode_attr that resolves to wm for SDmode and wn for SFmode
 (define_register_constraint "wn" "NO_REGS" "No register (NO_REGS).")
 
+;; wo is not currently used
+
+(define_register_constraint "wp" "rs6000_constraints[RS6000_CONSTRAINT_wp]"
+  "VSX register to use for IEEE 128-bit fp TFmode, or NO_REGS.")
+
+(define_register_constraint "wq" "rs6000_constraints[RS6000_CONSTRAINT_wq]"
+  "VSX register to use for IEEE 128-bit fp KFmode, or NO_REGS.")
+
 (define_register_constraint "wr" "rs6000_constraints[RS6000_CONSTRAINT_wr]"
   "General purpose register if 64-bit instructions are enabled or NO_REGS.")
 
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 226275)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -460,6 +460,8 @@  (define_predicate "easy_fp_constant"
 
   switch (mode)
     {
+    case KFmode:
+    case IFmode:
     case TFmode:
     case DFmode:
     case SFmode:
@@ -486,6 +488,12 @@  (define_predicate "easy_vector_constant"
   if (TARGET_PAIRED_FLOAT)
     return false;
 
+  /* Because IEEE 128-bit floating point is considered a vector type
+     in order to pass it in VSX registers, it might use this function
+     instead of easy_fp_constant.  */
+  if (FLOAT128_VECTOR_P (mode))
+    return easy_fp_constant (op, mode);
+
   if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode))
     {
       if (zero_constant (op, mode))
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 226275)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2167,6 +2167,8 @@  rs6000_debug_reg_global (void)
 	   "wk reg_class = %s\n"
 	   "wl reg_class = %s\n"
 	   "wm reg_class = %s\n"
+	   "wp reg_class = %s\n"
+	   "wq reg_class = %s\n"
 	   "wr reg_class = %s\n"
 	   "ws reg_class = %s\n"
 	   "wt reg_class = %s\n"
@@ -2190,6 +2192,8 @@  rs6000_debug_reg_global (void)
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wk]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wl]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wm]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wp]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wq]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_ws]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wt]],
@@ -2856,6 +2860,13 @@  rs6000_init_hard_regno_mode_ok (bool glo
   if (TARGET_LFIWZX)
     rs6000_constraints[RS6000_CONSTRAINT_wz] = FLOAT_REGS;	/* DImode  */
 
+  if (TARGET_FLOAT128)
+    {
+      rs6000_constraints[RS6000_CONSTRAINT_wq] = VSX_REGS;	/* KFmode  */
+      if (rs6000_ieeequad)
+	rs6000_constraints[RS6000_CONSTRAINT_wp] = VSX_REGS;	/* TFmode  */
+    }
+
   /* Set up the reload helper and direct move functions.  */
   if (TARGET_VSX || TARGET_ALTIVEC)
     {
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 226275)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -31,6 +31,11 @@  (define_mode_iterator VSX_LE [V2DF
 			      V1TI
 			      (TI	"VECTOR_MEM_VSX_P (TImode)")])
 
+;; Mode iterator to handle swapping words on little endian for the 128-bit
+;; types that go in a single vector register.
+(define_mode_iterator VSX_LE_128 [(KF   "FLOAT128_VECTOR_P (KFmode)")
+				  (TF   "FLOAT128_VECTOR_P (TFmode)")])
+
 ;; Iterator for the 2 32-bit vector types
 (define_mode_iterator VSX_W [V4SF V4SI])
 
@@ -41,11 +46,31 @@  (define_mode_iterator VSX_DF [V2DF DF])
 (define_mode_iterator VSX_F [V4SF V2DF])
 
 ;; Iterator for logical types supported by VSX
-(define_mode_iterator VSX_L [V16QI V8HI V4SI V2DI V4SF V2DF V1TI TI])
+;; Note, IFmode won't actually be used since it isn't a VSX type, but it simplifies
+;; the code by using 128-bit iterators for floating point.
+(define_mode_iterator VSX_L [V16QI
+			     V8HI
+			     V4SI
+			     V2DI
+			     V4SF
+			     V2DF
+			     V1TI
+			     TI
+			     (KF	"FLOAT128_VECTOR_P (KFmode)")
+			     (TF	"FLOAT128_VECTOR_P (TFmode)")
+			     (IF	"FLOAT128_VECTOR_P (IFmode)")])
 
 ;; Iterator for memory move.  Handle TImode specially to allow
 ;; it to use gprs as well as vsx registers.
-(define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF V1TI])
+(define_mode_iterator VSX_M [V16QI
+			     V8HI
+			     V4SI
+			     V2DI
+			     V4SF
+			     V2DF
+			     V1TI
+			     (KF	"FLOAT128_VECTOR_P (KFmode)")
+			     (TF	"FLOAT128_VECTOR_P (TFmode)")])
 
 (define_mode_iterator VSX_M2 [V16QI
 			      V8HI
@@ -54,6 +79,8 @@  (define_mode_iterator VSX_M2 [V16QI
 			      V4SF
 			      V2DF
 			      V1TI
+			      (KF	"FLOAT128_VECTOR_P (KFmode)")
+			      (TF	"FLOAT128_VECTOR_P (TFmode)")
 			      (TI	"TARGET_VSX_TIMODE")])
 
 ;; Map into the appropriate load/store name based on the type
@@ -64,6 +91,8 @@  (define_mode_attr VSm  [(V16QI "vw4")
 			(V2DF  "vd2")
 			(V2DI  "vd2")
 			(DF    "d")
+			(TF    "vd2")
+			(KF    "vd2")
 			(V1TI  "vd2")
 			(TI    "vd2")])
 
@@ -76,6 +105,8 @@  (define_mode_attr VSs	[(V16QI "sp")
 			 (V2DI  "dp")
 			 (DF    "dp")
 			 (SF	"sp")
+			 (TF    "dp")
+			 (KF    "dp")
 			 (V1TI  "dp")
 			 (TI    "dp")])
 
@@ -89,6 +120,8 @@  (define_mode_attr VSr	[(V16QI "v")
 			 (DI	"wi")
 			 (DF    "ws")
 			 (SF	"ww")
+			 (TF	"wp")
+			 (KF	"wq")
 			 (V1TI  "v")
 			 (TI    "wt")])
 
@@ -132,7 +165,9 @@  (define_mode_attr VSa	[(V16QI "wa")
 			 (DF    "ws")
 			 (SF	"ww")
 			 (V1TI	"wa")
-			 (TI    "wt")])
+			 (TI    "wt")
+			 (TF	"wp")
+			 (KF	"wq")])
 
 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF  "v4si")
@@ -157,7 +192,8 @@  (define_mode_attr VSv	[(V16QI "v")
 			 (V2DI  "v")
 			 (V2DF  "v")
 			 (V1TI  "v")
-			 (DF    "s")])
+			 (DF    "s")
+			 (KF	"v")])
 
 ;; Appropriate type for add ops (and other simple FP ops)
 (define_mode_attr VStype_simple	[(V2DF "vecdouble")
@@ -623,6 +659,105 @@  (define_split
                      (const_int 6) (const_int 7)])))]
   "")
 
+;; Little endian doubleword swapping for 128-bit types that are either scalars
+;; or the special V1TI container class, for which it is not appropriate to use
+;; vec_select.
+(define_insn "*vsx_le_permute_<mode>"
+  [(set (match_operand:VSX_LE_128 0 "nonimmediate_operand" "=<VSa>,<VSa>,Z")
+	(rotate:VSX_LE_128
+	 (match_operand:VSX_LE_128 1 "input_operand" "<VSa>,Z,<VSa>")
+	 (const_int 64)))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "@
+   xxpermdi %x0,%x1,%x1,2
+   lxvd2x %x0,%y1
+   stxvd2x %x1,%y0"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecperm,vecload,vecstore")])
+
+(define_insn_and_split "*vsx_le_undo_permute_<mode>"
+  [(set (match_operand:VSX_LE_128 0 "vsx_register_operand" "=<VSa>,<VSa>")
+	(rotate:VSX_LE_128
+	 (rotate:VSX_LE_128
+	  (match_operand:VSX_LE_128 1 "vsx_register_operand" "0,<VSa>")
+	  (const_int 64))
+	 (const_int 64)))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "@
+   #
+   xxlor %x0,%x1"
+  ""
+  [(set (match_dup 0) (match_dup 1))]
+{
+  if (reload_completed && REGNO (operands[0]) == REGNO (operands[1]))
+    {
+      emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
+}
+  [(set_attr "length" "0,4")
+   (set_attr "type" "vecsimple")])
+
+(define_insn_and_split "*vsx_le_perm_load_<mode>"
+  [(set (match_operand:VSX_LE_128 0 "vsx_register_operand" "=<VSa>")
+        (match_operand:VSX_LE_128 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+	(rotate:VSX_LE_128 (match_dup 1)
+			   (const_int 64)))
+   (set (match_dup 0)
+	(rotate:VSX_LE_128 (match_dup 2)
+			   (const_int 64)))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn "*vsx_le_perm_store_<mode>"
+  [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z")
+        (match_operand:VSX_LE_128 1 "vsx_register_operand" "+<VSa>"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "12")])
+
+(define_split
+  [(set (match_operand:VSX_LE_128 0 "memory_operand" "")
+        (match_operand:VSX_LE_128 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  [(set (match_dup 2)
+	(rotate:VSX_LE_128 (match_dup 1)
+			   (const_int 64)))
+   (set (match_dup 0)
+	(rotate:VSX_LE_128 (match_dup 2)
+			   (const_int 64)))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+  [(set (match_operand:VSX_LE_128 0 "memory_operand" "")
+        (match_operand:VSX_LE_128 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  [(set (match_dup 1)
+	(rotate:VSX_LE_128 (match_dup 1)
+			   (const_int 64)))
+   (set (match_dup 0)
+	(rotate:VSX_LE_128 (match_dup 1)
+			   (const_int 64)))
+   (set (match_dup 1)
+	(rotate:VSX_LE_128 (match_dup 1)
+			   (const_int 64)))]
+  "")
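The two store splits differ only in where the permuted copy lives. Before reload a fresh pseudo holds it, so the source is untouched; after reload no new register can be created, so the source itself is permuted, stored, and permuted back. The C sketch below models both, assuming a little-endian layout in which a 128-bit value keeps its low doubleword in bytes 0..7, and treating stxvd2x as writing doubleword 0 (the high half) of the register to bytes 0..7. Every name in it (vsx128_t, store_le64, stxvd2x_model, perm_store_before_reload, perm_store_after_reload) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
  uint64_t hi;   /* doubleword 0, bits 127..64 */
  uint64_t lo;   /* doubleword 1, bits 63..0 */
} vsx128_t;

/* Rotate by 64 bits: exchange the doublewords.  */
static vsx128_t
rotate64 (vsx128_t x)
{
  vsx128_t r = { x.lo, x.hi };
  return r;
}

/* Write a 64-bit integer as 8 little-endian bytes.  */
static void
store_le64 (unsigned char *p, uint64_t v)
{
  for (int i = 0; i < 8; i++, v >>= 8)
    p[i] = (unsigned char) (v & 0xff);
}

/* Assumed stxvd2x behaviour in little-endian mode: doubleword 0 goes to
   bytes 0..7 and doubleword 1 to bytes 8..15, each written little-endian.  */
static void
stxvd2x_model (unsigned char mem[16], vsx128_t r)
{
  store_le64 (mem, r.hi);
  store_le64 (mem + 8, r.lo);
}

/* Pre-reload split: a new pseudo (tmp) holds the permuted copy.  */
static void
perm_store_before_reload (unsigned char mem[16], vsx128_t src)
{
  vsx128_t tmp = rotate64 (src);   /* xxpermdi tmp,src,src,2 */
  stxvd2x_model (mem, tmp);        /* stxvd2x tmp,mem */
}

/* Post-reload split: SRC is permuted in place, stored, and then
   permuted back in case it is still live.  */
static void
perm_store_after_reload (unsigned char mem[16], vsx128_t *src)
{
  *src = rotate64 (*src);          /* xxpermdi src,src,src,2 */
  stxvd2x_model (mem, *src);       /* stxvd2x src,mem */
  *src = rotate64 (*src);          /* xxpermdi src,src,src,2 */
}
```

The post-reload variant writes operand 1 temporarily, which is presumably why *vsx_le_perm_store_<mode> gives that operand a "+" constraint and a length of 12 (three instructions).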
 
 (define_insn "*vsx_mov<mode>"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?<VSa>,?<VSa>,wQ,?&r,??Y,??r,??r,<VSr>,?<VSa>,*r,v,wZ, v")
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 226275)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -1496,6 +1496,8 @@  enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_wk,		/* FPR/VSX register for DFmode direct moves. */
   RS6000_CONSTRAINT_wl,		/* FPR register for LFIWAX */
   RS6000_CONSTRAINT_wm,		/* VSX register for direct move */
+  RS6000_CONSTRAINT_wp,		/* VSX reg for IEEE 128-bit fp TFmode.  */
+  RS6000_CONSTRAINT_wq,		/* VSX reg for IEEE 128-bit fp KFmode.  */
   RS6000_CONSTRAINT_wr,		/* GPR register if 64-bit  */
   RS6000_CONSTRAINT_ws,		/* VSX register for DF */
   RS6000_CONSTRAINT_wt,		/* VSX register for TImode */
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 226275)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -167,10 +167,27 @@  (define_mode_iterator VF [V4SF])
 (define_mode_iterator V [V4SI V8HI V16QI V4SF])
 ;; Vec modes for move/logical/permute ops, include vector types for move not
 ;; otherwise handled by altivec (v2df, v2di, ti)
-(define_mode_iterator VM [V4SI V8HI V16QI V4SF V2DF V2DI V1TI TI])
+(define_mode_iterator VM [V4SI
+			  V8HI
+			  V16QI
+			  V4SF
+			  V2DF
+			  V2DI
+			  V1TI
+			  TI
+			  (KF "FLOAT128_VECTOR_P (KFmode)")
+			  (TF "FLOAT128_VECTOR_P (TFmode)")])
 
 ;; Like VM, except don't do TImode
-(define_mode_iterator VM2 [V4SI V8HI V16QI V4SF V2DF V2DI V1TI])
+(define_mode_iterator VM2 [V4SI
+			   V8HI
+			   V16QI
+			   V4SF
+			   V2DF
+			   V2DI
+			   V1TI
+			   (KF "FLOAT128_VECTOR_P (KFmode)")
+			   (TF "FLOAT128_VECTOR_P (TFmode)")])
 
 (define_mode_attr VI_char [(V2DI "d") (V4SI "w") (V8HI "h") (V16QI "b")])
 (define_mode_attr VI_scalar [(V2DI "DI") (V4SI "SI") (V8HI "HI") (V16QI "QI")])
@@ -3488,3 +3505,32 @@  (define_peephole2
 				  (match_dup 3)]
 				 UNSPEC_BCD_ADD_SUB)
 		    (match_dup 4)))])])
+
+
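+;; Return the constant 0x80000000000000000000000000000000 (only the most
+;; significant bit set) in an Altivec register.  The value is built by
+;; splatting 7 into every byte, shifting each byte left by itself
+;; (7 << 7 = 0x80), and then using vsldoi to shift in 15 zero bytes.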
+
+(define_expand "altivec_high_bit"
+  [(set (match_dup 1)
+	(vec_duplicate:V16QI (const_int 7)))
+   (set (match_dup 2)
+	(ashift:V16QI (match_dup 1)
+		      (match_dup 1)))
+   (set (match_dup 3)
+	(match_dup 4))
+   (set (match_operand:V16QI 0 "register_operand" "")
+	(unspec:V16QI [(match_dup 2)
+		       (match_dup 3)
+		       (const_int 15)] UNSPEC_VSLDOI))]
+  "TARGET_ALTIVEC"
+{
+  if (can_create_pseudo_p ())
+    {
+      operands[1] = gen_reg_rtx (V16QImode);
+      operands[2] = gen_reg_rtx (V16QImode);
+      operands[3] = gen_reg_rtx (V16QImode);
+    }
+  else
+    operands[1] = operands[2] = operands[3] = operands[0];
+
+  operands[4] = CONST0_RTX (V16QImode);
+})
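The altivec_high_bit expansion can be checked with a byte-level model of its steps. The sketch below numbers bytes in the architectural (big-endian) element order, so byte 0 is the most significant, and mirrors the can_create_pseudo_p () path where each step gets its own register. The v16qi_t type and the vspltisb, vslb, vsldoi and altivec_high_bit_model functions are hypothetical software stand-ins for the instructions, not a real intrinsics API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 16-byte Altivec register; b[0] is the most
   significant byte (architectural big-endian element order).  */
typedef struct
{
  uint8_t b[16];
} v16qi_t;

/* vspltisb: replicate a small signed immediate into every byte.  */
static v16qi_t
vspltisb (int imm)
{
  v16qi_t r;
  memset (r.b, (uint8_t) imm, sizeof r.b);
  return r;
}

/* vslb: shift each byte of A left by the low 3 bits of the matching
   byte of B.  */
static v16qi_t
vslb (v16qi_t a, v16qi_t b)
{
  v16qi_t r;
  for (int i = 0; i < 16; i++)
    r.b[i] = (uint8_t) (a.b[i] << (b.b[i] & 7));
  return r;
}

/* vsldoi: bytes SH..SH+15 of the 32-byte concatenation A:B.  */
static v16qi_t
vsldoi (v16qi_t a, v16qi_t b, int sh)
{
  uint8_t cat[32];
  v16qi_t r;
  assert (sh >= 0 && sh <= 15);
  memcpy (cat, a.b, 16);
  memcpy (cat + 16, b.b, 16);
  memcpy (r.b, cat + sh, 16);
  return r;
}

/* The four sets of altivec_high_bit, one temporary per step.  */
static v16qi_t
altivec_high_bit_model (void)
{
  v16qi_t sevens = vspltisb (7);           /* operand 1: every byte 0x07 */
  v16qi_t highs = vslb (sevens, sevens);   /* operand 2: 0x07 << 7 = 0x80 */
  v16qi_t zero = vspltisb (0);             /* operand 3: CONST0_RTX */
  return vsldoi (highs, zero, 15);         /* 0x80, then 15 zero bytes */
}
```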
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 226275)
+++ gcc/doc/md.texi	(working copy)
@@ -3087,12 +3087,13 @@  Any VSX register if the -mvsx option was
 
 When using any of the register constraints (@code{wa}, @code{wd},
 @code{wf}, @code{wg}, @code{wh}, @code{wi}, @code{wj}, @code{wk},
-@code{wl}, @code{wm}, @code{ws}, @code{wt}, @code{wu}, @code{wv},
-@code{ww}, or @code{wy}) that take VSX registers, you must use
-@code{%x<n>} in the template so that the correct register is used.
-Otherwise the register number output in the assembly file will be
-incorrect if an Altivec register is an operand of a VSX instruction
-that expects VSX register numbering.
+@code{wl}, @code{wm}, @code{wp}, @code{wq}, @code{ws}, @code{wt},
+@code{wu}, @code{wv}, @code{ww}, or @code{wy})
+that take VSX registers, you must use @code{%x<n>} in the template so
+that the correct register is used.  Otherwise the register number
+output in the assembly file will be incorrect if an Altivec register
+is an operand of a VSX instruction that expects VSX register
+numbering.
 
 @smallexample
 asm ("xvadddp %x0,%x1,%x2" : "=wa" (v1) : "wa" (v2), "wa" (v3));
@@ -3136,6 +3137,12 @@  VSX register if direct move instructions
 @item wn
 No register (NO_REGS).
 
+@item wp
+VSX register to use for IEEE 128-bit floating point TFmode, or NO_REGS.
+
+@item wq
+VSX register to use for IEEE 128-bit floating point KFmode, or NO_REGS.
+
 @item wr
 General purpose register if 64-bit instructions are enabled or NO_REGS.