diff mbox

PR59909, Fix powerpcle64 power8 bootstrap (quad memory support)

Message ID 20140122185042.GA27367@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner Jan. 22, 2014, 6:50 p.m. UTC
The current trunk will not bootstrap on a powerpcle64 power8 box if
--with-cpu=power8 is selected.  This has been traced down to the fact that we
did not swap the double words when we used the quad memory instructions.

In this patch, I split the ISA 2.07 quad memory support into two parts, one for
the non-atomic quad memory support instructions, and the other for atomic quad
memory support.  In big endian systems, both forms are generated, while in
little endian systems only the atomic instructions are generated, since the
overhead of swapping the words would diminish the appeal of using the
non-atomic quad memory instructions.

At a future date, I would like to optimize the quad memory atomic instructions
not to do the swapping in some cases.  That optimization is not in these
patches.

I'm attaching two versions of the patch.  The first version is against the
current trunk.  The second version is against the 4.8 compiler in the IBM
branch.

This has been tested by doing a full bootstrap on a power8 box in little endian
mode.  There are no regressions in the powerpc tests.  Is this patch ok to
apply?

2014-01-22  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/59909
	* doc/invoke.texi (RS/6000 and PowerPC Options): Document
	-mquad-memory-atomic.  Update -mquad-memory documentation to say
	it is only used for non-atomic loads/stores.

	* config/rs6000/predicates.md (quad_int_reg_operand): Allow either
	-mquad-memory or -mquad-memory-atomic switches.

	* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Add
	-mquad-memory-atomic to ISA 2.07 support.

	* config/rs6000/rs6000.opt (-mquad-memory-atomic): Add new switch
	to separate support of normal quad word memory operations (ldq,
	stq) from the atomic quad word memory operations.

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	support to separate non-atomic quad word operations from atomic
	quad word operations.  Disable non-atomic quad word operations in
	little endian mode so that we don't have to swap words after the
	load and before the store.
	(quad_load_store_p): Add comment about atomic quad word support.
	(rs6000_opt_masks): Add -mquad-memory-atomic.

	* config/rs6000/rs6000.h (TARGET_SYNC_TI): Add support for
	-mquad-memory-atomic.
	(TARGET_SYNC_HI_QI): Likewise.

	* config/rs6000/sync.md (qswap_pred1): Add support for swapping
	double word registers with quad word atomic instructions in little
	endian mode.
	(qswap_pred2): Likewise.
	(qswap_other): Likewise.
	(qswap_<mode>): Likewise.
	(load_lockedti): Likewise.
	(load_lockedpti): Likewise.
	(store_conditionalti): Likewise.
	(store_conditionalpti): Likewise.

	* gcc/config/rs6000/rs6000.md (UNSPEC_TI_PTI_SWAP): New UNSPEC for
	dealing with quad word atomic instructions in little endian mode.

	* gcc/config/rs6000/rs6000-c.c (rs6000_target_modify_macros):
	Define __QUAD_MEMORY__ and __QUAD_MEMORY_ATOMIC__ based on what
	type of quad memory support is available.

Comments

David Edelsohn Jan. 23, 2014, 6:58 p.m. UTC | #1
On Wed, Jan 22, 2014 at 1:50 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> The current trunk will not bootstrap on a powerpcle64 power8 box if
> --with-cpu=power8 is selected.  This has been traced down to the fact that we
> did not swap the double words when we used the quad memory instructions.
>
> In this patch, I split the ISA 2.07 quad memory support into two parts, one for
> the non-atomic quad memory support instructions, and the other for atomic quad
> memory support.  In big endian systems, both forms are generated, while in
> little endian systems only the atomic instructions are generated, since the
> overhead of swapping the words would diminish the appeal of using the
> non-atomic quad memory instructions.
>
> At a future date, I would like to optimize the quad memory atomic instructions
> not to do the swapping in some cases.  That optimization is not in these
> patches.
>
> I'm attaching two versions of the patch.  The first version is against the
> current trunk.  The second version is against the 4.8 compiler in the IBM
> branch.
>
> This has been tested by doing a full bootstrap on a power8 box in little endian
> mode.  There are no regressions in the powerpc tests.  Is this patch ok to
> apply?
>
> 2014-01-22  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         PR target/59909
>         * doc/invoke.texi (RS/6000 and PowerPC Options): Document
>         -mquad-memory-atomic.  Update -mquad-memory documentation to say
>         it is only used for non-atomic loads/stores.
>
>         * config/rs6000/predicates.md (quad_int_reg_operand): Allow either
>         -mquad-memory or -mquad-memory-atomic switches.
>
>         * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Add
>         -mquad-memory-atomic to ISA 2.07 support.
>
>         * config/rs6000/rs6000.opt (-mquad-memory-atomic): Add new switch
>         to separate support of normal quad word memory operations (ldq,
>         stq) from the atomic quad word memory operations.
>
>         * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
>         support to separate non-atomic quad word operations from atomic
>         quad word operations.  Disable non-atomic quad word operations in
>         little endian mode so that we don't have to swap words after the
>         load and before the store.
>         (quad_load_store_p): Add comment about atomic quad word support.
>         (rs6000_opt_masks): Add -mquad-memory-atomic.
>
>         * config/rs6000/rs6000.h (TARGET_SYNC_TI): Add support for
>         -mquad-memory-atomic.
>         (TARGET_SYNC_HI_QI): Likewise.
>
>         * config/rs6000/sync.md (qswap_pred1): Add support for swapping
>         double word registers with quad word atomic instructions in little
>         endian mode.
>         (qswap_pred2): Likewise.
>         (qswap_other): Likewise.
>         (qswap_<mode>): Likewise.
>         (load_lockedti): Likewise.
>         (load_lockedpti): Likewise.
>         (store_conditionalti): Likewise.
>         (store_conditionalpti): Likewise.
>
>         * gcc/config/rs6000/rs6000.md (UNSPEC_TI_PTI_SWAP): New UNSPEC for
>         dealing with quad word atomic instructions in little endian mode.
>
>         * gcc/config/rs6000/rs6000-c.c (rs6000_target_modify_macros):
>         Define __QUAD_MEMORY__ and __QUAD_MEMORY_ATOMIC__ based on what
>         type of quad memory support is available.

The patch is okay, but please clarify the sync.md ChangeLog to mention
what you did for the patterns, like new patterns, new attrs, which
patterns are changed to swap double words in LE mode.

Thanks, David
Michael Meissner Jan. 23, 2014, 7:46 p.m. UTC | #2
On Thu, Jan 23, 2014 at 01:58:15PM -0500, David Edelsohn wrote:
> The patch is okay, but please clarify the sync.md ChangeLog to mention
> what you did for the patterns, like new patterns, new attrs, which
> patterns are changed to swap double words in LE mode.

In looking to write better comments, I discovered that the new patterns were
wrong in that they didn't properly swap words on quad atomics, so I will be
submitting a tweaked patch after testing.
diff mbox

Patch

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 206895)
+++ gcc/doc/invoke.texi	(working copy)
@@ -919,6 +919,7 @@  See RS/6000 and PowerPC Options.
 -mpower8-fusion -mno-mpower8-fusion -mpower8-vector -mno-power8-vector @gol
 -mcrypto -mno-crypto -mdirect-move -mno-direct-move @gol
 -mquad-memory -mno-quad-memory @gol
+-mquad-memory-atomic -mno-quad-memory-atomic @gol
 -mcompat-align-parm -mno-compat-align-parm}
 
 @emph{RX Options}
@@ -18853,7 +18854,8 @@  following options:
 -mpopcntb -mpopcntd  -mpowerpc64 @gol
 -mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
 -msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
--mcrypto -mdirect-move -mpower8-fusion -mpower8-vector -mquad-memory}
+-mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol
+-mquad-memory -mquad-memory-atomic}
 
 The particular options set for any particular CPU varies between
 compiler versions, depending on what setting seems to produce optimal
@@ -19040,10 +19042,18 @@  the vector instructions.
 @itemx -mno-quad-memory
 @opindex mquad-memory
 @opindex mno-quad-memory
-Generate code that uses (does not use) the quad word memory
+Generate code that uses (does not use) the non-atomic quad word memory
 instructions.  The @option{-mquad-memory} option requires use of
 64-bit mode.
 
+@item -mquad-memory-atomic
+@itemx -mno-quad-memory-atomic
+@opindex mquad-memory-atomic
+@opindex mno-quad-memory-atomic
+Generate code that uses (does not use) the atomic quad word memory
+instructions.  The @option{-mquad-memory-atomic} option requires use of
+64-bit mode.
+
 @item -mfloat-gprs=@var{yes/single/double/no}
 @itemx -mfloat-gprs
 @opindex mfloat-gprs
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 206895)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -270,7 +270,7 @@  (define_predicate "quad_int_reg_operand"
 {
   HOST_WIDE_INT r;
 
-  if (!TARGET_QUAD_MEMORY)
+  if (!TARGET_QUAD_MEMORY && !TARGET_QUAD_MEMORY_ATOMIC)
     return 0;
 
   if (GET_CODE (op) == SUBREG)
@@ -624,6 +624,7 @@  (define_predicate "offsettable_mem_opera
        (match_test "offsettable_nonstrict_memref_p (op)")))
 
 ;; Return 1 if the operand is suitable for load/store quad memory.
+;; This predicate only checks for non-atomic loads/stores.
 (define_predicate "quad_memory_operand"
   (match_code "mem")
 {
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 206895)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -53,7 +53,8 @@ 
 				 | OPTION_MASK_CRYPTO			\
 				 | OPTION_MASK_DIRECT_MOVE		\
 				 | OPTION_MASK_HTM			\
-				 | OPTION_MASK_QUAD_MEMORY)
+				 | OPTION_MASK_QUAD_MEMORY		\
+  				 | OPTION_MASK_QUAD_MEMORY_ATOMIC)
 
 #define POWERPC_7400_MASK	(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 206895)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -571,7 +571,11 @@  Use ISA 2.07 transactional memory (HTM) 
 
 mquad-memory
 Target Report Mask(QUAD_MEMORY) Var(rs6000_isa_flags)
-Generate the quad word memory instructions (lq/stq/lqarx/stqcx).
+Generate the quad word memory instructions (lq/stq).
+
+mquad-memory-atomic
+Target Report Mask(QUAD_MEMORY_ATOMIC) Var(rs6000_isa_flags)
+Generate the quad word memory atomic instructions (lqarx/stqcx).
 
 mcompat-align-parm
 Target Report Var(rs6000_compat_align_parm) Init(0) Save
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 206895)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3356,14 +3356,37 @@  rs6000_option_override_internal (bool gl
 
   /* The quad memory instructions only works in 64-bit mode. In 32-bit mode,
      silently turn off quad memory mode.  */
-  if (TARGET_QUAD_MEMORY && !TARGET_POWERPC64)
+  if ((TARGET_QUAD_MEMORY || TARGET_QUAD_MEMORY_ATOMIC) && !TARGET_POWERPC64)
     {
       if ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY) != 0)
 	warning (0, N_("-mquad-memory requires 64-bit mode"));
 
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY_ATOMIC) != 0)
+	warning (0, N_("-mquad-memory-atomic requires 64-bit mode"));
+
+      rs6000_isa_flags &= ~(OPTION_MASK_QUAD_MEMORY
+			    | OPTION_MASK_QUAD_MEMORY_ATOMIC);
+    }
+
+  /* Non-atomic quad memory load/store are disabled for little endian, since
+     the words are reversed, but atomic operations can still be done by
+     swapping the words.  */
+  if (TARGET_QUAD_MEMORY && !WORDS_BIG_ENDIAN)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY) != 0)
+	warning (0, N_("-mquad-memory is not available in little endian mode"));
+
       rs6000_isa_flags &= ~OPTION_MASK_QUAD_MEMORY;
     }
 
+  /* Assume if the user asked for normal quad memory instructions, they want
+     the atomic versions as well, unless they explicity told us not to use quad
+     word atomic instructions.  */
+  if (TARGET_QUAD_MEMORY
+      && !TARGET_QUAD_MEMORY_ATOMIC
+      && ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY_ATOMIC) == 0))
+    rs6000_isa_flags |= OPTION_MASK_QUAD_MEMORY_ATOMIC;
+
   /* Enable power8 fusion if we are tuning for power8, even if we aren't
      generating power8 instructions.  */
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION))
@@ -5939,7 +5962,8 @@  direct_move_p (rtx op0, rtx op1)
   return false;
 }
 
-/* Return true if this is a load or store quad operation.  */
+/* Return true if this is a load or store quad operation.  This function does
+   not handle the atomic quad memory instructions.  */
 
 bool
 quad_load_store_p (rtx op0, rtx op1)
@@ -30753,6 +30777,7 @@  static struct rs6000_opt_mask const rs60
   { "powerpc-gfxopt",		OPTION_MASK_PPC_GFXOPT,		false, true  },
   { "powerpc-gpopt",		OPTION_MASK_PPC_GPOPT,		false, true  },
   { "quad-memory",		OPTION_MASK_QUAD_MEMORY,	false, true  },
+  { "quad-memory-atomic",	OPTION_MASK_QUAD_MEMORY_ATOMIC,	false, true  },
   { "recip-precision",		OPTION_MASK_RECIP_PRECISION,	false, true  },
   { "string",			OPTION_MASK_STRING,		false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 206895)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -533,8 +533,11 @@  extern int rs6000_vector_align[];
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
    in power7, so conditionalize them on p8 features.  TImode syncs need quad
    memory support.  */
-#define TARGET_SYNC_HI_QI	(TARGET_QUAD_MEMORY || TARGET_DIRECT_MOVE)
-#define TARGET_SYNC_TI		TARGET_QUAD_MEMORY
+#define TARGET_SYNC_HI_QI	(TARGET_QUAD_MEMORY			\
+				 || TARGET_QUAD_MEMORY_ATOMIC		\
+				 || TARGET_DIRECT_MOVE)
+
+#define TARGET_SYNC_TI		TARGET_QUAD_MEMORY_ATOMIC
 
 /* Power7 has both 32-bit load and store integer for the FPRs, so we don't need
    to allocate the SDmode stack slot to get the value into the proper location
Index: gcc/config/rs6000/sync.md
===================================================================
--- gcc/config/rs6000/sync.md	(revision 206895)
+++ gcc/config/rs6000/sync.md	(working copy)
@@ -204,25 +204,66 @@  (define_insn "load_locked<QHI:mode>_si"
   "<QHI:larx> %0,%y1"
   [(set_attr "type" "load_l")])
 
+;; Swap insns to swap words for little endian quad memory systems
+(define_mode_attr qswap_pred1	[(TI  "int_reg_operand")
+				 (PTI "quad_int_reg_operand")])
+
+(define_mode_attr qswap_pred2	[(TI  "quad_int_reg_operand")
+				 (PTI "int_reg_operand")])
+
+(define_mode_attr qswap_other	[(TI  "PTI")
+				 (PTI "TI")])
+
+(define_insn_and_split "qswap_<mode>"
+  [(set (match_operand:TI2 0 "<qswap_pred1>" "=&r")
+	(unspec:TI2 [(match_operand:<qswap_other> 1 "<qswap_pred2>" "r")]
+		    UNSPEC_TI_PTI_SWAP))]
+  "TARGET_SYNC_TI && !WORDS_BIG_ENDIAN"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+{
+  gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1]));
+
+  operands[2] = gen_lowpart (DImode, operands[0]);
+  operands[3] = gen_lowpart (DImode, operands[1]);
+
+  operands[4] = gen_highpart (DImode, operands[0]);
+  operands[5] = gen_highpart (DImode, operands[1]);
+})
+
 ;; Use PTImode to get even/odd register pairs
+;; Use a temporary register to force getting an even register for the
+;; lqarx/stqcrx. instructions.  Normal optimizations will eliminate this extra
+;; copy on big endian systems.
+
 (define_expand "load_lockedti"
   [(use (match_operand:TI 0 "quad_int_reg_operand" ""))
    (use (match_operand:TI 1 "memory_operand" ""))]
   "TARGET_SYNC_TI"
 {
-  /* Use a temporary register to force getting an even register for the
-     lqarx/stqcrx. instructions.  Normal optimizations will eliminate this
-     extra copy.  */
   rtx pti = gen_reg_rtx (PTImode);
+
+  if (!indexed_or_indirect_operand (operands[1], TImode))
+    {
+      rtx old_addr = XEXP (operands[1], 0);
+      rtx new_addr = force_reg (Pmode, old_addr);
+      operands[1] = change_address (operands[1], TImode, new_addr);
+    }
+
   emit_insn (gen_load_lockedpti (pti, operands[1]));
-  emit_move_insn (operands[0], gen_lowpart (TImode, pti));
+  if (WORDS_BIG_ENDIAN)
+    emit_move_insn (operands[0], gen_lowpart (TImode, pti));
+  else
+    emit_insn (gen_qswap_ti (operands[0], pti));
   DONE;
 })
 
 (define_insn "load_lockedpti"
   [(set (match_operand:PTI 0 "quad_int_reg_operand" "=&r")
 	(unspec_volatile:PTI
-         [(match_operand:TI 1 "memory_operand" "Z")] UNSPECV_LL))]
+         [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")] UNSPECV_LL))]
   "TARGET_SYNC_TI
    && !reg_mentioned_p (operands[0], operands[1])
    && quad_int_reg_operand (operands[0], PTImode)"
@@ -238,6 +279,10 @@  (define_insn "store_conditional<mode>"
   "<stcx> %2,%y1"
   [(set_attr "type" "store_c")])
 
+;; Use a temporary register to force getting an even register for the
+;; lqarx/stqcrx. instructions.  Normal optimizations will eliminate this extra
+;; copy on big endian systems.
+
 (define_expand "store_conditionalti"
   [(use (match_operand:CC 0 "cc_reg_operand" ""))
    (use (match_operand:TI 1 "memory_operand" ""))
@@ -247,13 +292,25 @@  (define_expand "store_conditionalti"
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx pti_op1 = change_address (op1, PTImode, XEXP (op1, 0));
-  rtx pti_op2 = gen_reg_rtx (PTImode);
+  rtx addr = XEXP (op1, 0);
+  rtx pti_op1;
+  rtx pti_op2;
+
+  if (!indexed_or_indirect_operand (op1, TImode))
+    {
+      rtx new_addr = force_reg (Pmode, addr);
+      operands[1] = op1 = change_address (op1, TImode, new_addr);
+      addr = new_addr;
+    }
+
+  pti_op1 = change_address (op1, PTImode, addr);
+  pti_op2 = gen_reg_rtx (PTImode);
+
+  if (WORDS_BIG_ENDIAN)
+    emit_move_insn (pti_op2, gen_lowpart (PTImode, op2));
+  else
+    emit_insn (gen_qswap_pti (pti_op2, op2));
 
-  /* Use a temporary register to force getting an even register for the
-     lqarx/stqcrx. instructions.  Normal optimizations will eliminate this
-     extra copy.  */
-  emit_move_insn (pti_op2, gen_lowpart (PTImode, op2));
   emit_insn (gen_store_conditionalpti (op0, pti_op1, pti_op2));
   DONE;
 })
@@ -261,7 +318,7 @@  (define_expand "store_conditionalti"
 (define_insn "store_conditionalpti"
   [(set (match_operand:CC 0 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(const_int 0)] UNSPECV_SC))
-   (set (match_operand:PTI 1 "memory_operand" "=Z")
+   (set (match_operand:PTI 1 "indexed_or_indirect_operand" "=Z")
 	(match_operand:PTI 2 "quad_int_reg_operand" "r"))]
   "TARGET_SYNC_TI && quad_int_reg_operand (operands[2], PTImode)"
   "stqcx. %2,%y1"
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 206895)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -125,6 +125,7 @@  (define_c_enum "unspec"
    UNSPEC_P8V_MTVSRD
    UNSPEC_P8V_XXPERMDI
    UNSPEC_P8V_RELOAD_FROM_VSX
+   UNSPEC_TI_PTI_SWAP
   ])
 
 ;;
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 206895)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -339,6 +339,10 @@  rs6000_target_modify_macros (bool define
     rs6000_define_or_undefine_macro (define_p, "__HTM__");
   if ((flags & OPTION_MASK_P8_VECTOR) != 0)
     rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__");
+  if ((flags & OPTION_MASK_QUAD_MEMORY) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__QUAD_MEMORY__");
+  if ((flags & OPTION_MASK_QUAD_MEMORY_ATOMIC) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__QUAD_MEMORY_ATOMIC__");
   if ((flags & OPTION_MASK_CRYPTO) != 0)
     rs6000_define_or_undefine_macro (define_p, "__CRYPTO__");