Patchwork s390: Avoid CAS boolean output inefficiency

login
register
mail settings
Submitter Ulrich Weigand
Date Aug. 7, 2012, 5:02 p.m.
Message ID <201208071702.q77H2DFl025511@d06av02.portsmouth.uk.ibm.com>
Download mbox | patch
Permalink /patch/175752/
State New
Headers show

Comments

Ulrich Weigand - Aug. 7, 2012, 5:02 p.m.
Richard Henderson wrote:
> On 08/06/2012 11:34 AM, Ulrich Weigand wrote:
> > There is one particular inefficiency I have noticed.  This code:
> > 
> >   if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0))
> >     abort ();
> > 
> > from atomic-compare-exchange-3.c gets compiled into:
> > 
> >         l       %r3,0(%r2)
> >         larl    %r1,v
> >         cs      %r3,%r4,0(%r1)
> >         ipm     %r1
> >         sra     %r1,28
> >         st      %r3,0(%r2)
> >         ltr     %r1,%r1
> >         jne     .L3
> > 
> > which is extremely inefficient; it converts the condition code into
> > an integer using the slow ipm, sra sequence, just so that it can
> > convert the integer back into a condition code via ltr and branch
> > on it ...
> 
> This was caused (or perhaps abetted by) the representation of EQ
> as NE ^ 1.  With the subsequent truncation and zero-extend, I
> think combine reached its insn limit of 3 before seeing everything
> it needed to see.

Yes, combine isn't able to handle everything in one go, but it finds
an intermediate insn.  With the old __sync_compare_and_swap, it is
then able to optimize everything away in a second step; the only
reason this doesn't work any more is the intervening store.

The following patch changes the builtin expander to pass a MEM oldval
as-is to the back-end expander, so that the back-end can move the
store to before the CC operation.  With that patch I'm also seeing
all the IPMs disappear.

[ Well, all except for this one:

> This happens because CSE notices the cbranch vs 0, and sets r116
> to zero along the path to
> 
>      32   if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG,
>                                             __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> 
> at which point CSE decides that it would be cheaper to "re-use"
> the zero already in r116 instead of load another constant 0 here.
> After that, combine is ham-strung because r116 is not dead.

As you point out, this does indeed fix this problem as well:

> A short-term possibility is to have the CAS insns accept general_operand,
> so that the 0 gets merged.  
]


What do you think about this solution?  It has the advantage that
we still get the same xor code if we actually do need the ipm ...


Bye,
Ulrich
Richard Henderson - Aug. 7, 2012, 10:13 p.m.
On 08/07/2012 10:02 AM, Ulrich Weigand wrote:
> The following patch changes the builtin expander to pass a MEM oldval
> as-is to the back-end expander, so that the back-end can move the
> store to before the CC operation.  With that patch I'm also seeing
> all the IPMs disappear.
...
> What do you think about this solution?  It has the advantage that
> we still get the same xor code if we actually do need the ipm ...

I'm ok with that patch.


r~

Patch

diff -ur gcc/builtins.c gcc.new/builtins.c
--- gcc/builtins.c	2012-08-07 16:04:45.054348099 +0200
+++ gcc.new/builtins.c	2012-08-07 15:44:01.304349225 +0200
@@ -5376,6 +5376,7 @@ 
 
   expect = expand_normal (CALL_EXPR_ARG (exp, 1));
   expect = convert_memory_address (Pmode, expect);
+  expect = gen_rtx_MEM (mode, expect);
   desired = expand_expr_force_mode (CALL_EXPR_ARG (exp, 2), mode);
 
   weak = CALL_EXPR_ARG (exp, 3);
@@ -5383,14 +5384,15 @@ 
   if (host_integerp (weak, 0) && tree_low_cst (weak, 0) != 0)
     is_weak = true;
 
-  oldval = copy_to_reg (gen_rtx_MEM (mode, expect));
-
+  oldval = expect;
   if (!expand_atomic_compare_and_swap ((target == const0_rtx ? NULL : &target),
 				       &oldval, mem, oldval, desired,
 				       is_weak, success, failure))
     return NULL_RTX;
 
-  emit_move_insn (gen_rtx_MEM (mode, expect), oldval);
+  if (oldval != expect)
+    emit_move_insn (expect, oldval);
+
   return target;
 }
 
diff -ur gcc/config/s390/s390.md gcc.new/config/s390/s390.md
--- gcc/config/s390/s390.md	2012-08-07 16:04:54.204348621 +0200
+++ gcc.new/config/s390/s390.md	2012-08-07 16:00:21.934348628 +0200
@@ -8870,7 +8870,7 @@ 
 
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:DGPR 1 "register_operand")	;; oldval output
+   (match_operand:DGPR 1 "nonimmediate_operand");; oldval output
    (match_operand:DGPR 2 "memory_operand")	;; memory
    (match_operand:DGPR 3 "register_operand")	;; expected intput
    (match_operand:DGPR 4 "register_operand")	;; newval intput
@@ -8879,9 +8879,17 @@ 
    (match_operand:SI 7 "const_int_operand")]	;; failure model
   ""
 {
-  rtx cc, cmp;
+  rtx cc, cmp, output = operands[1];
+
+  if (!register_operand (output, <MODE>mode))
+    output = gen_reg_rtx (<MODE>mode);
+
   emit_insn (gen_atomic_compare_and_swap<mode>_internal
-	     (operands[1], operands[2], operands[3], operands[4]));
+	     (output, operands[2], operands[3], operands[4]));
+
+  if (output != operands[1])
+    emit_move_insn (operands[1], output);
+
   cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
   cmp = gen_rtx_EQ (SImode, cc, const0_rtx);
   emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx));