
Optimize __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 (PR target/48986)

Message ID 20110517070248.GX17079@tyan-ft48-01.lab.bos.redhat.com
State New

Commit Message

Jakub Jelinek May 17, 2011, 7:02 a.m. UTC
Hi!

This patch uses a peephole2 to optimize __sync_fetch_and_add (x, -N) == N
and __sync_add_and_fetch (x, N) == 0 into just a lock {add,sub,inc,dec}
and a flags test, instead of lock xadd plus a comparison.
The sync_old_add<mode> predicate change makes it possible to optimize
__sync_add_and_fetch with a constant second argument to the same
code as __sync_fetch_and_add.  Doing it in peephole2 has disadvantages
though: the 3 instructions need to be consecutive, and e.g. the xadd
insn has to be supported by the CPU.  An alternative would be to come
up with a new bool builtin representing the whole
__sync_fetch_and_add (x, -N) == N operation (perhaps with a dot or
space in its name to make it inaccessible from user code), match it
during folding, and expand it using a special optab.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
this way?

2011-05-16  Jakub Jelinek  <jakub@redhat.com>

	PR target/48986
	* config/i386/sync.md (sync_old_add<mode>): Relax operand 2
	predicate to allow CONST_INT.
	(*sync_old_add_cmp<mode>): New insn and peephole2 for it.


	Jakub

Comments

Uros Bizjak May 17, 2011, 7:28 a.m. UTC | #1
On Tue, May 17, 2011 at 9:02 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch uses a peephole2 to optimize __sync_fetch_and_add (x, -N) == N
> and __sync_add_and_fetch (x, N) == 0 into just a lock {add,sub,inc,dec}
> and a flags test, instead of lock xadd plus a comparison.
> The sync_old_add<mode> predicate change makes it possible to optimize
> __sync_add_and_fetch with a constant second argument to the same
> code as __sync_fetch_and_add.  Doing it in peephole2 has disadvantages
> though: the 3 instructions need to be consecutive, and e.g. the xadd
> insn has to be supported by the CPU.  An alternative would be to come
> up with a new bool builtin representing the whole
> __sync_fetch_and_add (x, -N) == N operation (perhaps with a dot or
> space in its name to make it inaccessible from user code), match it
> during folding, and expand it using a special optab.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> this way?
>
> 2011-05-16  Jakub Jelinek  <jakub@redhat.com>
>
>        PR target/48986
>        * config/i386/sync.md (sync_old_add<mode>): Relax operand 2
>        predicate to allow CONST_INT.
>        (*sync_old_add_cmp<mode>): New insn and peephole2 for it.

OK, but please add a comment explaining why we have a matched constraint
with a non-matched predicate.  These operands are otherwise targets for
cleanups ;)

Also, a comment explaining the purpose of the added peephole would be nice.

IMO, the change to sync_old_add<mode> is also appropriate for release branches.

Thanks,
Uros.

Patch

--- gcc/config/i386/sync.md.jj	2010-05-21 11:46:29.000000000 +0200
+++ gcc/config/i386/sync.md	2011-05-16 14:42:08.000000000 +0200
@@ -170,11 +170,62 @@  (define_insn "sync_old_add<mode>"
 	  [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_XCHG))
    (set (match_dup 1)
 	(plus:SWI (match_dup 1)
-		  (match_operand:SWI 2 "register_operand" "0")))
+		  (match_operand:SWI 2 "nonmemory_operand" "0")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_XADD"
   "lock{%;} xadd{<imodesuffix>}\t{%0, %1|%1, %0}")
 
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand" "")
+	(match_operand:SWI 2 "const_int_operand" ""))
+   (parallel [(set (match_dup 0)
+		   (unspec_volatile:SWI
+		     [(match_operand:SWI 1 "memory_operand" "")] UNSPECV_XCHG))
+	      (set (match_dup 1)
+		   (plus:SWI (match_dup 1)
+			     (match_dup 0)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG)
+	(compare:CCZ (match_dup 0)
+		     (match_operand:SWI 3 "const_int_operand" "")))]
+  "peep2_reg_dead_p (3, operands[0])
+   && (unsigned HOST_WIDE_INT) INTVAL (operands[2])
+      == -(unsigned HOST_WIDE_INT) INTVAL (operands[3])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
+  [(parallel [(set (reg:CCZ FLAGS_REG)
+		   (compare:CCZ (unspec_volatile:SWI [(match_dup 1)]
+						     UNSPECV_XCHG)
+				(match_dup 3)))
+	      (set (match_dup 1)
+		   (plus:SWI (match_dup 1)
+			     (match_dup 2)))])])
+
+(define_insn "*sync_old_add_cmp<mode>"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ (unspec_volatile:SWI
+		       [(match_operand:SWI 0 "memory_operand" "+m")]
+		       UNSPECV_XCHG)
+		     (match_operand:SWI 2 "const_int_operand" "i")))
+   (set (match_dup 0)
+	(plus:SWI (match_dup 0)
+		  (match_operand:SWI 1 "const_int_operand" "i")))]
+  "(unsigned HOST_WIDE_INT) INTVAL (operands[1])
+   == -(unsigned HOST_WIDE_INT) INTVAL (operands[2])"
+{
+  if (TARGET_USE_INCDEC)
+    {
+      if (operands[1] == const1_rtx)
+	return "lock{%;} inc{<imodesuffix>}\t%0";
+      if (operands[1] == constm1_rtx)
+	return "lock{%;} dec{<imodesuffix>}\t%0";
+    }
+
+  if (x86_maybe_negate_const_int (&operands[1], <MODE>mode))
+    return "lock{%;} sub{<imodesuffix>}\t{%1, %0|%0, %1}";
+
+  return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
+})
+
 ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
 (define_insn "sync_lock_test_and_set<mode>"
   [(set (match_operand:SWI 0 "register_operand" "=<r>")