From patchwork Wed Dec 15 15:14:59 2010
X-Patchwork-Id: 75652
From: Ken Werner <ken@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org
Subject: [patch][ARM] Optimize __sync_* builtins
Date: Wed, 15 Dec 2010 16:14:59 +0100
Message-Id: <201012151615.00053.ken@linux.vnet.ibm.com>

Hi,

The code emitted for the __sync_* builtins on ARM can be optimized as
suggested by Peter Maydell:
http://lists.linaro.org/pipermail/linaro-toolchain/2010-November/000498.html

The idea is to eliminate the need for an additional temporary register
whenever the operation is reversible.  This patch adds two new code
attributes to sync.md: sync_clobber indicates whether an additional
register is going to be used/clobbered, and sync_t2_reqd sets the
sync_t2 attribute to its default depending on the value of the syncop
code iterator.  arm.c:arm_output_sync_loop() has been enhanced to place
the status result of the strex into old_value instead of t2; the
original contents of old_value are then restored by reversing the
operation.

This patch has been tested on arm-linux-gnueabi with no regressions.

Regards
Ken

2010-12-15  Ken Werner  <ken@linux.vnet.ibm.com>

	* config/arm/sync.md (sync_clobber, sync_t2_reqd): New code
	attributes.
	(arm_sync_old_<sync_optab>si, arm_sync_old_<sync_optab><mode>): Use
	the sync_clobber and sync_t2_reqd code attributes.
	(arm_sync_compare_and_swapsi, arm_sync_compare_and_swap<mode>):
	Remove the match_scratch clobber and the sync_t2 attribute.
	* config/arm/arm.c (arm_output_sync_loop): Reverse the operation if
	the t2 argument is NULL.
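[Editor's illustration, not part of the patch: a small stand-alone test
case exercising a reversible builtin.  The before/after assembly in the
comments is a sketch of the sequences described above, with made-up
register numbers.]

/* Hypothetical example: a reversible __sync_* operation whose emitted
   loop benefits from the optimization.  */
#include <stdio.h>

static int counter;

int
main (void)
{
  /* Before the patch, the "old value" loop needed two scratch registers
     (illustrative register allocation):

	 1:  ldrex   r0, [r2]       @ r0 = old value
	     add     r1, r0, r3     @ r1 = old value + addend (t1)
	     strex   r4, r1, [r2]   @ r4 = store status (t2)
	     teq     r4, #0
	     bne     1b

     After the patch, the status reuses the old-value register and the
     old value is recomputed by the inverse operation, freeing r4:

	 1:  ldrex   r0, [r2]
	     add     r1, r0, r3
	     strex   r0, r1, [r2]   @ status clobbers the old value
	     teq     r0, #0
	     bne     1b
	     sub     r0, r1, r3     @ restore: (old + addend) - addend  */
  int old = __sync_fetch_and_add (&counter, 42);
  printf ("old=%d new=%d\n", old, counter);
  return 0;
}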
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 167812)
+++ gcc/config/arm/arm.c	(working copy)
@@ -23220,11 +23220,47 @@ arm_output_sync_loop (emit_f emit,
       break;
     }
 
-  arm_output_strex (emit, mode, "", t2, t1, memory);
-  operands[0] = t2;
-  arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
-  arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=", LOCAL_LABEL_PREFIX);
+  if (t2)
+    {
+      arm_output_strex (emit, mode, "", t2, t1, memory);
+      operands[0] = t2;
+      arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
+      arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
+			   LOCAL_LABEL_PREFIX);
+    }
+  else
+    {
+      /* Use old_value for the return value because for some operations
+	 the old_value can easily be restored.  This saves one register.  */
+      arm_output_strex (emit, mode, "", old_value, t1, memory);
+      operands[0] = old_value;
+      arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
+      arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
+			   LOCAL_LABEL_PREFIX);
+      switch (sync_op)
+	{
+	case SYNC_OP_ADD:
+	  arm_output_op3 (emit, "sub", old_value, t1, new_value);
+	  break;
+
+	case SYNC_OP_SUB:
+	  arm_output_op3 (emit, "add", old_value, t1, new_value);
+	  break;
+
+	case SYNC_OP_XOR:
+	  arm_output_op3 (emit, "eor", old_value, t1, new_value);
+	  break;
+
+	case SYNC_OP_NONE:
+	  arm_output_op2 (emit, "mov", old_value, required_value);
+	  break;
+
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
   arm_process_output_memory_barrier (emit, NULL);
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
 }
Index: gcc/config/arm/sync.md
===================================================================
--- gcc/config/arm/sync.md	(revision 167812)
+++ gcc/config/arm/sync.md	(working copy)
@@ -103,6 +103,18 @@
 			      (plus "add")
 			      (minus "sub")])
 
+(define_code_attr sync_clobber [(ior "=&r")
+				(and "=&r")
+				(xor "X")
+				(plus "X")
+				(minus "X")])
+
+(define_code_attr sync_t2_reqd [(ior "4")
+				(and "4")
+				(xor "*")
+				(plus "*")
+				(minus "*")])
+
 (define_expand "sync_<sync_optab>si"
   [(match_operand:SI 0 "memory_operand")
    (match_operand:SI 1 "s_register_operand")
@@ -286,7 +298,6 @@
 			    VUNSPEC_SYNC_COMPARE_AND_SWAP))
    (set (match_dup 1) (unspec_volatile:SI [(match_dup 2)]
			  VUNSPEC_SYNC_COMPARE_AND_SWAP))
-   (clobber:SI (match_scratch:SI 4 "=&r"))
    (set (reg:CC CC_REGNUM) (unspec_volatile:CC [(match_dup 1)]
			       VUNSPEC_SYNC_COMPARE_AND_SWAP))
   ]
@@ -299,7 +310,6 @@
    (set_attr "sync_required_value"  "2")
    (set_attr "sync_new_value"       "3")
    (set_attr "sync_t1"              "0")
-   (set_attr "sync_t2"              "4")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
@@ -313,7 +323,6 @@
			    VUNSPEC_SYNC_COMPARE_AND_SWAP)))
    (set (match_dup 1) (unspec_volatile:NARROW [(match_dup 2)]
			  VUNSPEC_SYNC_COMPARE_AND_SWAP))
-   (clobber:SI (match_scratch:SI 4 "=&r"))
    (set (reg:CC CC_REGNUM) (unspec_volatile:CC [(match_dup 1)]
			       VUNSPEC_SYNC_COMPARE_AND_SWAP))
   ]
@@ -326,7 +335,6 @@
    (set_attr "sync_required_value"  "2")
    (set_attr "sync_new_value"       "3")
    (set_attr "sync_t1"              "0")
-   (set_attr "sync_t2"              "4")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
@@ -487,7 +495,7 @@
			    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "=&r"))]
+   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
   "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
   {
     return arm_output_sync_insn (insn, operands);
@@ -496,7 +504,7 @@
    (set_attr "sync_memory"    "1")
    (set_attr "sync_new_value" "2")
    (set_attr "sync_t1"        "3")
-   (set_attr "sync_t2"        "4")
+   (set_attr "sync_t2"        "<sync_t2_reqd>")
    (set_attr "sync_op"        "<sync_optab>")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
@@ -540,7 +548,7 @@
			    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "=&r"))]
+   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
   "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
   {
     return arm_output_sync_insn (insn, operands);
@@ -549,7 +557,7 @@
    (set_attr "sync_memory"    "1")
    (set_attr "sync_new_value" "2")
    (set_attr "sync_t1"        "3")
-   (set_attr "sync_t2"        "4")
+   (set_attr "sync_t2"        "<sync_t2_reqd>")
    (set_attr "sync_op"        "<sync_optab>")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])