From patchwork Tue Oct 26 21:33:11 2010
X-Patchwork-Submitter: Doug Kwan (關振德)
X-Patchwork-Id: 69293
Subject: Re: [PATCH][ARM] Optimized 64-bit multiplication for THUMB-1
From: Doug Kwan (關振德)
To: Paul Brook
Cc: gcc-patches, Nick Clifton, Richard Earnshaw
Date: Tue, 26 Oct 2010 14:33:11 -0700
In-Reply-To: <201010260036.04736.paul@codesourcery.com>
References: <201010221920.37094.paul@codesourcery.com>
 <201010260036.04736.paul@codesourcery.com>
Mailing-List: gcc-patches@gcc.gnu.org

Hi,

    I looked at the definition of the ARM_FUNC_START macro. The cases in
which the macro does not force the use of ARM mode are:

- __thumb2__ is defined: the macro is defined, but no .arm is used.
- __ARM_ARCH_6M__ is defined: the macro is not defined.

In both of these cases the code protected by the test is not assembled, so
no problem is observed. I can add an explicit .arm, as in the attached
patch. Would that be better?

-Doug

On Mon, Oct 25, 2010 at 4:36 PM, Paul Brook wrote:
>> Hi Paul,
>>
>> Thank you very much for your review and comments. I have fixed the
>> push/pop and the use of 2-argument code in 32-bit code. I am not quite
>> sure what the problem with the __thumb2__ test is. I built arm-eabi-gcc
>> for armv4, armv5te, armv7-a and with no arch, and all builds were
>> successful. I did change the test so that forcing ARM mode is only
>> done if:
>
> No. You're missing the point. ARM_FUNC_START does not force the use of ARM
> mode. See the comments near the definition of that macro.
>
> Paul

Index: gcc/gcc/config/arm/lib1funcs.asm
===================================================================
--- gcc/gcc/config/arm/lib1funcs.asm	(revision 165462)
+++ gcc/gcc/config/arm/lib1funcs.asm	(working copy)
@@ -1274,6 +1274,91 @@ LSYM(Lover12):
 #endif
 #endif /* L_dvmd_lnx */
+
+#ifdef L_muldi3
+
+/* ------------------------------------------------------------------------ */
+/* Dword multiplication operation.
+
+   The THUMB ISA lacks an instruction to compute the higher half of the
+   64-bit result of a 32-bit by 32-bit multiplication.  This makes 64-bit
+   multiplication difficult to implement efficiently.  The ARM ISAs after
+   V3M have UMULL and MLA, which can be used to implement 64-bit
+   multiplication efficiently.  On a target that supports both the ARM V3M+
+   and THUMB ISAs (but not THUMB2), we want to use the ARM version of
+   _muldi3 in the THUMB libgcc.
+
+   We do not need to use the ARM version for THUMB2 targets, as THUMB2
+   targets also support MLA and UMULL.  */
+
+/* We cannot use the faster version in the following situations:
+
+   - ARM architectures older than V3M lack the UMULL instruction.
+   - The target is ARMv6-M, which does not run ARM code.  */
+
+#undef USE_FAST_MULDI3
+#if (__ARM_ARCH__ > 3 || defined(__ARM_ARCH_3M__)) && !defined(__ARM_ARCH_6M__)
+#define USE_FAST_MULDI3
+#endif
+
+/* Force using ARM code if:
+   1. ARM mode has UMULL (i.e. USE_FAST_MULDI3 is defined) and
+   2. this is THUMB-1 mode and
+   3. interworking is enabled.  */
+
+#if defined(USE_FAST_MULDI3) \
+    && (defined(__thumb__) && !defined(__thumb2__)) \
+    && defined(__THUMB_INTERWORK__)
+	ARM_FUNC_START muldi3
+	ARM_FUNC_ALIAS aeabi_lmul muldi3
+	.arm
+#else
+	FUNC_START muldi3
+	FUNC_ALIAS aeabi_lmul muldi3
+#endif
+
+#if defined(USE_FAST_MULDI3)
+	/* Fast version for ARM with UMULL, and for THUMB2.  */
+	mul	xxh, xxh, yyl
+	mla	yyh, xxl, yyh, xxh
+	umull	xxl, xxh, yyl, xxl
+	add	xxh, xxh, yyh
+	RET
+#else
+	/* Slow version for both THUMB and older ARMs lacking UMULL.  */
+	mul	xxh, yyl		/* xxh := AH*BL */
+	do_push	{r4, r5, r6, r7}
+	mul	yyh, xxl		/* yyh := AL*BH */
+	ldr	r4, .L_mask
+	lsr	r5, xxl, #16		/* r5 := AL>>16 */
+	lsr	r6, yyl, #16		/* r6 := BL>>16 */
+	lsr	r7, xxl, #16		/* r7 := AL>>16 */
+	mul	r5, r6			/* r5 = (AL>>16) * (BL>>16) */
+	and	xxl, r4			/* xxl = AL & 0xffff */
+	and	yyl, r4			/* yyl = BL & 0xffff */
+	add	xxh, yyh		/* xxh = AH*BL + AL*BH */
+	mul	r6, xxl			/* r6 = (AL&0xffff) * (BL>>16) */
+	mul	r7, yyl			/* r7 = (AL>>16) * (BL&0xffff) */
+	add	xxh, r5
+	mul	xxl, yyl		/* xxl = (AL&0xffff) * (BL&0xffff) */
+	mov	r4, #0
+	adds	r6, r7			/* partial sum into result[47:16].  */
+	adc	r4, r4			/* carry into result[48].  */
+	lsr	yyh, r6, #16
+	lsl	r4, r4, #16
+	lsl	yyl, r6, #16
+	add	xxh, r4
+	adds	xxl, yyl
+	adc	xxh, yyh
+	do_pop	{r4, r5, r6, r7}
+	RET
+	.align	2
+.L_mask:
+	.word	65535
+#endif
+
+	FUNC_END muldi3
+#endif
+
 #ifdef L_clear_cache
 #if defined __ARM_EABI__ && defined __linux__
 @ EABI GNU/Linux call to cacheflush syscall.

Index: gcc/gcc/config/arm/t-strongarm-elf
===================================================================
--- gcc/gcc/config/arm/t-strongarm-elf	(revision 165462)
+++ gcc/gcc/config/arm/t-strongarm-elf	(working copy)
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func _clzsi2 _clzdi2
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \
+	_clzsi2 _clzdi2 _muldi3
 
 # We want fine grained libraries, so use the new code to build the
 # floating point emulation libraries.

Index: gcc/gcc/config/arm/t-vxworks
===================================================================
--- gcc/gcc/config/arm/t-vxworks	(revision 165462)
+++ gcc/gcc/config/arm/t-vxworks	(working copy)
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \
+	_call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2 _muldi3
 
 # We want fine grained libraries, so use the new code to build the
 # floating point emulation libraries.

Index: gcc/gcc/config/arm/t-pe
===================================================================
--- gcc/gcc/config/arm/t-pe	(revision 165462)
+++ gcc/gcc/config/arm/t-pe	(working copy)
@@ -17,7 +17,7 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2 _muldi3
 
 # We want fine grained libraries, so use the new code to build the
 # floating point emulation libraries.

Index: gcc/gcc/config/arm/t-arm-elf
===================================================================
--- gcc/gcc/config/arm/t-arm-elf	(revision 165462)
+++ gcc/gcc/config/arm/t-arm-elf	(working copy)
@@ -29,7 +29,7 @@ LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3
 	_arm_truncdfsf2 _arm_negsf2 _arm_addsubsf3 _arm_muldivsf3 \
 	_arm_cmpsf2 _arm_unordsf2 _arm_fixsfsi _arm_fixunssfsi \
 	_arm_floatdidf _arm_floatdisf _arm_floatundidf _arm_floatundisf \
-	_clzsi2 _clzdi2
+	_clzsi2 _clzdi2 _muldi3
 
 MULTILIB_OPTIONS = marm/mthumb
 MULTILIB_DIRNAMES = arm thumb

Index: gcc/gcc/config/arm/t-linux
===================================================================
--- gcc/gcc/config/arm/t-linux	(revision 165462)
+++ gcc/gcc/config/arm/t-linux	(working copy)
@@ -23,7 +23,7 @@
 TARGET_LIBGCC2_CFLAGS = -fomit-frame-pointer -fPIC
 LIB1ASMSRC = arm/lib1funcs.asm
 LIB1ASMFUNCS = _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_lnx _clzsi2 _clzdi2 \
-	_arm_addsubdf3 _arm_addsubsf3
+	_arm_addsubdf3 _arm_addsubsf3 _muldi3
 
 # MULTILIB_OPTIONS = mhard-float/msoft-float
 # MULTILIB_DIRNAMES = hard-float soft-float

Index: gcc/gcc/config/arm/t-symbian
===================================================================
--- gcc/gcc/config/arm/t-symbian	(revision 165462)
+++ gcc/gcc/config/arm/t-symbian	(working copy)
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _bb_init_func _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2
+LIB1ASMFUNCS += _bb_init_func _call_via_rX _interwork_call_via_rX _clzsi2 \
+	_clzdi2 _muldi3
 
 # These functions have __aeabi equivalents and will never be called by GCC.
 # By putting them in LIB1ASMFUNCS, we avoid the standard libgcc2.c code being

Index: gcc/gcc/config/arm/t-wince-pe
===================================================================
--- gcc/gcc/config/arm/t-wince-pe	(revision 165462)
+++ gcc/gcc/config/arm/t-wince-pe	(working copy)
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2
+LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _call_via_rX \
+	_interwork_call_via_rX _clzsi2 _clzdi2 _muldi3
 
 # We want fine grained libraries, so use the new code to build the
 # floating point emulation libraries.