From patchwork Thu Jul 30 21:10:16 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
X-Patchwork-Id: 502306
Return-Path: 
 <gcc-patches-return-404342-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id AD6A91402ED
	for <incoming@patchwork.ozlabs.org>;
	Fri, 31 Jul 2015 07:10:34 +1000 (AEST)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=SA9ZKLDF; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:in-reply-to:references:date:message-id:subject
	:from:to:cc:content-type; q=dns; s=default; b=Toy7u50d+HKh+BuXv6
	EUOTVTWa3GbGlS2alL9VzXg4aat/Or5BwUIGtRq6+JRSE+Xrx+eO8z6bMihQGeLl
	YPgl6FDwXjZyg2iRyMfFelB2+flp9nqLyc+dht/36zzOJrMzjDv36Vz2KVtVdIz5
	NaskOX286bNHzbBTO/q7E+0s4=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:in-reply-to:references:date:message-id:subject
	:from:to:cc:content-type; s=default; bh=kN29ZVCRWObpzhkOttljtvFi
	g5Y=; b=SA9ZKLDF2DBtGmcuEuRBulFm3IySuw1s4y3xIPOWcIMRRvaewYB8VavK
	YH5RJEksM58YOaUIKbWesYL2D6zpfv+ZaRJHFvr05GoNuzOnnrVEYRqhKi1V5AFR
	Tx27Df97qLEpZIFevchbXT5h3LNU3Ptr2mK7iditL7sJgiSQ3/8=
Received: (qmail 52725 invoked by alias); 30 Jul 2015 21:10:24 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 52652 invoked by uid 89); 30 Jul 2015 21:10:20 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=2.6 required=5.0 tests=AWL, BAYES_00,
	RCVD_IN_DNSWL_LOW, SPF_PASS,
	ZIP_ATTACHED autolearn=no version=3.3.2
X-HELO: mail-yk0-f180.google.com
Received: from mail-yk0-f180.google.com (HELO mail-yk0-f180.google.com)
	(209.85.160.180) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
	encrypted) ESMTPS; Thu, 30 Jul 2015 21:10:18 +0000
Received: by ykax123 with SMTP id x123so44479699yka.1 for
	<gcc-patches@gcc.gnu.org>; Thu, 30 Jul 2015 14:10:16 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
	s=20130820;
	h=x-gm-message-state:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=DakmHbbnpJd1sZZQxLAg1s8Vpoiq8svbjanZJ1JR6f0=;
	b=UZmUJgOalF/TZFh/nUXP5locEeO0mmWTdyi63GFnp9YS16NOSSEP3BYFyeF3toYEZv
	JkUKVg4G2doxZOG6z2Ut9zKUNcopu3/RtjqaMehTtPCmgJCMfmipDgo84OLJlnKuo+uY
	ZWJSwk8ZQr4GLQwr0vI/5SrIay8DCdGyRZKJwELmCyJ2eOUULCxlvv/UoSqRFWzE12P6
	P6xpRrH8Pg8FuqYWSNLWmL07o7GhWsy2lLCqfhlpc60FV7Pm1nPJVXCmg0QPluecmtOg
	WCRtR/HCC7JkvWXuiaPMMfe6CqemYkeLipjkNrJ/f+LBlfDz8MVv+0OUEWsDyhdR0ObX
	bBIw==
X-Gm-Message-State: 
 ALoCoQnbuVlyGU5Q7GasXV9GCL8yzL1IXj9q6F/Gk/vuvyPCbmoOtqr1o3caRF1j/2Jv+7tEaa5Z
MIME-Version: 1.0
X-Received: by 10.13.248.68 with SMTP id i65mr51513048ywf.151.1438290616749;
	Thu, 30 Jul 2015 14:10:16 -0700 (PDT)
Received: by 10.37.88.137 with HTTP; Thu, 30 Jul 2015 14:10:16 -0700 (PDT)
In-Reply-To: <55B8AC02.9050301@arm.com>
References: 
 <CAAgBjMk0Hdask2JU8xs4fj_Ai1e0ggxB+h3ayb=NOGQBYJ8ccQ@mail.gmail.com>
	<55B8AC02.9050301@arm.com>
Date: Fri, 31 Jul 2015 02:40:16 +0530
Message-ID: 
 <CAAgBjMkmijb5QdWnRk5RYUFQXXb8HiO4rER+GxOK_DAzhK=rDg@mail.gmail.com>
Subject: Re: [ARM] implement division using vrecpe/vrecps with
	-funsafe-math-optimizations
From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
To: Kyrill Tkachov <kyrylo.tkachov@arm.com>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>,
	Charles Baylis <charles.baylis@linaro.org>
X-IsSubscribed: yes

On 29 July 2015 at 16:03, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi Prathamesh,
>
> This is probably not appropriate for -Os optimisation.
> And for speed optimisation I imagine it can vary a lot on the target the
> code is run.
> Do you have any benchmark results for this patch?
Hi Kyrill,
Thanks for the review. I have attempted to address your comments in
the attached patch.
Does it look OK from a correctness perspective?
Unfortunately, I haven't run proper benchmarks yet.
I ran a test-case (attached) prepared by Charles for target
arm-linux-gnueabihf (on APM Mustang),
and it took roughly 20-25% fewer ticks with the patch:
Options passed: -O3 -mfpu=neon -mfloat-abi=hard -funsafe-math-optimizations

Before:
t8a, len =       32, took   2593977 ticks
t8a, len =      128, took   2408907 ticks
t8a, len =     1024, took   2354950 ticks
t8a, len =    65536, took   2365041 ticks
t8a, len =  1048576, took   2692928 ticks

After:
t8a, len =       32, took   2027323 ticks
t8a, len =      128, took   1920595 ticks
t8a, len =     1024, took   1827250 ticks
t8a, len =    65536, took   1797924 ticks
t8a, len =  1048576, took   2026274 ticks

I will get back to you soon with benchmarking results.

Thanks,
Prathamesh
>
> Thanks,
> Kyrill
>
>
> On 29/07/15 11:09, Prathamesh Kulkarni wrote:
>>
>> Hi,
>> This patch tries to implement division with multiplication by
>> reciprocal using vrecpe/vrecps
>> with -funsafe-math-optimizations and -freciprocal-math enabled.
>> Tested on arm-none-linux-gnueabihf using qemu.
>> OK for trunk ?
>>
>> Thank you,
>> Prathamesh
>
> +    /* Perform 2 iterations of Newton-Raphson method for better accuracy */
> +    for (int i = 0; i < 2; i++)
> +      {
> +    emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2]));
> +    emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp));
> +      }
> +
> +    /* We now have reciprocal in rec, perform operands[0] = operands[1] *
> rec */
> +    emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec));
> +    DONE;
> +  }
> +)
> +
>
> Full stop and two spaces at the end of the comments.
>
2015-07-28  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
	    Charles Baylis  <charles.baylis@linaro.org>

	* config/arm/neon.md (div<mode>3): New pattern.

testsuite/
	* gcc.target/arm/vect-div-1.c: New test-case.
	* gcc.target/arm/vect-div-2.c: Likewise.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..f2dbcc4 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -548,6 +548,33 @@
                     (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
+(define_expand "div<mode>3"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+        (div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
+		  (match_operand:VCVTF 2 "s_register_operand" "w")))]
+  "TARGET_NEON && !optimize_size
+   && flag_unsafe_math_optimizations && flag_reciprocal_math"
+  {
+    rtx rec = gen_reg_rtx (<MODE>mode);
+    rtx vrecps_temp = gen_reg_rtx (<MODE>mode);
+
+    /* Reciprocal estimate.  */
+    emit_insn (gen_neon_vrecpe<mode> (rec, operands[2]));
+
+    /* Perform 2 iterations of Newton-Raphson method.  */
+    for (int i = 0; i < 2; i++)
+      {
+	emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2]));
+	emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp));
+      }
+
+    /* We now have reciprocal in rec, perform operands[0] = operands[1] * rec.  */
+    emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec));
+    DONE;
+  }
+)
+
+
 (define_insn "mul<mode>3add<mode>_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/arm/vect-div-1.c b/gcc/testsuite/gcc.target/arm/vect-div-1.c
new file mode 100644
index 0000000..e562ef3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-div-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -funsafe-math-optimizations -ftree-vectorize -fdump-tree-vect-all" } */
+/* { dg-add-options arm_v8_neon } */
+
+void
+foo (int len, float * __restrict p, float *__restrict x)
+{
+  len = len & ~31;
+  for (int i = 0; i < len; i++)
+    p[i] = p[i] / x[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/vect-div-2.c b/gcc/testsuite/gcc.target/arm/vect-div-2.c
new file mode 100644
index 0000000..8e15d0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-div-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -funsafe-math-optimizations -fno-reciprocal-math -ftree-vectorize -fdump-tree-vect-all" } */
+/* { dg-add-options arm_v8_neon } */
+
+void
+foo (int len, float * __restrict p, float *__restrict x)
+{
+  len = len & ~31;
+  for (int i = 0; i < len; i++)
+    p[i] = p[i] / x[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */