From patchwork Thu Oct 20 15:17:29 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak <ubizjak@gmail.com>
X-Patchwork-Id: 120828
Return-Path: 
 <gcc-patches-return-305082-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id B3F48B70B2
	for <incoming@patchwork.ozlabs.org>;
	Fri, 21 Oct 2011 02:17:50 +1100 (EST)
Received: (qmail 26691 invoked by alias); 20 Oct 2011 15:17:48 -0000
Received: (qmail 26672 invoked by uid 22791); 20 Oct 2011 15:17:47 -0000
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0	tests=AWL, BAYES_00,
	DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM,
	RCVD_IN_DNSWL_LOW, TW_ZJ
X-Spam-Check-By: sourceware.org
Received: from mail-yw0-f47.google.com (HELO mail-yw0-f47.google.com)
	(209.85.213.47) by sourceware.org (qpsmtpd/0.43rc1) with
	ESMTP; Thu, 20 Oct 2011 15:17:29 +0000
Received: by ywf9 with SMTP id 9so528285ywf.20 for <gcc-patches@gcc.gnu.org>;
	Thu, 20 Oct 2011 08:17:29 -0700 (PDT)
Received: by 10.236.131.37 with SMTP id l25mr16184990yhi.76.1319123849157;
	Thu, 20 Oct 2011 08:17:29 -0700 (PDT)
Received: by 10.146.82.5 with HTTP; Thu, 20 Oct 2011 08:17:29 -0700 (PDT)
In-Reply-To: <Pine.LNX.4.64.1110201444250.1642@digraph.polyomino.org.uk>
References: 
 <CAFULd4Zd0=NwVWZwOUvsD9AWWsGjEzjXsRezTL-Pe-_MDvM46w@mail.gmail.com>
	<Pine.LNX.4.64.1110201444250.1642@digraph.polyomino.org.uk>
Date: Thu, 20 Oct 2011 17:17:29 +0200
Message-ID: 
 <CAFULd4Y=8G+AZ0Xdmh9Voa=JNc=F+qGP=UoF2y4GhMEXxWhe=A@mail.gmail.com>
Subject: Re: [PATCH,
	i386]: Use reciprocal sequences for vectorized SFmode division and
	sqrtf(x) for -ffast-math
From: Uros Bizjak <ubizjak@gmail.com>
To: "Joseph S. Myers" <joseph@codesourcery.com>
Cc: gcc-patches@gcc.gnu.org, Michael Matz <matz@suse.de>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

On Thu, Oct 20, 2011 at 4:45 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:

>> The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph
>> to check if I didn't mess something with options handling.
>
> I have no comments on the option handling in this patch.
>
>> +for vectorized single float division and vectorized sqrtf(x) already with
>
> @code{sqrtf (@var{x})}

Thanks - fixed, with a similar fix in the previous paragraph.

I also found a PR that deals with vectorized reciprocals, so I reference
it in the ChangeLog entry:

2011-10-20  Uros Bizjak  <ubizjak@gmail.com>

	PR target/47989
	* config/i386/i386.h (RECIP_MASK_DEFAULT): New define.
	* config/i386/i386.opt (recip_mask): Initialize with RECIP_MASK_DEFAULT.
	* doc/invoke.texi (ix86 Options, -mrecip): Document that, with
	-ffast-math, GCC implements vectorized single float division and
	vectorized sqrtf(x) using a reciprocal sequence with an additional
	Newton-Raphson step.
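
To illustrate what the new default means, here is a standalone C sketch
(not GCC source). Only the 0x08 value of RECIP_MASK_VEC_SQRT and the
RECIP_MASK_DEFAULT definition come from the patch; the other three bit
values and the two TARGET_RECIP_VEC_* macros are assumptions modelled on
the TARGET_RECIP_DIV macro visible in the context lines.

```c
/* Bit assignments: 0x08 appears in the patch context; the values
   0x01, 0x02 and 0x04 are assumed here for illustration only.  */
#define RECIP_MASK_DIV      0x01  /* assumed value */
#define RECIP_MASK_SQRT     0x02  /* assumed value */
#define RECIP_MASK_VEC_DIV  0x04  /* assumed value */
#define RECIP_MASK_VEC_SQRT 0x08
#define RECIP_MASK_DEFAULT  (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)

/* Mirrors the new initializer in i386.opt.  */
int recip_mask = RECIP_MASK_DEFAULT;

/* Scalar tests, as in the i386.h context lines.  */
#define TARGET_RECIP_DIV      ((recip_mask & RECIP_MASK_DIV) != 0)
#define TARGET_RECIP_SQRT     ((recip_mask & RECIP_MASK_SQRT) != 0)
/* Vector tests, assumed to follow the same pattern.  */
#define TARGET_RECIP_VEC_DIV  ((recip_mask & RECIP_MASK_VEC_DIV) != 0)
#define TARGET_RECIP_VEC_SQRT ((recip_mask & RECIP_MASK_VEC_SQRT) != 0)
```

With this initializer the two vector tests are true and the two scalar
tests are false before any option has been processed.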

Attached is the patch that was committed to mainline SVN. Michael's
results are encouraging, so let's see what the automated benchmark
testers show.
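
For readers unfamiliar with the technique, here is a scalar C sketch of
"reciprocal estimate plus one Newton-Raphson step". It is not the
sequence GCC emits: coarse_recip() is a hypothetical stand-in that fakes
a low-precision hardware estimate (such as the one RCPPS returns) by
truncating the mantissa, and all three function names are made up for
this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: compute
   1/a, then clear the low 12 mantissa bits so that only about 11 bits
   of precision remain.  */
static float coarse_recip(float a)
{
    float r = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for f(x) = 1/x - a:
   x1 = x0 * (2 - a * x0).  The relative error is roughly squared.  */
static float refine_recip(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}

/* b / a computed as b * (refined estimate of 1/a).  */
static float div_via_recip(float b, float a)
{
    return b * refine_recip(a, coarse_recip(a));
}
```

The refined result is close to, but not always bit-identical with, a
true division, and a == 0 produces NaN (0 * inf) instead of infinity,
which is why this kind of sequence belongs under -ffast-math.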

Uros.

Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 180255)
+++ config/i386/i386.h	(working copy)
@@ -2322,6 +2322,7 @@
 #define RECIP_MASK_VEC_SQRT	0x08
 #define RECIP_MASK_ALL	(RECIP_MASK_DIV | RECIP_MASK_SQRT \
 			 | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
+#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
 
 #define TARGET_RECIP_DIV	((recip_mask & RECIP_MASK_DIV) != 0)
 #define TARGET_RECIP_SQRT	((recip_mask & RECIP_MASK_SQRT) != 0)
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 180255)
+++ config/i386/i386.opt	(working copy)
@@ -32,7 +32,7 @@
 HOST_WIDE_INT ix86_isa_flags_explicit
 
 TargetVariable
-int recip_mask
+int recip_mask = RECIP_MASK_DEFAULT
 
 Variable
 int recip_mask_explicit
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 180255)
+++ doc/invoke.texi	(working copy)
@@ -12922,7 +12922,12 @@
 of the non-reciprocal instruction, the precision of the sequence can be
 decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
 
-Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS)
+Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of RSQRTSS
+(or RSQRTPS) already with @option{-ffast-math} (or the above option
+combination), and doesn't need @option{-mrecip}.
+
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single float division and vectorized @code{sqrtf(@var{x})}
 already with @option{-ffast-math} (or the above option combination), and
 doesn't need @option{-mrecip}.