Date: Thu, 20 Oct 2011 17:17:29 +0200
Subject: Re: [PATCH, i386]: Use reciprocal sequences for vectorized SFmode division and sqrtf(x) for -ffast-math
From: Uros Bizjak
To: "Joseph S. Myers"
Cc: gcc-patches@gcc.gnu.org, Michael Matz

On Thu, Oct 20, 2011 at 4:45 PM, Joseph S. Myers wrote:

>> The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph
>> to check if I didn't mess something with options handling.
>
> I have no comments on the option handling in this patch.
>
>> +for vectorized single float division and vectorized sqrtf(x) already with
>
> @code{sqrtf (@var{x})}

Thanks, fixed, with a similar fix in the previous paragraph. I also found
a PR that deals with vectorized reciprocals, so I referred to it in the
ChangeLog entry:

2011-10-20  Uros Bizjak

	PR target/47989
	* config/i386/i386.h (RECIP_MASK_DEFAULT): New define.
	* config/i386/i386.opt (recip_mask): Initialize with RECIP_MASK_DEFAULT.
	* doc/invoke.texi (ix86 Options, -mrecip): Document that GCC
	implements vectorized single float division and vectorized sqrtf(x)
	with reciprocal sequence with additional Newton-Raphson step with
	-ffast-math.

Attached is the patch that was committed to mainline SVN.

Encouraged by Michael's results, let's see what the automated benchmark
testers will show.

Uros.

Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 180255)
+++ config/i386/i386.h	(working copy)
@@ -2322,6 +2322,7 @@
 #define RECIP_MASK_VEC_SQRT 0x08
 #define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \
 			| RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
+#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
 
 #define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0)
 #define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0)
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 180255)
+++ config/i386/i386.opt	(working copy)
@@ -32,7 +32,7 @@
 HOST_WIDE_INT ix86_isa_flags_explicit
 
 TargetVariable
-int recip_mask
+int recip_mask = RECIP_MASK_DEFAULT
 
 Variable
 int recip_mask_explicit
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 180255)
+++ doc/invoke.texi	(working copy)
@@ -12922,7 +12922,12 @@
 of the non-reciprocal instruction, the precision of the sequence can be
 decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
 
-Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS)
+Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of RSQRTSS
+(or RSQRTPS) already with @option{-ffast-math} (or the above option
+combination), and doesn't need @option{-mrecip}.
+
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single float division and vectorized @code{sqrtf(@var{x})}
 already with @option{-ffast-math} (or the above option combination), and
 doesn't need @option{-mrecip}.
 
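For readers following along, a minimal standalone C sketch (not GCC code) of what the new default does to the TARGET_RECIP_* tests. Only RECIP_MASK_VEC_SQRT (0x08) is visible in the i386.h hunk; the other three bit values and the TARGET_RECIP_VEC_* macros are assumed here to follow the same pattern, and recip_mask is a plain stand-in for the option variable. The mask only permits the expansions; the reciprocal sequences are still emitted only with -ffast-math (or the option combination documented in invoke.texi).

```c
/* Reciprocal-mask bits.  Only RECIP_MASK_VEC_SQRT appears in the patch
   context; the other values are assumptions made for this sketch.  */
#define RECIP_MASK_DIV      0x01
#define RECIP_MASK_SQRT     0x02
#define RECIP_MASK_VEC_DIV  0x04
#define RECIP_MASK_VEC_SQRT 0x08
#define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \
                        | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)

/* New in this patch: vectorized division and sqrtf are on by default.  */
#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)

/* Stand-in for the TargetVariable that i386.opt now initializes.  */
static int recip_mask = RECIP_MASK_DEFAULT;

/* Predicates in the style of the TARGET_RECIP_* macros in i386.h
   (the two VEC_* ones are assumed, not shown in the hunk).  */
#define TARGET_RECIP_DIV      ((recip_mask & RECIP_MASK_DIV) != 0)
#define TARGET_RECIP_SQRT     ((recip_mask & RECIP_MASK_SQRT) != 0)
#define TARGET_RECIP_VEC_DIV  ((recip_mask & RECIP_MASK_VEC_DIV) != 0)
#define TARGET_RECIP_VEC_SQRT ((recip_mask & RECIP_MASK_VEC_SQRT) != 0)
```

With the default mask, scalar division and scalar sqrtf keep using DIVSS/SQRTSS unless -mrecip asks for more, while the vectorized forms may use RCPPS/RSQRTPS under -ffast-math.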
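A rough illustration, written with SSE intrinsics rather than taken from the i386 expanders, of the "reciprocal sequence with additional Newton-Raphson step" that the invoke.texi text describes. RCPPS and RSQRTPS return an estimate with roughly 12 bits of precision, and one Newton-Raphson step, x1 = (x0 + x0) - b*x0*x0 for 1/b and -0.5*x0*(a*x0*x0 - 3) for 1/sqrt(a), brings it close to full single precision. The function names swdiv_ps and swsqrt_ps are hypothetical, and the exact instruction order GCC emits may differ. The zero filter in swsqrt_ps avoids the NaN that 0 * rsqrt(0) = 0 * inf would otherwise give.

```c
#include <xmmintrin.h>

/* a / b for four floats: RCPPS estimate of 1/b, refined by one
   Newton-Raphson step x1 = (x0 + x0) - b * x0 * x0, then times a.  */
static __m128 swdiv_ps (__m128 a, __m128 b)
{
  __m128 x0 = _mm_rcp_ps (b);                       /* ~12-bit 1/b */
  __m128 e  = _mm_mul_ps (b, _mm_mul_ps (x0, x0));  /* b * x0 * x0 */
  __m128 x1 = _mm_sub_ps (_mm_add_ps (x0, x0), e);  /* refined 1/b */
  return _mm_mul_ps (a, x1);
}

/* sqrt(a) for four floats: RSQRTPS estimate x0 of 1/sqrt(a), refined by
   one Newton-Raphson step: sqrt(a) = -0.5 * (a * x0) * (a * x0 * x0 - 3).  */
static __m128 swsqrt_ps (__m128 a)
{
  __m128 x0 = _mm_rsqrt_ps (a);                     /* ~12-bit 1/sqrt(a) */
  /* rsqrt(0) is +inf; clear the estimate where a == 0 so the result is 0.  */
  x0 = _mm_and_ps (x0, _mm_cmpneq_ps (a, _mm_setzero_ps ()));
  __m128 ax0 = _mm_mul_ps (a, x0);                  /* first guess at sqrt(a) */
  __m128 t   = _mm_sub_ps (_mm_mul_ps (ax0, x0), _mm_set1_ps (3.0f));
  return _mm_mul_ps (_mm_mul_ps (ax0, _mm_set1_ps (-0.5f)), t);
}
```

As the documentation notes, results from such sequences can differ from DIVPS/SQRTPS by up to 2 ulp, which is why they are only used under -ffast-math.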