Date: Mon, 20 Jun 2016 20:09:19 +0300
From: Ilya Verbin
To: Uros Bizjak, gcc-patches@gcc.gnu.org
Cc: Kirill Yukhin, Jakub Jelinek
Subject: [PATCH, i386, AVX-512ER] vrcp28ps auto generation

Hi!
This patch makes ix86_emit_swdivsf emit vrcp28ps and vmulps instructions
for 512-bit vector modes when AVX-512ER is enabled.  The relative error of
the result is < 2^-23, so no additional Newton-Raphson iteration is
necessary.  Regtested using various benchmarks on an AVX-512ER machine.
OK for trunk?

gcc/
	* config/i386/i386.c (ix86_emit_swdivsf): Emit vrcp28ps.

gcc/testsuite/
	* gcc.target/i386/avx512er-vrcp28ps-3.c: New test.
	* gcc.target/i386/avx512er-vrcp28ps-4.c: New test.

-- 
Ilya

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 56a5b9c..8e0bf26 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -48674,8 +48674,19 @@ void ix86_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode)
 
   /* x0 = rcp(b) estimate */
   if (mode == V16SFmode || mode == V8DFmode)
-    emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
-                                                UNSPEC_RCP14)));
+    {
+      if (TARGET_AVX512ER)
+        {
+          emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
+                                                      UNSPEC_RCP28)));
+          /* res = a * x0 */
+          emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, a, x0)));
+          return;
+        }
+      else
+        emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
+                                                    UNSPEC_RCP14)));
+    }
   else
     emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
                                                 UNSPEC_RCP)));
diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-3.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-3.c
new file mode 100644
index 0000000..e08bea4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-3.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512er } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512er" } */
+
+#include "avx512er-check.h"
+
+#define MAX 1000
+#define EPS 0.00001
+
+__attribute__ ((noinline, optimize (0)))
+void static
+compute_rcp_ref (float *a, float *b, float *r)
+{
+  for (int i = 0; i < MAX; i++)
+    r[i] = a[i] / b[i];
+}
+
+__attribute__ ((noinline))
+void static
+compute_rcp_exp (float *a, float *b, float *r)
+{
+  for (int i = 0; i < MAX; i++)
+    r[i] = a[i] / b[i];
+}
+
+void static
+avx512er_test (void)
+{
+  float a[MAX];
+  float b[MAX];
+  float ref[MAX];
+  float exp[MAX];
+
+  for (int i = 0; i < MAX; i++)
+    {
+      a[i] = 179.345 - 6.5645 * i;
+      b[i] = 8765.987 - 8.6756 * i;
+    }
+
+  compute_rcp_ref (a, b, ref);
+  compute_rcp_exp (a, b, exp);
+
+  for (int i = 0; i < MAX; i++)
+    {
+      float rel_err = (ref[i] - exp[i]) / ref[i];
+      rel_err = rel_err > 0.0 ? rel_err : -rel_err;
+      if (rel_err > EPS)
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-4.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-4.c
new file mode 100644
index 0000000..2c76d96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-4.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512er" } */
+
+#include "avx512er-vrcp28ps-3.c"
+
+/* { dg-final { scan-assembler-times "vrcp28ps\[^\n\r\]*zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
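Why the RCP14 estimate needs a refinement step and the RCP28 estimate does not can be seen from one Newton-Raphson step done in exact arithmetic, with rounding ignored. Assume the estimate has relative error epsilon, i.e. x0 = (1 + epsilon)/b. The 2^-14 and 2^-28 bounds below are assumed to be the documented maximum relative errors of vrcp14ps and vrcp28ps.

```latex
\begin{aligned}
x_1 &= x_0\,(2 - b\,x_0)
     = \frac{1+\varepsilon}{b}\,\bigl(2 - (1+\varepsilon)\bigr)
     = \frac{(1+\varepsilon)(1-\varepsilon)}{b}
     = \frac{1-\varepsilon^{2}}{b},\\[4pt]
|\varepsilon| < 2^{-14} &\;\Longrightarrow\; |\varepsilon^{2}| < 2^{-28}.
\end{aligned}
```

One step turns a 14-bit estimate into roughly a 28-bit one, which is already finer than the 24-bit single-precision significand. An estimate that starts at about 28 bits therefore gains nothing from the step; what remains is the rounding of x0 and of a * x0 to single precision.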