From patchwork Wed Sep 17 10:08:49 2014
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 390343
Date: Wed, 17 Sep 2014 12:08:49 +0200
From: Jakub Jelinek
To: Andrew Senkevich
Cc: "H.J. Lu", "Carlos O'Donell", "Joseph S. Myers", libc-alpha,
 "Zamyatin, Igor", "Melik-Adamyan, Areg"
Subject: Re: [PATCH 1/N] x86_64 vectorization support: vectorized math
 functions addition to Glibc
Message-ID: <20140917100849.GD17454@tucnak.redhat.com>
Reply-To: Jakub Jelinek
References: <5411F8D3.7050001@redhat.com>

On Wed, Sep 17, 2014 at 01:56:06PM +0400, Andrew Senkevich wrote:
> > The wiki says:
> >
> > 3.1. Goal
> >
> > Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
> > constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
> > and Cilk Plus constructs (6-7 in
> > http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
> > on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
> > several vector math functions (float and double versions). AVX-512
> > versions are planned to be added later. These functions can be also
> > used manually (with intrincics) by developers to obtain speedup.
> >
> > It is the opposite of
> >
> > https://sourceware.org/ml/libc-alpha/2014-09/msg00277.html
> >
> > which is for programmers to use them directly in their
> > applications, mostly independent of compilers.
> >
> > We need to come to an agreement on what goal is first.
> >
> > --
> > H.J.
>
> Hi H.J.,
>
> of course the first goal is to improve vectorization. Usage with
> intrinsics is additional goal and is not very significant.
>
> Attached first patch corrected according last comments in
> https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html.

you need all of SSE2, AVX and AVX2 versions, the other two can be thunked
(extract arguments and call cos in a loop or similarly, then pass result in
vector reg again).

	Jakub

--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -46,6 +46,17 @@
 # error "Never include <bits/mathcalls.h> directly; include <math.h> instead."
 #endif
 
+#undef __DECL_SIMD
+
+/* For now we have vectorized version only for _Mdouble_ case */
+#if !defined _Mfloat_ && !defined _Mlong_double_
+# if defined _OPENMP && _OPENMP >= 201307
+# define __DECL_SIMD _Pragma ("omp declare simd")

As the function is provided only on x86_64, it needs to be guarded by
defined __x86_64__ too (or have some way for arch-specific headers to tell
which functions are elemental). Also, only the N (notinbranch) version is
provided, so you'd need to use "omp declare simd notinbranch", and
furthermore only the AVX2 version is provided (that is not possible for gcc,