Date: Tue, 4 Apr 2017 21:44:50 +0930
From: Alan Modra
To: Sandra Loosemore
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [DOC PATCH] PowerPC extended asm example

Revised patch.

	* doc/extend.texi (Extended Asm <Clobbers>): Rename to
	"Clobbers and Scratch Registers".  Add OpenBLAS example.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0f44ece..0b0a021 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7869,7 +7869,7 @@ A comma-separated list of C expressions read by the instructions in the
 @item Clobbers
 A comma-separated list of registers or other values changed by the
 @var{AssemblerTemplate}, beyond those listed as outputs.
-An empty list is permitted. @xref{Clobbers}.
+An empty list is permitted. @xref{Clobbers and Scratch Registers}.
 
 @item GotoLabels
 When you are using the @code{goto} form of @code{asm}, this section contains
@@ -8229,7 +8229,7 @@ The enclosing parentheses are a required part of the syntax.
 
 When the compiler selects the registers to use to represent the output
 operands, it does not use any of the clobbered registers
-(@pxref{Clobbers}).
+(@pxref{Clobbers and Scratch Registers}).
 
 Output operand expressions must be lvalues. The compiler cannot check whether
 the operands have data types that are reasonable for the instruction being
@@ -8465,7 +8465,8 @@ as input. The enclosing parentheses are a required part of the syntax.
 @end table
 
 When the compiler selects the registers to use to represent the input
-operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
+operands, it does not use any of the clobbered registers
+(@pxref{Clobbers and Scratch Registers}).
 
 If there are no output operands but there are input operands, place two
 consecutive colons where the output operands would go:
@@ -8516,9 +8517,10 @@ asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
-@anchor{Clobbers}
-@subsubsection Clobbers
+@anchor{Clobbers and Scratch Registers}
+@subsubsection Clobbers and Scratch Registers
 @cindex @code{asm} clobbers
+@cindex @code{asm} scratch registers
 
 While the compiler is aware of changes to entries listed in the output
 operands, the inline @code{asm} code may modify more than just the outputs. For
@@ -8589,6 +8591,110 @@ ten bytes of a string, use a memory input like:
 @end table
 
+Rather than allocating fixed registers via clobbers to provide scratch
registers for an @code{asm} statement, there are better techniques you
can use that give the compiler more freedom. There are also better
ways than using a @code{"memory"} clobber to tell GCC that an
@code{asm} statement accesses or modifies memory. The following
PowerPC example taken from OpenBLAS illustrates some of these
techniques.

In the function shown below, all of the function parameters are inputs
except for the @code{y} array, which is modified by the function.
Only the first few lines of assembly in the @code{asm} statement are
shown, along with a comment that is handy for checking register
assignments. These insns set up some registers for later use in loops
and, in particular, set up four pointers into the @code{ap} array:
@code{a0=ap}, @code{a1=ap+lda}, @code{a2=ap+2*lda}, and
@code{a3=ap+3*lda}. The rest of the assembly is simply too large to
include here.

@smallexample
static void
dgemv_kernel_4x4 (long n, const double *ap, long lda,
                  const double *x, double *y, double alpha)
@{
  double *a0;
  double *a1;
  double *a2;
  double *a3;

  __asm__
    (
     "lxvd2x 34, 0, %10 \n\t"   // x0, x1
     "lxvd2x 35, %11, %10 \n\t" // x2, x3
     "xxspltd 32, %x9, 0 \n\t"  // alpha, alpha
     "sldi %6, %13, 3 \n\t"     // lda * sizeof (double)
     "xvmuldp 34, 34, 32 \n\t"  // x0 * alpha, x1 * alpha
     "xvmuldp 35, 35, 32 \n\t"  // x2 * alpha, x3 * alpha
     "add %4, %3, %6 \n\t"      // a0 = ap, a1 = a0 + lda
     "add %6, %6, %6 \n\t"      // 2 * lda
     "xxspltd 32, 34, 0 \n\t"   // x0 * alpha, x0 * alpha
     "xxspltd 33, 34, 1 \n\t"   // x1 * alpha, x1 * alpha
     "xxspltd 34, 35, 0 \n\t"   // x2 * alpha, x2 * alpha
     "xxspltd 35, 35, 1 \n\t"   // x3 * alpha, x3 * alpha
     "add %5, %3, %6 \n\t"      // a2 = a0 + 2 * lda
     "add %6, %4, %6 \n\t"      // a3 = a1 + 2 * lda
     ...
     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
     "#a0=%3 a1=%4 a2=%5 a3=%6"
     :
       "+m" (*y),
       "+r" (n),    // 1
       "+b" (y),    // 2
       "=b" (a0),   // 3
       "=b" (a1),   // 4
       "=&b" (a2),  // 5
       "=&b" (a3)   // 6
     :
       "m" (*x),
       "m" (*ap),
       "d" (alpha), // 9
       "r" (x),     // 10
       "b" (16),    // 11
       "3" (ap),    // 12
       "4" (lda)    // 13
     :
       "cr0",
       "vs32","vs33","vs34","vs35","vs36","vs37",
       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
     );
@}
@end smallexample

Allocating scratch registers is done by declaring a variable and
making it an early-clobber @code{asm} output as with @code{a2} and
@code{a3}, or making it an output tied to an input as with @code{a0}
and @code{a1}. You can use a normal @code{asm} output if all inputs
that might share the same register are consumed before the scratch is
used. The VSX registers clobbered by the @code{asm} statement could
have used the same technique except for GCC's limit on the number of
@code{asm} parameters.
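
The two ways of getting scratch registers can also be shown in a smaller, stand-alone form. The following sketch is not taken from OpenBLAS. It assumes an x86-64 target with AT&T assembler syntax and falls back to plain C on other targets. The functions @code{sum3} and @code{times_four} are hypothetical.

In @code{sum3}, @code{tmp} is written before @code{b} and @code{c} are read, so it must be an early-clobber output. @code{result} is written only after every input has been consumed, so it can be an ordinary output. In @code{times_four}, the input @code{x} is tied to output 0, which gives an initialized temporary register that the @code{asm} modifies.

```c
/* Hypothetical illustrations (not from OpenBLAS) of scratch registers
   in extended asm.  The asm paths assume x86-64 with AT&T syntax;
   other targets use the equivalent plain C.  */

/* Return a + b + c, using a scratch register chosen by GCC.  */
long
sum3 (long a, long b, long c)
{
  long result;  /* ordinary output: written only after all inputs are read */
  long tmp;     /* early-clobber scratch: written before b and c are read */
#if defined (__x86_64__)
  __asm__ ("mov %2, %1\n\t"     /* tmp = a */
           "add %3, %1\n\t"     /* tmp += b */
           "add %4, %1\n\t"     /* tmp += c */
           "mov %1, %0"         /* result = tmp */
           : "=r" (result), "=&r" (tmp)
           : "r" (a), "r" (b), "r" (c)
           : "cc");
#else
  tmp = a + b + c;
  result = tmp;
#endif
  return result;
}

/* Return 4 * x, using an input tied to an output as an initialized
   temporary register that the asm modifies.  */
long
times_four (long x)
{
  long t;  /* starts out holding x because of the "0" constraint */
#if defined (__x86_64__)
  __asm__ ("add %0, %0\n\t"     /* t = 2 * t */
           "add %0, %0"         /* t = 2 * t */
           : "=r" (t)
           : "0" (x)
           : "cc");
#else
  t = 4 * x;
#endif
  return t;
}
```

With either code path, @code{sum3 (1, 2, 3)} returns 6 and @code{times_four (7)} returns 28. Neither @code{asm} names a specific register, so GCC remains free to choose every register involved.
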

Given the description of the setup code, it should not be surprising
that @code{a0} is tied to @code{ap}. @code{lda} is used only by the
fourth machine insn shown (@code{sldi}), so that register is available
for reuse as @code{a1}. Tying an input to an output is the way to set
up an initialized temporary register that an @code{asm} statement
modifies, because GCC assumes that input-only operands are left
unchanged. The example also shows an initialized register that the
@code{asm} statement does not change: @code{"b" (16)} sets up
@code{%11} to 16.

Rather than using a @code{"memory"} clobber, the @code{asm} has
@code{"+m" (*y)} in the list of outputs to tell GCC that memory at
@code{y} is both read and written by the @code{asm} statement.
@code{"m" (*x)} and @code{"m" (*ap)} in the inputs tell GCC that
memory at @code{x} and @code{ap} is read. Note that @code{*y} strictly
describes only @code{y[0]}. To cover all @code{n} elements, use an
operand of pointer-to-array type such as
@code{"+m" (*(double (*)[n]) y)} instead, and similarly for @code{x}
and @code{ap}.

At a minimum, aliasing rules allow GCC to know what memory
@emph{doesn't} need to be flushed, and if the function were inlined,
GCC might be able to do even better. Also, if GCC can prove that all of
the outputs of a non-volatile @code{asm} statement are unused, then the
@code{asm} may be deleted. Removal of otherwise dead @code{asm}
statements will not happen if they clobber @code{"memory"}.

Notice that @code{x}, @code{y}, and @code{ap} all appear twice in the
@code{asm} parameters, once to specify memory accessed and once to
specify a base register used by the @code{asm}. You won't normally be
wasting a register by doing this, as GCC can use the same register for
both purposes. However, it would be foolish to use both @code{%0} and
@code{%2} for @code{y} in this @code{asm} and expect them to be the
same. @code{%0} is a memory operand, which GCC may express with a
different base register, or even a symbolic address, rather than with
the register chosen for @code{%2}.

 @anchor{GotoLabels}
 @subsubsection Goto Labels
 @cindex @code{asm} goto labels