[DOC] PowerPC extended asm example

Message ID 58E03B5B.9010804@codesourcery.com
State New

Commit Message

Sandra Loosemore April 1, 2017, 11:44 p.m. UTC
On 03/31/2017 07:30 AM, Alan Modra wrote:
> Some people over at OpenBLAS were asking me whether I knew of a
> whitepaper on gcc asm.  I didn't besides the gcc manual, and wrote a
> note explaining some tricks.  This patch is that note cleaned up.
> Tested by an x86_64-linux build.  OK to apply?

The patch had a lot of copy-editing issues with markup, spelling, etc. 
I thought it would be easier just to fix them than explain what was 
wrong, so I've attached a tidied-up version.  I also moved the example 
before the detailed discussion since it's easier to understand that way.

There are still a couple of semantic issues that need fixing, though...

(1) The example is in the "Input Operands" subsection, but it seems like 
it's really about clobbers and alternatives to clobbers.  Unless you 
have some better idea, I'd suggest moving it to the "Clobbers" 
subsection and maybe renaming that subsection "Clobbers and Scratch 
Registers" too.  The introductory text should also make the purpose of 
the example, and its relation to the purpose of the containing section, 
more explicit.

(2) In this bit of text

> +function, and that early assembly sets up four pointers into the
> +@code{ap} array, @code{a0=ap}, @code{a1=ap+lda}, @code{a2=ap+2*lda},
> +and @code{a3=ap+3*lda}.

I don't understand what "early assembly" is.  Wouldn't it make more 
sense to add initializers to these declarations in the example code

> +  double *a0;
> +  double *a1;
> +  double *a2;
> +  double *a3;

than to hand-wave about what sets up these pointers?
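
For instance, something along these lines.  This is only a sketch: 
sum_first_of_4_columns is a made-up name, the empty asm template stands 
in for the real kernel, and "+r" stands in for the PowerPC-specific "+b" 
so that it builds on any target.  With the pointers initialized in C, 
they would become read-write operands rather than outputs tied to ap and 
lda.

```c
/* Hypothetical stand-in for the kernel; not part of the patch.  */
static double
sum_first_of_4_columns (const double *ap, long lda)
{
  /* Column pointers set up in C instead of by the asm itself.  */
  const double *a0 = ap;
  const double *a1 = ap + lda;
  const double *a2 = ap + 2 * lda;
  const double *a3 = ap + 3 * lda;

  /* Empty template: the real code would use and advance the pointers
     here.  "+r" marks each one as read and possibly modified.  */
  __asm__ ("" : "+r" (a0), "+r" (a1), "+r" (a2), "+r" (a3));

  return *a0 + *a1 + *a2 + *a3;
}
```

Whether that reads better than the tied-operand form is a judgment call, 
since the tied operands are part of what the example is trying to show.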

-Sandra
Patch

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 246632)
+++ gcc/doc/extend.texi	(working copy)
@@ -8516,6 +8516,86 @@  asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
+Here is a larger PowerPC example taken from OpenBLAS.  All of the
+function parameters are inputs except for the @code{y} array, which is
+modified by the function.  The start of the assembly sets up four
+pointers into the @code{ap} array, @code{a0=ap}, @code{a1=ap+lda},
+@code{a2=ap+2*lda}, and @code{a3=ap+3*lda}.  The actual assembly code
+has been elided except for comments added to check GCC's register
+assignments, since it's not interesting for purposes of explaining the
+operands.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+                  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+    (
+     ...
+     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+     "#a0=%3 a1=%4 a2=%5 a3=%6"
+     :
+       "+m" (*y),
+       "+r" (n),	// 1
+       "+b" (y),	// 2
+       "=b" (a0),	// 3
+       "=b" (a1),	// 4
+       "=&b" (a2),	// 5
+       "=&b" (a3)	// 6
+     :
+       "m" (*x),
+       "m" (*ap),
+       "d" (alpha),	// 9
+       "r" (x),		// 10
+       "b" (16),	// 11
+       "3" (ap),	// 12
+       "4" (lda)	// 13
+     :
+       "cr0",
+       "vs32","vs33","vs34","vs35","vs36","vs37",
+       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+     );
+@}
+@end smallexample
+
+Illustrated here are techniques you can use to have GCC allocate
+temporary registers for an @code{asm} statement, giving the compiler
+more freedom than if you allocated fixed registers via clobbers.  This
+is done by declaring a variable and making it an early-clobber
+@code{asm} output as with @code{a2} and @code{a3}, or making it an
+output tied to an input as with @code{a0} and @code{a1}.  The VSX
+registers used by the @code{asm} statement could have used the same
+technique except for GCC's limit on the number of @code{asm} operands.
+As the description above implies, @code{a0} is tied to @code{ap};
+@code{lda} is needed only at the start of the @code{asm}, so its
+register is available for reuse as @code{a1}.  Tying an input to an
+output is the way to set up an initialized temporary register that is
+modified by an @code{asm} statement.  The example also shows an
+initialized register unchanged by the @code{asm} statement; @code{"b"
+(16)} sets up @code{%11} to 16.
+
+Also shown is a somewhat better method than using a @code{"memory"}
+clobber to tell GCC that an @code{asm} statement accesses or modifies
+memory.  Here we use @code{"+m" (*y)} in the list of outputs to tell
+GCC that the @code{y} array is both read and written by the @code{asm}
+statement.  @code{"m" (*x)} and @code{"m" (*ap)} in the inputs tell
+GCC that these arrays are read.  At a minimum, aliasing rules allow
+GCC to know what memory @emph{doesn't} need to be flushed, and if the
+function is inlined, GCC may be able to do even better.  Notice
+that @code{x}, @code{y}, and @code{ap} all appear twice in the
+@code{asm} parameters, once to specify memory accessed, and once to
+specify a base register used by the @code{asm}.  You won't normally be
+wasting a register by doing this as GCC can use the same register for
+both purposes.  However, it would be foolish to use both @code{%0} and
+@code{%2} for @code{y} in your @code{asm} statement and expect them to
+be the same.
+
 @anchor{Clobbers}
 @subsubsection Clobbers
 @cindex @code{asm} clobbers
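
The memory-operand technique the patch describes can be tried on any
target by using an empty asm template.  The sketch below is not from the
patch: axpy4_then_sum is a made-up name and the fixed length of 4 is an
assumption.  The array casts are one way to tell GCC that the whole
array is accessed, rather than only its first element.

```c
/* Hypothetical example; not part of the patch.  Assumes x and y each
   point to at least 4 doubles.  */
static double
axpy4_then_sum (const double *x, double *y, double alpha)
{
  for (int i = 0; i < 4; i++)
    y[i] += alpha * x[i];

  /* Empty template, so this builds on any target.  The operands say
     that the asm may read x[0..3] and may read and write y[0..3];
     no "memory" clobber is needed.  */
  __asm__ (""
           : "+m" (*(double (*)[4]) y)
           : "m" (*(const double (*)[4]) x));

  return y[0] + y[1] + y[2] + y[3];
}
```

Because only x and y are named, GCC remains free to keep unrelated
values in registers across the asm, which a "memory" clobber would
prevent.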