===================================================================
@@ -8516,6 +8516,86 @@ asm ("cmoveq %1, %2, %[result]"
: "r" (test), "r" (new), "[result]" (old));
@end example
+Here is a larger PowerPC example taken from OpenBLAS. All of the
+function parameters are inputs except for the @code{y} array, which is
+modified by the function. Early assembly sets up four pointers
+into the @code{ap} array, @code{a0=ap}, @code{a1=ap+lda},
+@code{a2=ap+2*lda}, and @code{a3=ap+3*lda}. The actual assembly code
+has been elided except for comments added to check GCC's register
+assignments, since it's not interesting for purposes of explaining the
+operands.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+ const double *x, double *y, double alpha)
+@{
+ double *a0;
+ double *a1;
+ double *a2;
+ double *a3;
+
+ __asm__
+ (
+ ...
+ "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+ "#a0=%3 a1=%4 a2=%5 a3=%6"
+ :
+ "+m" (*y),
+ "+r" (n), // 1
+ "+b" (y), // 2
+ "=b" (a0), // 3
+ "=b" (a1), // 4
+ "=&b" (a2), // 5
+ "=&b" (a3) // 6
+ :
+ "m" (*x),
+ "m" (*ap),
+ "d" (alpha), // 9
+ "r" (x), // 10
+ "b" (16), // 11
+ "3" (ap), // 12
+ "4" (lda) // 13
+ :
+ "cr0",
+ "vs32","vs33","vs34","vs35","vs36","vs37",
+ "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+ );
+@}
+@end smallexample
+
+Illustrated here are techniques you can use to have GCC allocate
+temporary registers for an @code{asm} statement, giving the compiler
+more freedom than if you allocated fixed registers via clobbers. This
+is done by declaring a variable and making it an early-clobber
+@code{asm} output as with @code{a2} and @code{a3}, or making it an
+output tied to an input as with @code{a0} and @code{a1}. The VSX
+registers used by the @code{asm} statement could have used the same
+technique except for GCC's limit on number of @code{asm} parameters.
+It shouldn't be surprising that @code{a0} is tied to @code{ap} from
+the above description, and @code{lda} is only used early so that
+register is available for reuse as @code{a1}. Tying an input to an
+output is the way to set up an initialized temporary register that is
+modified by an @code{asm} statement. The example also shows an
+initialized register unchanged by the @code{asm} statement; @code{"b"
+(16)} sets up @code{%11} to 16.
+
+Also shown is a somewhat better method than using a @code{"memory"}
+clobber to tell GCC that an @code{asm} statement accesses or modifies
+memory. Here we use @code{"+m" (*y)} in the list of outputs to tell
+GCC that the @code{y} array is both read and written by the @code{asm}
+statement. @code{"m" (*x)} and @code{"m" (*ap)} in the inputs tell
+GCC that these arrays are read. At a minimum, aliasing rules allow
+GCC to know what memory @emph{doesn't} need to be flushed, and if the
+function were inlined then GCC may be able to do even better. Notice
+that @code{x}, @code{y}, and @code{ap} all appear twice in the
+@code{asm} parameters, once to specify memory accessed, and once to
+specify a base register used by the @code{asm}. You won't normally be
+wasting a register by doing this as GCC can use the same register for
+both purposes. However, it would be foolish to use both @code{%0} and
+@code{%2} for @code{y} in your @code{asm} statement and expect them to
+be the same.
+
@anchor{Clobbers}
@subsubsection Clobbers
@cindex @code{asm} clobbers