Clobbers and Scratch Registers

Message ID 20170821012323.GC3368@bubble.grove.modra.org
State New

Commit Message

Alan Modra Aug. 21, 2017, 1:23 a.m. UTC
This is a revised version of
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html limited to
showing just the scratch register aspect, as a followup to
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html 

	* doc/extend.texi (Extended Asm <Clobbers>): Rename to
	"Clobbers and Scratch Registers".  Add paragraph on
	alternative to clobbers for scratch registers and OpenBLAS
	example.
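
For readers who want something smaller than the OpenBLAS kernel, here is a
rough sketch (not part of the patch) contrasting the two styles the new
paragraph describes.  It assumes an x86-64 target with the default AT&T
syntax and falls back to plain C elsewhere.  The function names, the
arithmetic, and the choice of rcx as the fixed register are all
hypothetical.

```c
#include <assert.h>

/* Style 1: a fixed scratch register, reserved via the clobber list.
   Computes 2 * a + b.  */
long
twice_plus_clobber (long a, long b)
{
  long result;
#if defined (__GNUC__) && defined (__x86_64__)
  __asm__ ("mov %1, %%rcx\n\t"      /* rcx = a */
           "add %%rcx, %%rcx\n\t"   /* rcx = 2 * a */
           "mov %%rcx, %0\n\t"      /* result = 2 * a */
           "add %2, %0"             /* result += b */
           : "=&r" (result)         /* written before %2 is read */
           : "r" (a), "r" (b)
           : "rcx", "cc");
#else
  result = 2 * a + b;               /* portable fallback */
#endif
  return result;
}

/* Style 2: the scratch is an early-clobber output, so the register
   allocator picks the register.  Same arithmetic as above.  */
long
twice_plus_scratch (long a, long b)
{
  long result;
#if defined (__GNUC__) && defined (__x86_64__)
  long tmp;                         /* scratch; value is not used afterwards */
  __asm__ ("mov %2, %1\n\t"         /* tmp = a */
           "add %1, %1\n\t"         /* tmp = 2 * a */
           "mov %1, %0\n\t"         /* result = 2 * a */
           "add %3, %0"             /* result += b */
           : "=&r" (result), "=&r" (tmp)
           : "r" (a), "r" (b)
           : "cc");
  (void) tmp;                       /* silence unused-variable diagnostics */
#else
  result = 2 * a + b;               /* portable fallback */
#endif
  return result;
}

/* Usage: both variants should agree with the plain C expression.  */
void
check_twice_plus (void)
{
  assert (twice_plus_clobber (3, 4) == 10);
  assert (twice_plus_scratch (3, 4) == 10);
  assert (twice_plus_scratch (5, 5) == 15);
}
```

In the second variant both outputs are early-clobber because each is
written before the last input (b) has been read.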

Comments

Richard Sandiford Aug. 21, 2017, 5:33 p.m. UTC | #1
Thanks for doing this.

Alan Modra <amodra@gmail.com> writes:
> This is a revised version of
> https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html limited to
> showing just the scratch register aspect, as a followup to
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html 
>
> 	* doc/extend.texi (Extended Asm <Clobbers>): Rename to
> 	"Clobbers and Scratch Registers".  Add paragraph on
> 	alternative to clobbers for scratch registers and OpenBLAS
> 	example.
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 940490e..0637672 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8075,7 +8075,7 @@ A comma-separated list of C expressions read by the instructions in the
>  @item Clobbers
>  A comma-separated list of registers or other values changed by the 
>  @var{AssemblerTemplate}, beyond those listed as outputs.
> -An empty list is permitted.  @xref{Clobbers}.
> +An empty list is permitted.  @xref{Clobbers and Scratch Registers}.
>  
>  @item GotoLabels
>  When you are using the @code{goto} form of @code{asm}, this section contains 
> @@ -8435,7 +8435,7 @@ The enclosing parentheses are a required part of the syntax.
>  
>  When the compiler selects the registers to use to 
>  represent the output operands, it does not use any of the clobbered registers 
> -(@pxref{Clobbers}).
> +(@pxref{Clobbers and Scratch Registers}).
>  
>  Output operand expressions must be lvalues. The compiler cannot check whether 
>  the operands have data types that are reasonable for the instruction being 
> @@ -8671,7 +8671,8 @@ as input.  The enclosing parentheses are a required part of the syntax.
>  @end table
>  
>  When the compiler selects the registers to use to represent the input 
> -operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
> +operands, it does not use any of the clobbered registers
> +(@pxref{Clobbers and Scratch Registers}).
>  
>  If there are no output operands but there are input operands, place two 
>  consecutive colons where the output operands would go:
> @@ -8722,9 +8723,10 @@ asm ("cmoveq %1, %2, %[result]"
>     : "r" (test), "r" (new), "[result]" (old));
>  @end example
>  
> -@anchor{Clobbers}
> -@subsubsection Clobbers
> +@anchor{Clobbers and Scratch Registers}
> +@subsubsection Clobbers and Scratch Registers
>  @cindex @code{asm} clobbers
> +@cindex @code{asm} scratch registers
>  
>  While the compiler is aware of changes to entries listed in the output 
>  operands, the inline @code{asm} code may modify more than just the outputs. For 
> @@ -8853,6 +8855,65 @@ dscal (size_t n, double *x, double alpha)
>  @}
>  @end smallexample
>  
> +Rather than allocating fixed registers via clobbers to provide scratch
> +registers for an @code{asm} statement, an alternative is to define a
> +variable and make it an early-clobber output as with @code{a2} and
> +@code{a3} in the example below.  This gives the compiler register
> +allocator more freedom.  You can also define a variable and make it an
> +output tied to an input as with @code{a0} and @code{a1}, tied
> +respectively to @code{ap} and @code{lda}.

I think it's worth emphasising that tying operands doesn't change
whether an output needs an earlyclobber or not.  E.g. for:

  asm ("%0 = f(%1); use %2"
       : "=r" (a) : "0" (b), "r" (c));

the compiler can assign the same register to all three operands if
it can prove that b == c on entry.  Since %0 is being modified before
%2 is used, it needs to be:

  asm ("%0 = f(%1); use %2"
       : "=&r" (a) : "0" (b), "r" (c));

instead.
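
A concrete version of the same point, as a sketch only: it assumes an
x86-64 target with AT&T syntax and falls back to plain C elsewhere.  The
name tied_then_use and the choice of f(b) = b + 1 are hypothetical
stand-ins for the pseudo-asm above.

```c
#include <assert.h>

/* Hypothetical stand-in for "%0 = f(%1); use %2": computes (b + 1) + c.  */
long
tied_then_use (long b, long c)
{
  long a;
#if defined (__GNUC__) && defined (__x86_64__)
  __asm__ ("add $1, %0\n\t"   /* %0 = f(%1): operand 1 is tied to %0 */
           "add %2, %0"       /* use %2 after %0 has been modified */
           : "=&r" (a)        /* earlyclobber keeps %2 out of %0's register */
           : "0" (b), "r" (c)
           : "cc");
#else
  a = (b + 1) + c;            /* portable fallback, same arithmetic */
#endif
  return a;
}

/* Usage: even when b == c on entry, the result must be (b + 1) + c.  */
void
check_tied_then_use (void)
{
  assert (tied_then_use (5, 5) == 11);
  assert (tied_then_use (2, 7) == 10);
}
```

With a plain "=r" the compiler would be free to give c the same register
as a when it knows b == c, in which case the second add would read the
already-modified value and the (5, 5) call could yield 12 rather than 11.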

Thanks,
Richard

> Of course, with tied
> +outputs your @code{asm} can't use the input value after modifying the
> +output register since they are one and the same register.  Note also
> +that tying an input to an output is the way to set up an initialized
> +temporary register modified by an @code{asm} statement.  An input not
> +tied to an output is assumed by GCC to be unchanged, for example
> +@code{"b" (16)} below sets up @code{%11} to 16, and GCC might use that
> +register in following code if the value 16 happened to be needed.  You
> +can even use a normal @code{asm} output for a scratch if all inputs
> +that might share the same register are consumed before the scratch is
> +used.  The VSX registers clobbered by the @code{asm} statement could
> +have used this technique except for GCC's limit on the number of
> +@code{asm} parameters.
> +
> +@smallexample
> +static void
> +dgemv_kernel_4x4 (long n, const double *ap, long lda,
> +                  const double *x, double *y, double alpha)
> +@{
> +  double *a0;
> +  double *a1;
> +  double *a2;
> +  double *a3;
> +
> +  __asm__
> +    (
> +     /* lots of asm here */
> +     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
> +     "#a0=%3 a1=%4 a2=%5 a3=%6"
> +     :
> +       "+m" (*(double (*)[n]) y),
> +       "+r" (n),	// 1
> +       "+b" (y),	// 2
> +       "=b" (a0),	// 3
> +       "=b" (a1),	// 4
> +       "=&b" (a2),	// 5
> +       "=&b" (a3)	// 6
> +     :
> +       "m" (*(const double (*)[n]) x),
> +       "m" (*(const double (*)[]) ap),
> +       "d" (alpha),	// 9
> +       "r" (x),		// 10
> +       "b" (16),	// 11
> +       "3" (ap),	// 12
> +       "4" (lda)	// 13
> +     :
> +       "cr0",
> +       "vs32","vs33","vs34","vs35","vs36","vs37",
> +       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
> +     );
> +@}
> +@end smallexample
> +
>  @anchor{GotoLabels}
>  @subsubsection Goto Labels
>  @cindex @code{asm} goto labels

Patch

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 940490e..0637672 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8075,7 +8075,7 @@  A comma-separated list of C expressions read by the instructions in the
 @item Clobbers
 A comma-separated list of registers or other values changed by the 
 @var{AssemblerTemplate}, beyond those listed as outputs.
-An empty list is permitted.  @xref{Clobbers}.
+An empty list is permitted.  @xref{Clobbers and Scratch Registers}.
 
 @item GotoLabels
 When you are using the @code{goto} form of @code{asm}, this section contains 
@@ -8435,7 +8435,7 @@  The enclosing parentheses are a required part of the syntax.
 
 When the compiler selects the registers to use to 
 represent the output operands, it does not use any of the clobbered registers 
-(@pxref{Clobbers}).
+(@pxref{Clobbers and Scratch Registers}).
 
 Output operand expressions must be lvalues. The compiler cannot check whether 
 the operands have data types that are reasonable for the instruction being 
@@ -8671,7 +8671,8 @@  as input.  The enclosing parentheses are a required part of the syntax.
 @end table
 
 When the compiler selects the registers to use to represent the input 
-operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
+operands, it does not use any of the clobbered registers
+(@pxref{Clobbers and Scratch Registers}).
 
 If there are no output operands but there are input operands, place two 
 consecutive colons where the output operands would go:
@@ -8722,9 +8723,10 @@  asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
-@anchor{Clobbers}
-@subsubsection Clobbers
+@anchor{Clobbers and Scratch Registers}
+@subsubsection Clobbers and Scratch Registers
 @cindex @code{asm} clobbers
+@cindex @code{asm} scratch registers
 
 While the compiler is aware of changes to entries listed in the output 
 operands, the inline @code{asm} code may modify more than just the outputs. For 
@@ -8853,6 +8855,65 @@  dscal (size_t n, double *x, double alpha)
 @}
 @end smallexample
 
+Rather than allocating fixed registers via clobbers to provide scratch
+registers for an @code{asm} statement, an alternative is to define a
+variable and make it an early-clobber output as with @code{a2} and
+@code{a3} in the example below.  This gives the compiler register
+allocator more freedom.  You can also define a variable and make it an
+output tied to an input as with @code{a0} and @code{a1}, tied
+respectively to @code{ap} and @code{lda}.  Of course, with tied
+outputs your @code{asm} can't use the input value after modifying the
+output register since they are one and the same register.  Note also
+that tying an input to an output is the way to set up an initialized
+temporary register modified by an @code{asm} statement.  An input not
+tied to an output is assumed by GCC to be unchanged, for example
+@code{"b" (16)} below sets up @code{%11} to 16, and GCC might use that
+register in following code if the value 16 happened to be needed.  You
+can even use a normal @code{asm} output for a scratch if all inputs
+that might share the same register are consumed before the scratch is
+used.  The VSX registers clobbered by the @code{asm} statement could
+have used this technique except for GCC's limit on the number of
+@code{asm} parameters.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+                  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+    (
+     /* lots of asm here */
+     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+     "#a0=%3 a1=%4 a2=%5 a3=%6"
+     :
+       "+m" (*(double (*)[n]) y),
+       "+r" (n),	// 1
+       "+b" (y),	// 2
+       "=b" (a0),	// 3
+       "=b" (a1),	// 4
+       "=&b" (a2),	// 5
+       "=&b" (a3)	// 6
+     :
+       "m" (*(const double (*)[n]) x),
+       "m" (*(const double (*)[]) ap),
+       "d" (alpha),	// 9
+       "r" (x),		// 10
+       "b" (16),	// 11
+       "3" (ap),	// 12
+       "4" (lda)	// 13
+     :
+       "cr0",
+       "vs32","vs33","vs34","vs35","vs36","vs37",
+       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+     );
+@}
+@end smallexample
+
 @anchor{GotoLabels}
 @subsubsection Goto Labels
 @cindex @code{asm} goto labels