diff mbox series

i386: Make xmm16-xmm31 call used even in ms ABI

Message ID 20200204114211.GP17695@tucnak
State New
Headers show
Series i386: Make xmm16-xmm31 call used even in ms ABI | expand

Commit Message

Jakub Jelinek Feb. 4, 2020, 11:42 a.m. UTC
Hi!

On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> I guess that Comment #9 patch form the PR should be trivially correct,
> but althouhg it looks obvious, I don't want to propose the patch since
> I have no means of testing it.

I don't have means of testing it either.
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
128-bits only) are call preserved.

Jonathan, could you please test this if it is sufficient to just change
CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
too?  Thanks.

We are talking e.g. about
/* { dg-options "-O2 -mabi=ms -mavx512vl" } */

typedef double V __attribute__((vector_size (16)));
void foo (void);
V bar (void);
void baz (V);
void
qux (void)
{
  V c;
  {
    register V a __asm ("xmm18");
    V b = bar ();
    asm ("" : "=x" (a) : "0" (b));
    c = a;
  }
  foo ();
  {
    register V d __asm ("xmm18");
    V e;
    d = c;
    asm ("" : "=x" (e) : "0" (d));
    baz (e);
  }
}
where according to the MSDN doc gcc incorrectly holds the c value
in xmm18 register across the foo call; if foo is compiled by some Microsoft
compiler (or LLVM), then it could clobber %xmm18.
If all xmm18 occurrences are changed to say xmm15, then it is valid to hold
the 128-bit value across the foo call (though, surprisingly, LLVM saves it
into stack anyway).

2020-02-04  Uroš Bizjak  <ubizjak@gmail.com>

	* config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
	xmm16-xmm31 call-used even in 64-bit ms-abi.



	Jakub

Comments

Jonathan Yong Feb. 6, 2020, 1 a.m. UTC | #1
On 2/4/20 11:42 AM, Jakub Jelinek wrote:
> Hi!
> 
> On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
>> I guess that Comment #9 patch form the PR should be trivially correct,
>> but althouhg it looks obvious, I don't want to propose the patch since
>> I have no means of testing it.
> 
> I don't have means of testing it either.
> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
> is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
> 128-bits only) are call preserved.
> 
> Jonathan, could you please test this if it is sufficient to just change
> CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
> too?  Thanks.

Is this patch testing still required? I just got back from traveling.
Jakub Jelinek Feb. 6, 2020, 6:07 a.m. UTC | #2
On Thu, Feb 06, 2020 at 01:00:36AM +0000, JonY wrote:
> On 2/4/20 11:42 AM, Jakub Jelinek wrote:
> > Hi!
> > 
> > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> >> I guess that Comment #9 patch form the PR should be trivially correct,
> >> but althouhg it looks obvious, I don't want to propose the patch since
> >> I have no means of testing it.
> > 
> > I don't have means of testing it either.
> > https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
> > is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
> > 128-bits only) are call preserved.
> > 
> > Jonathan, could you please test this if it is sufficient to just change
> > CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
> > too?  Thanks.
> 
> Is this patch testing still required? I just got back from traveling.

Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
(not preserved over calls), while in gcc they are currently handled as
preserved across the calls.

	Jakub
Jonathan Yong Feb. 7, 2020, 10:57 a.m. UTC | #3
On 2/6/20 6:07 AM, Jakub Jelinek wrote:
> On Thu, Feb 06, 2020 at 01:00:36AM +0000, JonY wrote:
>> On 2/4/20 11:42 AM, Jakub Jelinek wrote:
>>> Hi!
>>>
>>> On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
>>>> I guess that Comment #9 patch form the PR should be trivially correct,
>>>> but althouhg it looks obvious, I don't want to propose the patch since
>>>> I have no means of testing it.
>>>
>>> I don't have means of testing it either.
>>> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
>>> is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
>>> 128-bits only) are call preserved.
>>>
>>> Jonathan, could you please test this if it is sufficient to just change
>>> CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
>>> too?  Thanks.
>>
>> Is this patch testing still required? I just got back from traveling.
> 
> Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
> (not preserved over calls), while in gcc they are currently handled as
> preserved across the calls.
> 
> 	Jakub
> 


--- original.s  2020-02-06 09:00:02.014638069 +0000
+++ new.s       2020-02-07 10:28:55.678317667 +0000
@@ -7,23 +7,23 @@
 qux:
        subq    $72, %rsp
        .seh_stackalloc 72
-       vmovaps %xmm18, 48(%rsp)
-       .seh_savexmm    %xmm18, 48
+       vmovaps %xmm6, 48(%rsp)
+       .seh_savexmm    %xmm6, 48
        .seh_endprologue
        call    bar
        vmovapd %xmm0, %xmm1
-       vmovapd %xmm1, %xmm18
+       vmovapd %xmm1, %xmm6
        call    foo
        leaq    32(%rsp), %rcx
-       vmovapd %xmm18, %xmm0
-       vmovaps %xmm0, 32(%rsp)
+       vmovapd %xmm6, %xmm0
+       vmovapd %xmm0, 32(%rsp)
        call    baz
        nop
-       vmovaps 48(%rsp), %xmm18
+       vmovaps 48(%rsp), %xmm6
        addq    $72, %rsp
        ret
        .seh_endproc
-       .ident  "GCC: (GNU) 10.0.0 20191024 (experimental)"
+       .ident  "GCC: (GNU) 10.0.1 20200206 (experimental)"
        .def    bar;    .scl    2;      .type   32;     .endef
        .def    foo;    .scl    2;      .type   32;     .endef
        .def    baz;    .scl    2;      .type   32;     .endef

GCC with the patch now seems to put the variables in xmm6, unfortunately
I don't know enough of AVX or stack setups to know if that's all that is
needed.
Jakub Jelinek Feb. 7, 2020, 11:28 a.m. UTC | #4
On Fri, Feb 07, 2020 at 10:57:22AM +0000, JonY wrote:
> >> Is this patch testing still required? I just got back from traveling.
> > 
> > Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
> > (not preserved over calls), while in gcc they are currently handled as
> > preserved across the calls.

The other parts are I guess mainly about SEH.  Consider e.g.
void
foo (void)
{
  register double x __asm ("xmm14");
  register double y __asm ("xmm18");
  asm ("" : "=x" (x));
  asm ("" : "=v" (y));
  x += y;
  y += x;
  asm ("" : : "x" (x));
  asm ("" : : "v" (y));
}
looking at cross-compiler output, with -O2 -mavx512f this emits
	.file	"abcdeq.c"
	.text
	.align 16
	.globl	foo
	.def	foo;	.scl	2;	.type	32;	.endef
	.seh_proc	foo
foo:
	subq	$40, %rsp
	.seh_stackalloc	40
	vmovaps	%xmm14, (%rsp)
	.seh_savexmm	%xmm14, 0
	vmovaps	%xmm18, 16(%rsp)
	.seh_savexmm	%xmm18, 16
	.seh_endprologue
	vaddsd	%xmm18, %xmm14, %xmm14
	vaddsd	%xmm18, %xmm14, %xmm18
	vmovaps	(%rsp), %xmm14
	vmovaps	16(%rsp), %xmm18
	addq	$40, %rsp
	ret
	.seh_endproc
	.ident	"GCC: (GNU) 10.0.1 20200207 (experimental)"
Does whatever assembler mingw64 uses even assemble this (I mean the
.seh_savexmm %xmm16, 16 could be problematic)?
I can find e.g.
https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527
which then links to
https://gcc.gnu.org/PR65782

So, I'd say we want to add PR target/65782 to the ChangeLog entry in the
patch, and likely add a testcase like the above, so like below?

Have you tested the earlier version of the patch on mingw64 or cygwin?
If yes, can you just test the testcase from the following patch, without and
with the i386.h part and verify it FAILs without and PASSes with it?
Just make check-gcc RUNTESTFLAGS=i386.exp=pr65782.c should do?
If not, can you please test the whole patch?

2020-02-07  Uroš Bizjak  <ubizjak@gmail.com>
	    Jakub Jelinek  <jakub@redhat.com>

	PR target/65782
	* config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
	xmm16-xmm31 call-used even in 64-bit ms-abi.

	* gcc.target/i386/pr65782.c: New test.

--- gcc/config/i386/config/i386/i386.h.jj	2020-01-22 10:19:24.199221986 +0100
+++ gcc/config/i386/config/i386/i386.h	2020-02-04 12:09:12.338341003 +0100
@@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu
 /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/		\
      6,   6,    6,    6,    6,    6,    6,    6,		\
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
-     6,    6,     6,    6,    6,    6,    6,    6,		\
+     1,    1,     1,    1,    1,    1,    1,    1,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6,		\
+     1,    1,     1,    1,    1,    1,    1,    1,		\
  /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
      1,   1,   1,   1,   1,   1,   1,   1 }
 
--- gcc/testsuite/gcc.target/i386/pr65782.c.jj	2020-02-07 12:21:09.472819018 +0100
+++ gcc/testsuite/gcc.target/i386/pr65782.c	2020-02-07 12:24:06.820154495 +0100
@@ -0,0 +1,16 @@
+/* PR target/65782 */
+/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
+/* { dg-options "-O2 -mavx512vl" } */
+
+void
+foo (void)
+{
+  register double x __asm ("xmm14");
+  register double y __asm ("xmm18");
+  asm ("" : "=x" (x));
+  asm ("" : "=v" (y));
+  x += y;
+  y += x;
+  asm ("" : : "x" (x));
+  asm ("" : : "v" (y));
+}


	Jakub
Jonathan Yong Feb. 8, 2020, 8:24 a.m. UTC | #5
On 2/7/20 11:28 AM, Jakub Jelinek wrote:
> On Fri, Feb 07, 2020 at 10:57:22AM +0000, JonY wrote:
>>>> Is this patch testing still required? I just got back from traveling.
>>>
>>> Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
>>> (not preserved over calls), while in gcc they are currently handled as
>>> preserved across the calls.
> 
> The other parts are I guess mainly about SEH.  Consider e.g.
> void
> foo (void)
> {
>   register double x __asm ("xmm14");
>   register double y __asm ("xmm18");
>   asm ("" : "=x" (x));
>   asm ("" : "=v" (y));
>   x += y;
>   y += x;
>   asm ("" : : "x" (x));
>   asm ("" : : "v" (y));
> }
> looking at cross-compiler output, with -O2 -mavx512f this emits
> 	.file	"abcdeq.c"
> 	.text
> 	.align 16
> 	.globl	foo
> 	.def	foo;	.scl	2;	.type	32;	.endef
> 	.seh_proc	foo
> foo:
> 	subq	$40, %rsp
> 	.seh_stackalloc	40
> 	vmovaps	%xmm14, (%rsp)
> 	.seh_savexmm	%xmm14, 0
> 	vmovaps	%xmm18, 16(%rsp)
> 	.seh_savexmm	%xmm18, 16
> 	.seh_endprologue
> 	vaddsd	%xmm18, %xmm14, %xmm14
> 	vaddsd	%xmm18, %xmm14, %xmm18
> 	vmovaps	(%rsp), %xmm14
> 	vmovaps	16(%rsp), %xmm18
> 	addq	$40, %rsp
> 	ret
> 	.seh_endproc
> 	.ident	"GCC: (GNU) 10.0.1 20200207 (experimental)"
> Does whatever assembler mingw64 uses even assemble this (I mean the
> .seh_savexmm %xmm16, 16 could be problematic)?''

It does not, I just checked with the master branch of binutils.

> I can find e.g.
> https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527
> which then links to
> https://gcc.gnu.org/PR65782
> 
> So, I'd say we want to add PR target/65782 to the ChangeLog entry in the
> patch, and likely add a testcase like the above, so like below?
> 
> Have you tested the earlier version of the patch on mingw64 or cygwin?
> If yes, can you just test the testcase from the following patch, without and
> with the i386.h part and verify it FAILs without and PASSes with it?
> Just make check-gcc RUNTESTFLAGS=i386.exp=pr65782.c should do?
> If not, can you please test the whole patch?
> 

I did a -c test build with an older toolchain, it fails to compile
(invalid register for .seh_savexmm) while the latest gcc is passing,
both are using the same binutils version.

I guess that covers it?


> 2020-02-07  Uroš Bizjak  <ubizjak@gmail.com>
> 	    Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/65782
> 	* config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
> 	xmm16-xmm31 call-used even in 64-bit ms-abi.
> 
> 	* gcc.target/i386/pr65782.c: New test.
> 
> --- gcc/config/i386/config/i386/i386.h.jj	2020-01-22 10:19:24.199221986 +0100
> +++ gcc/config/i386/config/i386/i386.h	2020-02-04 12:09:12.338341003 +0100
> @@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu
>  /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/		\
>       6,   6,    6,    6,    6,    6,    6,    6,		\
>  /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
> -     6,    6,     6,    6,    6,    6,    6,    6,		\
> +     1,    1,     1,    1,    1,    1,    1,    1,		\
>  /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
> -     6,    6,     6,    6,    6,    6,    6,    6,		\
> +     1,    1,     1,    1,    1,    1,    1,    1,		\
>   /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
>       1,   1,   1,   1,   1,   1,   1,   1 }
>  
> --- gcc/testsuite/gcc.target/i386/pr65782.c.jj	2020-02-07 12:21:09.472819018 +0100
> +++ gcc/testsuite/gcc.target/i386/pr65782.c	2020-02-07 12:24:06.820154495 +0100
> @@ -0,0 +1,16 @@
> +/* PR target/65782 */
> +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
> +/* { dg-options "-O2 -mavx512vl" } */
> +
> +void
> +foo (void)
> +{
> +  register double x __asm ("xmm14");
> +  register double y __asm ("xmm18");
> +  asm ("" : "=x" (x));
> +  asm ("" : "=v" (y));
> +  x += y;
> +  y += x;
> +  asm ("" : : "x" (x));
> +  asm ("" : : "v" (y));
> +}
> 
> 
> 	Jakub
>
Jakub Jelinek Feb. 8, 2020, 10:05 a.m. UTC | #6
On Sat, Feb 08, 2020 at 08:24:38AM +0000, JonY wrote:
> It does not, I just checked with the master branch of binutils.
...
> I did a -c test build with an older toolchain, it fails to compile
> (invalid register for .seh_savexmm) while the latest gcc is passing,
> both are using the same binutils version.
> 
> I guess that covers it?

I went ahead and committed the patch (with the double config/i386/ in it
fixed obviously).

> > 2020-02-07  Uroš Bizjak  <ubizjak@gmail.com>
> > 	    Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	PR target/65782
> > 	* config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
> > 	xmm16-xmm31 call-used even in 64-bit ms-abi.
> > 
> > 	* gcc.target/i386/pr65782.c: New test.

	Jakub
Uros Bizjak Feb. 8, 2020, 10:32 a.m. UTC | #7
On Sat, Feb 8, 2020 at 11:05 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Sat, Feb 08, 2020 at 08:24:38AM +0000, JonY wrote:
> > It does not, I just checked with the master branch of binutils.
> ...
> > I did a -c test build with an older toolchain, it fails to compile
> > (invalid register for .seh_savexmm) while the latest gcc is passing,
> > both are using the same binutils version.
> >
> > I guess that covers it?
>
> I went ahead and committed the patch (with the double config/i386/ in it
> fixed obviously).

I think that the patch should also be backported to gcc-9 branch. The
change is backward compatible, since the new code will save and
restore zmm16+ registers at the caller site, and the old code (e.g.
existing libraries) will then unnecessarily save and restore regs in
the callee.

HJ fixed some other MSABI issue there.

Uros.

>
> > > 2020-02-07  Uroš Bizjak  <ubizjak@gmail.com>
> > >         Jakub Jelinek  <jakub@redhat.com>
> > >
> > >     PR target/65782
> > >     * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
> > >     xmm16-xmm31 call-used even in 64-bit ms-abi.
> > >
> > >     * gcc.target/i386/pr65782.c: New test.
>
>         Jakub
>
Jakub Jelinek Feb. 8, 2020, 10:34 a.m. UTC | #8
On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote:
> On Sat, Feb 8, 2020 at 11:05 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Sat, Feb 08, 2020 at 08:24:38AM +0000, JonY wrote:
> > > It does not, I just checked with the master branch of binutils.
> > ...
> > > I did a -c test build with an older toolchain, it fails to compile
> > > (invalid register for .seh_savexmm) while the latest gcc is passing,
> > > both are using the same binutils version.
> > >
> > > I guess that covers it?
> >
> > I went ahead and committed the patch (with the double config/i386/ in it
> > fixed obviously).
> 
> I think that the patch should also be backported to gcc-9 branch. The
> change is backward compatible, since the new code will save and
> restore zmm16+ registers at the caller site, and the old code (e.g.
> existing libraries) will then unnecessarily save and restore regs in
> the callee.

Agreed, will do together with other backports soon.

	Jakub
Jakub Jelinek Feb. 8, 2020, 10:51 a.m. UTC | #9
On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote:
> I think that the patch should also be backported to gcc-9 branch. The
> change is backward compatible, since the new code will save and
> restore zmm16+ registers at the caller site, and the old code (e.g.
> existing libraries) will then unnecessarily save and restore regs in
> the callee.

Actually, it isn't backward compatible.
If both caller and callee touch xmm16+ (sure, unlikely), then if caller
is compiled by new gcc and callee by old gcc it is the case you described,
caller doesn't try to preserve values across the call and callee uselessly
saves them anyway (unless it fails to assemble due to SEH).
But if caller is compiled by old gcc and callee by new gcc, caller assumes
it can preserve values across the call in those registers and callee doesn't
save them and clobbers them.

	Jakub
Uros Bizjak Feb. 8, 2020, 12:52 p.m. UTC | #10
On Sat, Feb 8, 2020 at 11:52 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote:
> > I think that the patch should also be backported to gcc-9 branch. The
> > change is backward compatible, since the new code will save and
> > restore zmm16+ registers at the caller site, and the old code (e.g.
> > existing libraries) will then unnecessarily save and restore regs in
> > the callee.
>
> Actually, it isn't backward compatible.
> If both caller and callee touch xmm16+ (sure, unlikely), then if caller
> is compiled by new gcc and callee by old gcc it is the case you described,
> caller doesn't try to preserve values across the call and callee uselessly
> saves them anyway (unless it fails to assemble due to SEH).
> But if caller is compiled by old gcc and callee by new gcc, caller assumes
> it can preserve values across the call in those registers and callee doesn't
> save them and clobbers them.

I was considering only the most likely scenario where the compiler is
upgraded without recompiling existing system libraries. So, with the
new compiler installed, the new compiled code calls either new or old
libraries. *Assuming* that old libraries never call new code, the
situation is OK.

As you point out, when the old code calls new code, registers get
clobbered. By not backporting the patch, we can consider gcc-9 and
gcc-10 ABI incompatible, otherwise the forward incompatibility moves
in between gcc-9.2 and gcc-9.3. I agree that the former is somehow
more convenient.

IMO, this issue should be pointed out in the compiler release notes in any case.

Uros.
diff mbox series

Patch

--- gcc/config/i386/config/i386/i386.h.jj	2020-01-22 10:19:24.199221986 +0100
+++ gcc/config/i386/config/i386/i386.h	2020-02-04 12:09:12.338341003 +0100
@@ -1128,9 +1128,9 @@  extern const char *host_detect_local_cpu
 /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/		\
      6,   6,    6,    6,    6,    6,    6,    6,		\
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
-     6,    6,     6,    6,    6,    6,    6,    6,		\
+     1,    1,     1,    1,    1,    1,    1,    1,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6,		\
+     1,    1,     1,    1,    1,    1,    1,    1,		\
  /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
      1,   1,   1,   1,   1,   1,   1,   1 }