Message ID: 20190520213626.27024-1-hjl.tools@gmail.com
State:      New
Series:     i386: Adjust gcc.target/i386/pr22076.c for 64-bit
On 5/20/19 3:36 PM, H.J. Lu wrote:
> With SSE emulation of MMX intrinsics in 64-bit mode,
>
> ---
> __v8qi test ()
> {
>   __v8qi mm0 = {1,2,3,4,5,6,7,8};
>   __v8qi mm1 = {11,22,33,44,55,66,77,88};
>   volatile __m64 x;
>
>   x = _mm_add_pi8 (mm0, mm1);
>
>   return x;
> }
> ---
>
> is compiled into
>
> 	movq	.LC0(%rip), %xmm0
> 	movq	.LC1(%rip), %xmm1
> 	paddb	%xmm1, %xmm0
> 	movq	%xmm0, -8(%rsp)
> 	movq	-8(%rsp), %xmm0
> 	ret
>
> instead of
>
> 	movq	.LC1(%rip), %mm0
> 	paddb	.LC0(%rip), %mm0
> 	movq	%mm0, -8(%rsp)
> 	movq	-8(%rsp), %xmm0
> 	ret
>
> Adjust gcc.target/i386/pr22076.c for 64-bit.
>
> 	* gcc.target/i386/pr22076.c: Adjusted for 64-bit.

Well, it looks like you're just papering over a code quality regression?
Or am I missing something?

Jeff
On Mon, May 20, 2019 at 11:39 PM Jeff Law <law@redhat.com> wrote:
>
> On 5/20/19 3:36 PM, H.J. Lu wrote:
> > With SSE emulation of MMX intrinsics in 64-bit mode,
[...]
> > 	* gcc.target/i386/pr22076.c: Adjusted for 64-bit.
> Well, it looks like you're just papering over a code quality regression?
> Or am I missing something?

We have to load 64bit value from memory to 128 bit XMM register using movq.

OTOH, we could just use -mno-sse2 which disables XMM emulation.

Uros.
On Mon, May 20, 2019 at 2:52 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, May 20, 2019 at 11:39 PM Jeff Law <law@redhat.com> wrote:
> >
> > On 5/20/19 3:36 PM, H.J. Lu wrote:
[...]
> > Well, it looks like you're just papering over a code quality regression?
> > Or am I missing something?
>
> We have to load 64bit value from memory to 128 bit XMM register using movq.
>
> OTOH, we could just use -mno-sse2 which disables XMM emulation.

Works for me.
"H.J. Lu" <hjl.tools@gmail.com> writes:
> With SSE emulation of MMX intrinsics in 64-bit mode,
[...]
> instead of
>
> 	movq	.LC1(%rip), %mm0
> 	paddb	.LC0(%rip), %mm0
> 	movq	%mm0, -8(%rsp)
> 	movq	-8(%rsp), %xmm0
> 	ret

This is PR target/90503.

	Rainer
On 5/20/19 3:51 PM, Uros Bizjak wrote:
> On Mon, May 20, 2019 at 11:39 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 5/20/19 3:36 PM, H.J. Lu wrote:
[...]
>>> 	* gcc.target/i386/pr22076.c: Adjusted for 64-bit.
>> Well, it looks like you're just papering over a code quality regression?
>> Or am I missing something?
>
> We have to load 64bit value from memory to 128 bit XMM register using movq.
>
> OTOH, we could just use -mno-sse2 which disables XMM emulation.

Ah, we can't have a MEM operand on paddb for xmm registers...

Jeff
On Mon, May 20, 2019 at 03:54:49PM -0600, Jeff Law wrote:
> >>> Adjust gcc.target/i386/pr22076.c for 64-bit.
> >>>
> >>> 	* gcc.target/i386/pr22076.c: Adjusted for 64-bit.
> >> Well, it looks like you're just papering over a code quality regression?
> >> Or am I missing something?
> >
> > We have to load 64bit value from memory to 128 bit XMM register using movq.
> >
> > OTOH, we could just use -mno-sse2 which disables XMM emulation.
> Ah, we can't have a MEM operand on paddb for xmm registers...

We can, but then it will load 128-bits from memory instead of just
64-bits, it might not be sufficiently aligned, unaligned load might
cross page boundary, ...

I think what we generate is fine.

	Jakub
On Mon, May 20, 2019 at 11:54 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> > > > Adjust gcc.target/i386/pr22076.c for 64-bit.
> > > >
> > > > 	* gcc.target/i386/pr22076.c: Adjusted for 64-bit.
> > > Well, it looks like you're just papering over a code quality regression?
> > > Or am I missing something?
> >
> > We have to load 64bit value from memory to 128 bit XMM register using movq.
> >
> > OTOH, we could just use -mno-sse2 which disables XMM emulation.
>
> Works for me.

2019-05-20  Uroš Bizjak  <ubizjak@gmail.com>

	PR testsuite/90503
	* gcc.target/i386/pr22076.c (dg-options): Add -mno-sse2.

Tested on x86_64-linux-gnu and committed.

Uros.

Index: gcc.target/i386/pr22076.c
===================================================================
--- gcc.target/i386/pr22076.c	(revision 271430)
+++ gcc.target/i386/pr22076.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fomit-frame-pointer -flax-vector-conversions -mmmx" } */
+/* { dg-options "-O2 -fomit-frame-pointer -flax-vector-conversions -mmmx -mno-sse2" } */
 /* { dg-additional-options "-mno-vect8-ret-in-mem" { target *-*-vxworks* } } */
 
 #include <mmintrin.h>
On Tue, May 21, 2019 at 12:03 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, May 20, 2019 at 11:54 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
[...]
> > Works for me.
>
> 2019-05-20  Uroš Bizjak  <ubizjak@gmail.com>
>
> 	PR testsuite/90503
> 	* gcc.target/i386/pr22076.c (dg-options): Add -mno-sse2.
>
> Tested on x86_64-linux-gnu and committed.

While looking at the testcase, attached patch modernizes it a bit.

2019-05-20  Uroš Bizjak  <ubizjak@gmail.com>

	PR testsuite/90503
	* gcc.target/i386/pr22076.c (dg-options): Add -mno-sse2.
	Remove -flax-vector-conversions.
	(dg-additional-options): Remove.
	(test): Change to void.  Declare m0 and m1 as __m64 and cast
	initializer in a proper way.  Do not return result.
	(dg-final): Scan for 2 instances of movq.

Uros.
Index: gcc.target/i386/pr22076.c
===================================================================
--- gcc.target/i386/pr22076.c	(revision 271442)
+++ gcc.target/i386/pr22076.c	(working copy)
@@ -1,19 +1,17 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fomit-frame-pointer -flax-vector-conversions -mmmx -mno-sse2" } */
-/* { dg-additional-options "-mno-vect8-ret-in-mem" { target *-*-vxworks* } } */
+/* { dg-options "-O2 -fomit-frame-pointer -mmmx -mno-sse2" } */
 
 #include <mmintrin.h>
 
-__v8qi test ()
+__m64 x;
+
+void test ()
 {
-  __v8qi mm0 = {1,2,3,4,5,6,7,8};
-  __v8qi mm1 = {11,22,33,44,55,66,77,88};
-  volatile __m64 x;
+  __m64 mm0 = (__m64)(__v8qi) {1,2,3,4,5,6,7,8};
+  __m64 mm1 = (__m64)(__v8qi) {11,22,33,44,55,66,77,88};
 
   x = _mm_add_pi8 (mm0, mm1);
-
-  return x;
 }
 
-/* { dg-final { scan-assembler-times "movq" 3 } } */
+/* { dg-final { scan-assembler-times "movq" 2 } } */
 /* { dg-final { scan-assembler-not "movl" { target nonpic } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr22076.c b/gcc/testsuite/gcc.target/i386/pr22076.c
index 6c1620b4a3b..38c29dcc35b 100644
--- a/gcc/testsuite/gcc.target/i386/pr22076.c
+++ b/gcc/testsuite/gcc.target/i386/pr22076.c
@@ -15,5 +15,6 @@ __v8qi test ()
   return x;
 }
 
-/* { dg-final { scan-assembler-times "movq" 3 } } */
+/* { dg-final { scan-assembler-times "movq" 3 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq" 4 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-not "movl" { target nonpic } } } */