
i386: Adjust gcc.target/i386/pr22076.c for 64-bit

Message ID 20190520213626.27024-1-hjl.tools@gmail.com
State New
Series i386: Adjust gcc.target/i386/pr22076.c for 64-bit

Commit Message

H.J. Lu May 20, 2019, 9:36 p.m. UTC
With SSE emulation of MMX intrinsics in 64-bit mode,

---
__v8qi test ()
{
  __v8qi mm0 = {1,2,3,4,5,6,7,8};
  __v8qi mm1 = {11,22,33,44,55,66,77,88};
  volatile __m64 x;

  x = _mm_add_pi8 (mm0, mm1);

  return x;
}
---

is compiled into

	movq	.LC0(%rip), %xmm0
	movq	.LC1(%rip), %xmm1
	paddb	%xmm1, %xmm0
	movq	%xmm0, -8(%rsp)
	movq	-8(%rsp), %xmm0
	ret

instead of

	movq	.LC1(%rip), %mm0
	paddb	.LC0(%rip), %mm0
	movq	%mm0, -8(%rsp)
	movq	-8(%rsp), %xmm0
	ret

Adjust gcc.target/i386/pr22076.c for 64-bit.

	* gcc.target/i386/pr22076.c: Adjusted for 64-bit.
---
 gcc/testsuite/gcc.target/i386/pr22076.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jeff Law May 20, 2019, 9:39 p.m. UTC | #1
On 5/20/19 3:36 PM, H.J. Lu wrote:
> With SSE emulation of MMX intrinsics in 64-bit mode,
> 
> ---
> __v8qi test ()
> {
>   __v8qi mm0 = {1,2,3,4,5,6,7,8};
>   __v8qi mm1 = {11,22,33,44,55,66,77,88};
>   volatile __m64 x;
> 
>   x = _mm_add_pi8 (mm0, mm1);
> 
>   return x;
> }
> ---
> 
> is compiled into
> 
> 	movq	.LC0(%rip), %xmm0
> 	movq	.LC1(%rip), %xmm1
> 	paddb	%xmm1, %xmm0
> 	movq	%xmm0, -8(%rsp)
> 	movq	-8(%rsp), %xmm0
> 	ret
> 
> instead of
> 
> 	movq	.LC1(%rip), %mm0
> 	paddb	.LC0(%rip), %mm0
> 	movq	%mm0, -8(%rsp)
> 	movq	-8(%rsp), %xmm0
> 	ret
> 
> Adjust gcc.target/i386/pr22076.c for 64-bit.
> 
> 	* gcc.target/i386/pr22076.c: Adjusted for 64-bit.
Well, it looks like you're just papering over a code quality regression?
Or am I missing something?

Jeff
Uros Bizjak May 20, 2019, 9:51 p.m. UTC | #2
On Mon, May 20, 2019 at 11:39 PM Jeff Law <law@redhat.com> wrote:
>
> On 5/20/19 3:36 PM, H.J. Lu wrote:
> > With SSE emulation of MMX intrinsics in 64-bit mode,
> >
> > ---
> > __v8qi test ()
> > {
> >   __v8qi mm0 = {1,2,3,4,5,6,7,8};
> >   __v8qi mm1 = {11,22,33,44,55,66,77,88};
> >   volatile __m64 x;
> >
> >   x = _mm_add_pi8 (mm0, mm1);
> >
> >   return x;
> > }
> > ---
> >
> > is compiled into
> >
> >       movq    .LC0(%rip), %xmm0
> >       movq    .LC1(%rip), %xmm1
> >       paddb   %xmm1, %xmm0
> >       movq    %xmm0, -8(%rsp)
> >       movq    -8(%rsp), %xmm0
> >       ret
> >
> > instead of
> >
> >       movq    .LC1(%rip), %mm0
> >       paddb   .LC0(%rip), %mm0
> >       movq    %mm0, -8(%rsp)
> >       movq    -8(%rsp), %xmm0
> >       ret
> >
> > Adjust gcc.target/i386/pr22076.c for 64-bit.
> >
> >       * gcc.target/i386/pr22076.c: Adjusted for 64-bit.
> Well, it looks like you're just papering over a code quality regression?
> Or am I missing something?

We have to load the 64-bit value from memory into a 128-bit XMM register using movq.

OTOH, we could just use -mno-sse2, which disables XMM emulation.

Uros.
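
For illustration, the 64-bit load into an XMM register that Uros mentions
can be written directly with SSE2 intrinsics. A minimal sketch, assuming
<emmintrin.h>; the load64 wrapper itself is hypothetical:

#include <emmintrin.h>

/* Load 64 bits from memory into the low half of an XMM register,
   zeroing the upper half; with SSE2 this is a single movq.  */
__m128i
load64 (const __m64 *p)
{
  return _mm_loadl_epi64 ((const __m128i *) p);
}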
H.J. Lu May 20, 2019, 9:54 p.m. UTC | #3
On Mon, May 20, 2019 at 2:52 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, May 20, 2019 at 11:39 PM Jeff Law <law@redhat.com> wrote:
> >
> > On 5/20/19 3:36 PM, H.J. Lu wrote:
> > > With SSE emulation of MMX intrinsics in 64-bit mode,
> > >
> > > ---
> > > __v8qi test ()
> > > {
> > >   __v8qi mm0 = {1,2,3,4,5,6,7,8};
> > >   __v8qi mm1 = {11,22,33,44,55,66,77,88};
> > >   volatile __m64 x;
> > >
> > >   x = _mm_add_pi8 (mm0, mm1);
> > >
> > >   return x;
> > > }
> > > ---
> > >
> > > is compiled into
> > >
> > >       movq    .LC0(%rip), %xmm0
> > >       movq    .LC1(%rip), %xmm1
> > >       paddb   %xmm1, %xmm0
> > >       movq    %xmm0, -8(%rsp)
> > >       movq    -8(%rsp), %xmm0
> > >       ret
> > >
> > > instead of
> > >
> > >       movq    .LC1(%rip), %mm0
> > >       paddb   .LC0(%rip), %mm0
> > >       movq    %mm0, -8(%rsp)
> > >       movq    -8(%rsp), %xmm0
> > >       ret
> > >
> > > Adjust gcc.target/i386/pr22076.c for 64-bit.
> > >
> > >       * gcc.target/i386/pr22076.c: Adjusted for 64-bit.
> > Well, it looks like you're just papering over a code quality regression?
> > Or am I missing something?
>
> We have to load the 64-bit value from memory into a 128-bit XMM register using movq.
>
> OTOH, we could just use -mno-sse2, which disables XMM emulation.
>

Works for me.
Rainer Orth May 20, 2019, 9:54 p.m. UTC | #4
"H.J. Lu" <hjl.tools@gmail.com> writes:

> With SSE emulation of MMX intrinsics in 64-bit mode,
>
> ---
> __v8qi test ()
> {
>   __v8qi mm0 = {1,2,3,4,5,6,7,8};
>   __v8qi mm1 = {11,22,33,44,55,66,77,88};
>   volatile __m64 x;
>
>   x = _mm_add_pi8 (mm0, mm1);
>
>   return x;
> }
> ---
>
> is compiled into
>
> 	movq	.LC0(%rip), %xmm0
> 	movq	.LC1(%rip), %xmm1
> 	paddb	%xmm1, %xmm0
> 	movq	%xmm0, -8(%rsp)
> 	movq	-8(%rsp), %xmm0
> 	ret
>
> instead of
>
> 	movq	.LC1(%rip), %mm0
> 	paddb	.LC0(%rip), %mm0
> 	movq	%mm0, -8(%rsp)
> 	movq	-8(%rsp), %xmm0
> 	ret

This is PR target/90503.

	Rainer
Jeff Law May 20, 2019, 9:54 p.m. UTC | #5
On 5/20/19 3:51 PM, Uros Bizjak wrote:
> On Mon, May 20, 2019 at 11:39 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 5/20/19 3:36 PM, H.J. Lu wrote:
>>> With SSE emulation of MMX intrinsics in 64-bit mode,
>>>
>>> ---
>>> __v8qi test ()
>>> {
>>>   __v8qi mm0 = {1,2,3,4,5,6,7,8};
>>>   __v8qi mm1 = {11,22,33,44,55,66,77,88};
>>>   volatile __m64 x;
>>>
>>>   x = _mm_add_pi8 (mm0, mm1);
>>>
>>>   return x;
>>> }
>>> ---
>>>
>>> is compiled into
>>>
>>>       movq    .LC0(%rip), %xmm0
>>>       movq    .LC1(%rip), %xmm1
>>>       paddb   %xmm1, %xmm0
>>>       movq    %xmm0, -8(%rsp)
>>>       movq    -8(%rsp), %xmm0
>>>       ret
>>>
>>> instead of
>>>
>>>       movq    .LC1(%rip), %mm0
>>>       paddb   .LC0(%rip), %mm0
>>>       movq    %mm0, -8(%rsp)
>>>       movq    -8(%rsp), %xmm0
>>>       ret
>>>
>>> Adjust gcc.target/i386/pr22076.c for 64-bit.
>>>
>>>       * gcc.target/i386/pr22076.c: Adjusted for 64-bit.
>> Well, it looks like you're just papering over a code quality regression?
>> Or am I missing something?
> 
> We have to load the 64-bit value from memory into a 128-bit XMM register using movq.
> 
> OTOH, we could just use -mno-sse2, which disables XMM emulation.
Ah, we can't have a MEM operand on paddb for xmm registers...

Jeff
Jakub Jelinek May 20, 2019, 9:57 p.m. UTC | #6
On Mon, May 20, 2019 at 03:54:49PM -0600, Jeff Law wrote:
> >>> Adjust gcc.target/i386/pr22076.c for 64-bit.
> >>>
> >>>       * gcc.target/i386/pr22076.c: Adjusted for 64-bit.
> >> Well, it looks like you're just papering over a code quality regression?
> >> Or am I missing something?
> > 
> > We have to load the 64-bit value from memory into a 128-bit XMM register using movq.
> > 
> > OTOH, we could just use -mno-sse2, which disables XMM emulation.
> Ah, we can't have a MEM operand on paddb for xmm registers...

We can, but then it would load 128 bits from memory instead of just 64 bits;
the memory might not be sufficiently aligned, and an unaligned load might
cross a page boundary, ...
I think what we generate is fine.

	Jakub
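
A hypothetical sketch of the page-boundary hazard Jakub describes: an
8-byte object placed at the very end of an accessible page, where the
8-byte movq load is safe but a 16-byte load from the same address would
fault (Linux mmap/mprotect assumed, page size hard-coded for brevity):

#include <emmintrin.h>
#include <sys/mman.h>

int
main (void)
{
  long pagesize = 4096;  /* assumed; use sysconf (_SC_PAGESIZE) in real code */

  /* One accessible page followed by one inaccessible page.  */
  char *p = mmap (0, 2 * pagesize, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  mprotect (p + pagesize, pagesize, PROT_NONE);

  /* 8-byte object occupying the last 8 bytes of the accessible page.  */
  char *obj = p + pagesize - 8;

  /* movq: reads exactly 8 bytes, stays within the page.  */
  __m128i ok = _mm_loadl_epi64 ((const __m128i *) obj);

  /* A 128-bit load would read 16 bytes, crossing into the PROT_NONE
     page and faulting:
       __m128i bad = _mm_loadu_si128 ((const __m128i *) obj);  */

  (void) ok;
  munmap (p, 2 * pagesize);
  return 0;
}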
Uros Bizjak May 20, 2019, 10:03 p.m. UTC | #7
On Mon, May 20, 2019 at 11:54 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> > > > Adjust gcc.target/i386/pr22076.c for 64-bit.
> > > >
> > > >       * gcc.target/i386/pr22076.c: Adjusted for 64-bit.
> > > Well, it looks like you're just papering over a code quality regression?
> > > Or am I missing something?
> >
> > We have to load the 64-bit value from memory into a 128-bit XMM register using movq.
> >
> > OTOH, we could just use -mno-sse2, which disables XMM emulation.
> >
>
> Works for me.

2019-05-20  Uroš Bizjak  <ubizjak@gmail.com>

    PR testsuite/90503
    * gcc.target/i386/pr22076.c (dg-options): Add -mno-sse2.

Tested on x86_64-linux-gnu and committed.

Uros.
Index: gcc.target/i386/pr22076.c
===================================================================
--- gcc.target/i386/pr22076.c	(revision 271430)
+++ gcc.target/i386/pr22076.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fomit-frame-pointer -flax-vector-conversions -mmmx" } */
+/* { dg-options "-O2 -fomit-frame-pointer -flax-vector-conversions -mmmx -mno-sse2" } */
 /* { dg-additional-options "-mno-vect8-ret-in-mem" { target *-*-vxworks* } } */
 
 #include <mmintrin.h>
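
(To re-run just this test after such a change, the usual invocation from
the GCC build tree is make check-gcc RUNTESTFLAGS="i386.exp=pr22076.c";
noted here as a usage reminder, not part of the patch.)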
Uros Bizjak May 20, 2019, 10:22 p.m. UTC | #8
On Tue, May 21, 2019 at 12:03 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, May 20, 2019 at 11:54 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> > > > > Adjust gcc.target/i386/pr22076.c for 64-bit.
> > > > >
> > > > >       * gcc.target/i386/pr22076.c: Adjusted for 64-bit.
> > > > Well, it looks like you're just papering over a code quality regression?
> > > > Or am I missing something?
> > >
> > > We have to load the 64-bit value from memory into a 128-bit XMM register using movq.
> > >
> > > OTOH, we could just use -mno-sse2, which disables XMM emulation.
> > >
> >
> > Works for me.
>
> 2019-05-20  Uroš Bizjak  <ubizjak@gmail.com>
>
>     PR testsuite/90503
>     * gcc.target/i386/pr22076.c (dg-options): Add -mno-sse2.
>
> Tested on x86_64-linux-gnu and committed.

While looking at the testcase, I modernized it a bit with the attached patch.

2019-05-20  Uroš Bizjak  <ubizjak@gmail.com>

    PR testsuite/90503
    * gcc.target/i386/pr22076.c (dg-options): Add -mno-sse2.
    Remove -flax-vector-conversions.
    (dg-additional-options): Remove.
    (test): Change to void.  Declare mm0 and mm1 as __m64 and
    cast initializers in a proper way.  Do not return result.
    (dg-final): Scan for 2 instances of movq.

Uros.
Index: gcc.target/i386/pr22076.c
===================================================================
--- gcc.target/i386/pr22076.c	(revision 271442)
+++ gcc.target/i386/pr22076.c	(working copy)
@@ -1,19 +1,17 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fomit-frame-pointer -flax-vector-conversions -mmmx -mno-sse2" } */
-/* { dg-additional-options "-mno-vect8-ret-in-mem" { target *-*-vxworks* } } */
+/* { dg-options "-O2 -fomit-frame-pointer -mmmx -mno-sse2" } */
 
 #include <mmintrin.h>
 
-__v8qi test ()
+__m64 x;
+
+void test ()
 {
-  __v8qi mm0 = {1,2,3,4,5,6,7,8};
-  __v8qi mm1 = {11,22,33,44,55,66,77,88};
-  volatile __m64 x;
+  __m64 mm0 = (__m64)(__v8qi) {1,2,3,4,5,6,7,8};
+  __m64 mm1 = (__m64)(__v8qi) {11,22,33,44,55,66,77,88};
 
   x = _mm_add_pi8 (mm0, mm1);
-
-  return x;
 }
 
-/* { dg-final { scan-assembler-times "movq" 3 } } */
+/* { dg-final { scan-assembler-times "movq" 2 } } */
 /* { dg-final { scan-assembler-not "movl" { target nonpic } } } */
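
(Reading the adjusted scan counts: with -mmmx -mno-sse2, the expected -O2
code is one movq constant load, a paddb with a memory operand for the
other constant, and a movq store to the global x, i.e. two movq
instructions and no movl; this is the intended shape of the output rather
than verified assembly.)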

Patch

diff --git a/gcc/testsuite/gcc.target/i386/pr22076.c b/gcc/testsuite/gcc.target/i386/pr22076.c
index 6c1620b4a3b..38c29dcc35b 100644
--- a/gcc/testsuite/gcc.target/i386/pr22076.c
+++ b/gcc/testsuite/gcc.target/i386/pr22076.c
@@ -15,5 +15,6 @@  __v8qi test ()
   return x;
 }
 
-/* { dg-final { scan-assembler-times "movq" 3 } } */
+/* { dg-final { scan-assembler-times "movq" 3 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq" 4 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-not "movl" { target nonpic } } } */