
PR rtl-optimization/7061: Complex number arguments on x86_64-like ABIs.

Message ID 00e101d8740c$e785e110$b691a330$@nextmovesoftware.com

Commit Message

Roger Sayle May 30, 2022, 10:06 a.m. UTC
This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.

For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }

GCC with -O2 currently generates:

re:     movq    %xmm0, -8(%rsp)
        movss   -8(%rsp), %xmm0
        ret
im:     movq    %xmm0, -8(%rsp)
        movss   -4(%rsp), %xmm0
        ret

with this patch we now generate:

re:     ret
im:     movq    %xmm0, %rax
        shrq    $32, %rax
        movd    %eax, %xmm0
        ret

[Technically, this shift could be performed on %xmm0 in a single
instruction, but the backend would first need to be taught to do that;
the important point is that the SCmode argument is no longer written
to the stack.]
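To make the register layout concrete, here is a small C sketch (not part
of the patch) that models what the new re: and im: code does at the bit
level.  It assumes a little-endian host where sizeof (float _Complex) is
8, so the real part sits in the low 32 bits and the imaginary part in the
high 32 bits; the helper names sc_bits, low_float, re_model and im_model
are hypothetical.

```c
#include <complex.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: assumes a little-endian layout in which the real part of
   an SCmode (float _Complex) value occupies the low 32 bits of a 64-bit
   register and the imaginary part the high 32 bits.  */

/* Like "movq %xmm0, %rax": reinterpret the 8 bytes as one integer.  */
uint64_t sc_bits (float _Complex a)
{
  uint64_t bits;
  memcpy (&bits, &a, sizeof bits);
  return bits;
}

/* Like "movd %eax, %xmm0": reinterpret the low 32 bits as a float.  */
float low_float (uint64_t bits)
{
  uint32_t lo = (uint32_t) bits;
  float f;
  memcpy (&f, &lo, sizeof f);
  return f;
}

/* re: the real part is already in the low half, so nothing moves.  */
float re_model (float _Complex a)
{
  return low_float (sc_bits (a));
}

/* im: "shrq $32, %rax" brings the imaginary part down to the low half.  */
float im_model (float _Complex a)
{
  return low_float (sc_bits (a) >> 32);
}
```

For example, with z = 1.5f + 2.5f * I, re_model (z) returns 1.5f and
im_model (z) returns 2.5f on x86_64, mirroring the empty re: body and the
movq/shrq/movd sequence in im:, with no stack traffic implied.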

The patch itself modifies emit_group_store: just before RTL expansion
commits to writing to the stack, we check whether the store group
consists of a single scalar integer register that holds a complex mode
value (on x86_64, SCmode arguments are passed in DImode registers).  If
so, we can use a SUBREG to "view_convert" the integer to the equivalent
complex mode.
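The conditions tested by the new branch can be summarised in a small,
self-contained C model.  This is only a sketch: the enum, struct and
function names are hypothetical stand-ins for GCC's machine_mode
machinery, and int_mode_exists is an assumed approximation of
int_mode_for_mode (...).exists (...).

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the new branch in emit_group_store.
   None of these types or names exist in GCC.  */
enum mode_class { MODE_INT, MODE_FLOAT, MODE_COMPLEX_FLOAT };

struct mode_info
{
  enum mode_class cls;
  unsigned size;              /* size in bytes */
};

enum store_action
{
  SPILL_TO_STACK,             /* previous behaviour: stack temporary */
  SUBREG_DIRECT,              /* gen_lowpart (dest_mode, reg) */
  NARROW_THEN_SUBREG          /* move to an integer mode of dest's size,
                                 then gen_lowpart to dest_mode */
};

/* Assumption: an integer mode of the same size as the destination exists
   for the usual power-of-two sizes.  */
static bool int_mode_exists (unsigned size)
{
  return size == 1 || size == 2 || size == 4 || size == 8 || size == 16;
}

/* FINISH mirrors the patch's "finish == 1" test (a single-entry group),
   IS_REG mirrors REG_P (tmps[i]), DEST and TMP describe dest_mode and
   tmp_mode.  */
enum store_action
choose_action (int finish, bool is_reg, struct mode_info dest,
               struct mode_info tmp)
{
  if (finish == 1
      && is_reg
      && dest.cls == MODE_COMPLEX_FLOAT
      && tmp.cls == MODE_INT
      && int_mode_exists (dest.size))
    /* Same size: view the register directly; otherwise narrow first.  */
    return tmp.size == dest.size ? SUBREG_DIRECT : NARROW_THEN_SUBREG;
  return SPILL_TO_STACK;
}
```

In this model an SCmode destination (8 bytes) fed by a DImode register
selects SUBREG_DIRECT, while anything that fails one of the tests falls
back to the original stack-temporary path.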

An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e. the
register mode (8 bytes) is wider than the complex mode (4 bytes).
This is easily handled by first converting to an integer mode of the
correct size, and then generating a complex mode SUBREG of that.  This
is similar in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
which was almost (but not quite) approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591139.html
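The HCmode corner case can be sketched the same way: narrow the 8-byte
register to the 4-byte integer mode first, then view those 32 bits as a
complex value.  This sketch assumes a little-endian target and keeps the
half-precision parts as raw uint16_t bit patterns rather than _Float16;
narrow_to_si, view_as_hc and struct hc_bits are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical sketch, assuming little-endian: a 4-byte HCmode value
   (two 16-bit halves) arrives in the low bits of an 8-byte DImode
   register.  Parts are kept as raw bit patterns.  */
struct hc_bits
{
  uint16_t real;   /* bit pattern of the real half */
  uint16_t imag;   /* bit pattern of the imaginary half */
};

/* Step 1, like gen_lowpart (SImode, reg): keep the low 32 bits, i.e. the
   integer mode with the same size as HCmode.  */
uint32_t narrow_to_si (uint64_t di_reg)
{
  return (uint32_t) di_reg;
}

/* Step 2, like gen_lowpart (HCmode, tmp): view the 32 bits as a complex
   value; the real part is the low half, the imaginary part the high.  */
struct hc_bits view_as_hc (uint32_t si)
{
  struct hc_bits hc = { (uint16_t) si, (uint16_t) (si >> 16) };
  return hc;
}
```

For instance, a DImode register holding 0xDEADBEEF3C004000 narrows to
0x3C004000, whose low half 0x4000 (2.0 in IEEE half precision) is the
real part and whose high half 0x3C00 (1.0) is the imaginary part; the
upper 32 bits are simply ignored.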

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR rtl-optimization/7061
        * expr.cc (emit_group_store): For groups that consist of a single
        scalar integer register that holds a complex mode value, use
        gen_lowpart to generate a SUBREG to "view_convert" to the complex
        mode.  For modes of different sizes, first convert to an integer
        mode of the appropriate size.

gcc/testsuite/ChangeLog
        PR rtl-optimization/7061
        * gcc.target/i386/pr7061-1.c: New test case.
        * gcc.target/i386/pr7061-2.c: New test case.


Thanks in advance,
Roger
--

Comments

Jeff Law June 1, 2022, 3:04 p.m. UTC
On 5/30/2022 4:06 AM, Roger Sayle wrote:
> [...]
> then generating a complex mode SUBREG of this.  This is similar
> in concept to the patch I proposed here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
> which was almost (but not quite) approved here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591139.html
Yea, sorry.  Too much to do at the new job.  Trying to work my way 
through queued up stuff now...


>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
> [...]
OK
jeff

Patch

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7197996..c9df206 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2803,10 +2803,26 @@  emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 	    {
 	      machine_mode dest_mode = GET_MODE (dest);
 	      machine_mode tmp_mode = GET_MODE (tmps[i]);
+	      scalar_int_mode imode;
 
 	      gcc_assert (known_eq (bytepos, 0) && XVECLEN (src, 0));
 
-	      if (GET_MODE_ALIGNMENT (dest_mode)
+	      if (finish == 1
+		  && REG_P (tmps[i])
+		  && COMPLEX_MODE_P (dest_mode)
+		  && SCALAR_INT_MODE_P (tmp_mode)
+		  && int_mode_for_mode (dest_mode).exists (&imode))
+		{
+		  if (tmp_mode != imode)
+		    {
+		      rtx tmp = gen_reg_rtx (imode);
+		      emit_move_insn (tmp, gen_lowpart (imode, tmps[i]));
+		      dst = gen_lowpart (dest_mode, tmp);
+		    }
+		  else
+		    dst = gen_lowpart (dest_mode, tmps[i]);
+		}
+	      else if (GET_MODE_ALIGNMENT (dest_mode)
 		  >= GET_MODE_ALIGNMENT (tmp_mode))
 		{
 		  dest = assign_stack_temp (dest_mode,
diff --git a/gcc/testsuite/gcc.target/i386/pr7061-1.c b/gcc/testsuite/gcc.target/i386/pr7061-1.c
new file mode 100644
index 0000000..ce5f6b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr7061-1.c
@@ -0,0 +1,4 @@ 
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+float re(float _Complex a) { return __real__ a; }
+/* { dg-final { scan-assembler-not "mov" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr7061-2.c b/gcc/testsuite/gcc.target/i386/pr7061-2.c
new file mode 100644
index 0000000..ac33340
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr7061-2.c
@@ -0,0 +1,5 @@ 
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+float im(float _Complex a) { return __imag__ a; }
+/* { dg-final { scan-assembler-not "movss" } } */
+/* { dg-final { scan-assembler-not "rsp" } } */