diff mbox series

PR target/114187: Fix ?Fmode SUBREG simplification in simplify_subreg.

Message ID 000901da6d6a$2d642630$882c7290$@nextmovesoftware.com
State New
Headers show
Series PR target/114187: Fix ?Fmode SUBREG simplification in simplify_subreg. | expand

Commit Message

Roger Sayle March 3, 2024, 12:56 p.m. UTC
This patch fixes PR target/114187 a typo/missed-optimization in simplify-rtx
that's exposed by (my) changes to x86_64's parameter passing.  The context
is that construction of double word (TImode) values now uses the idiom:

(ior:TI (ashift:TI (zero_extend:TI (reg:DI x)) (const_int 64 [0x40]))
        (zero_extend:TI (reg:DI y)))

Extracting the DImode highpart and lowpart halves of this complex expression
is supported by simplications in simplify_subreg.  The problem is when the
doubleword TImode value represents two DFmode fields, there isn't a direct
simplification to extract the highpart or lowpart SUBREGs, but instead GCC
uses two steps, extract the DImode {high,low} part and then cast the result
back to a floating point mode, DFmode.

The (buggy) code to do this is:

  /* If the outer mode is not integral, try taking a subreg with the
equivalent
     integer outer mode and then bitcasting the result.
     Other simplifications rely on integer to integer subregs and we'd
     potentially miss out on optimizations otherwise.  */
  if (known_gt (GET_MODE_SIZE (innermode),
                GET_MODE_SIZE (outermode))
      && SCALAR_INT_MODE_P (innermode)
      && !SCALAR_INT_MODE_P (outermode)
      && int_mode_for_size (GET_MODE_BITSIZE (outermode),
                            0).exists (&int_outermode))
    {
      rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
      if (tem)
        return simplify_gen_subreg (outermode, tem, int_outermode, byte);
    }

The issue/mistake is that the second call, to simplify_subreg, shouldn't
use "byte" as the final argument; the offset has already been handled by
the first call, to simplify_subreg, and this second call is just a type
conversion from an integer mode to floating point (from DImode to DFmode).

Interestingly, this mistake was already spotted by Richard Sandiford when
the optimization was originally contributed in January 2023.
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610920.html
>> Richard Sandiford writes:
>> Also, the final line should pass 0 rather than byte.

Unfortunately a miscommunication/misunderstanding in a later thread
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612898.html
resulted in this correction being undone.  Alas the lack of any test
cases when the optimization was added/modified potentially contributed
to this lapse.  Using lowpart_subreg should avoid/reduce confusion in
future.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-03-03  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/114187
        * simplify-rtx.cc (simplify_context::simplify_subreg): Call
        lowpart_subreg to perform type conversion, to avoid confusion
        over the offset to use in the call to simplify_reg_subreg.

gcc/testsuite/ChangeLog
        PR target/114187
        * g++.target/i386/pr114187.C: New test case.


Thanks in advance,
Roger
--

Comments

Richard Biener March 3, 2024, 1:56 p.m. UTC | #1
> Am 03.03.2024 um 13:56 schrieb Roger Sayle <roger@nextmovesoftware.com>:
> 
> 
> This patch fixes PR target/114187 a typo/missed-optimization in simplify-rtx
> that's exposed by (my) changes to x86_64's parameter passing.  The context
> is that construction of double word (TImode) values now uses the idiom:
> 
> (ior:TI (ashift:TI (zero_extend:TI (reg:DI x)) (const_int 64 [0x40]))
>        (zero_extend:TI (reg:DI y)))
> 
> Extracting the DImode highpart and lowpart halves of this complex expression
> is supported by simplications in simplify_subreg.  The problem is when the
> doubleword TImode value represents two DFmode fields, there isn't a direct
> simplification to extract the highpart or lowpart SUBREGs, but instead GCC
> uses two steps, extract the DImode {high,low} part and then cast the result
> back to a floating point mode, DFmode.
> 
> The (buggy) code to do this is:
> 
>  /* If the outer mode is not integral, try taking a subreg with the
> equivalent
>     integer outer mode and then bitcasting the result.
>     Other simplifications rely on integer to integer subregs and we'd
>     potentially miss out on optimizations otherwise.  */
>  if (known_gt (GET_MODE_SIZE (innermode),
>                GET_MODE_SIZE (outermode))
>      && SCALAR_INT_MODE_P (innermode)
>      && !SCALAR_INT_MODE_P (outermode)
>      && int_mode_for_size (GET_MODE_BITSIZE (outermode),
>                            0).exists (&int_outermode))
>    {
>      rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
>      if (tem)
>        return simplify_gen_subreg (outermode, tem, int_outermode, byte);
>    }
> 
> The issue/mistake is that the second call, to simplify_subreg, shouldn't
> use "byte" as the final argument; the offset has already been handled by
> the first call, to simplify_subreg, and this second call is just a type
> conversion from an integer mode to floating point (from DImode to DFmode).
> 
> Interestingly, this mistake was already spotted by Richard Sandiford when
> the optimization was originally contributed in January 2023.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610920.html
>>> Richard Sandiford writes:
>>> Also, the final line should pass 0 rather than byte.
> 
> Unfortunately a miscommunication/misunderstanding in a later thread
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612898.html
> resulted in this correction being undone.  Alas the lack of any test
> cases when the optimization was added/modified potentially contributed
> to this lapse.  Using lowpart_subreg should avoid/reduce confusion in
> future.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

Ok

Richard 

> 
> 2024-03-03  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>        PR target/114187
>        * simplify-rtx.cc (simplify_context::simplify_subreg): Call
>        lowpart_subreg to perform type conversion, to avoid confusion
>        over the offset to use in the call to simplify_reg_subreg.
> 
> gcc/testsuite/ChangeLog
>        PR target/114187
>        * g++.target/i386/pr114187.C: New test case.
> 
> 
> Thanks in advance,
> Roger
> --
> 
> <patchss.txt>
diff mbox series

Patch

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 36dd522..dceaa13 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7846,7 +7846,7 @@  simplify_context::simplify_subreg (machine_mode outermode, rtx op,
     {
       rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
       if (tem)
-	return simplify_gen_subreg (outermode, tem, int_outermode, byte);
+	return lowpart_subreg (outermode, tem, int_outermode);
     }
 
   /* If OP is a vector comparison and the subreg is not changing the
diff --git a/gcc/testsuite/g++.target/i386/pr114187.C b/gcc/testsuite/g++.target/i386/pr114187.C
new file mode 100644
index 0000000..69912a9
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr114187.C
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct P2d {
+    double x, y;
+};
+
+double sumxy_p(P2d p) {
+    return p.x + p.y;
+}
+
+/* { dg-final { scan-assembler-not "movq" } } */
+/* { dg-final { scan-assembler-not "xchg" } } */