
[AArch64] Don't split 64-bit constant stores to volatile location

Message ID e0374ecb-054b-94f3-e344-24e06a070b04@foss.arm.com
State New
Series [AArch64] Don't split 64-bit constant stores to volatile location

Commit Message

Kyrill Tkachov Aug. 22, 2019, 9:16 a.m. UTC
Hi all,

The optimisation that rewrites:
    typedef unsigned long long u64;

    void bar(u64 *x)
    {
      *x = 0xabcdef10abcdef10;
    }

from:
         mov     x1, 61200
         movk    x1, 0xabcd, lsl 16
         movk    x1, 0xef10, lsl 32
         movk    x1, 0xabcd, lsl 48
         str     x1, [x0]

into:
         mov     w1, 61200
         movk    w1, 0xabcd, lsl 16
         stp     w1, w1, [x0]

ends up producing two distinct stores if the destination is volatile:
   void bar(u64 *x)
   {
     *(volatile u64 *)x = 0xabcdef10abcdef10;
   }
         mov     w1, 61200
         movk    w1, 0xabcd, lsl 16
         str     w1, [x0]
         str     w1, [x0, 4]

because the two STRs do not get merged back into an STP. It's questionable
whether using STP on a volatile location is valid in the first place.
To avoid unnecessary pain in a context that is unlikely to be
performance-critical [1] (use of volatile), this patch avoids the
transformation for volatile destinations, so we produce the original
single X-register STR.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk (and eventual backports)?

Thanks,
Kyrill

[1] 
https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/


gcc/
2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (mov<mode>): Don't call
     aarch64_split_dimode_const_store on volatile MEM.

gcc/testsuite/
2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.

Comments

Andrew Pinski Aug. 22, 2019, 9:33 a.m. UTC | #1
On Thu, Aug 22, 2019 at 2:16 AM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi all,
>
> The optimisation to optimise:
>     typedef unsigned long long u64;
>
>     void bar(u64 *x)
>     {
>       *x = 0xabcdef10abcdef10;
>     }
>
> from:
>          mov     x1, 61200
>          movk    x1, 0xabcd, lsl 16
>          movk    x1, 0xef10, lsl 32
>          movk    x1, 0xabcd, lsl 48
>          str     x1, [x0]
>
> into:
>          mov     w1, 61200
>          movk    w1, 0xabcd, lsl 16
>          stp     w1, w1, [x0]
>
> ends up producing two distinct stores if the destination is volatile:
>    void bar(u64 *x)
>    {
>      *(volatile u64 *)x = 0xabcdef10abcdef10;
>    }
>          mov     w1, 61200
>          movk    w1, 0xabcd, lsl 16
>          str     w1, [x0]
>          str     w1, [x0, 4]
>
> because we end up not merging the strs into an stp. It's questionable
> whether the use of STP is valid for volatile in the first place.

It is not valid, as the architecture does not require STP to be atomic,
so the order of the two stores can change.

Thanks,
Andrew Pinski

> To avoid unnecessary pain in a context where it's unlikely to be
> performance critical [1] (use of volatile), this patch avoids this
> transformation for volatile destinations, so we produce the original
> single STR-X.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk (and eventual backports)?
>
> Thanks,
> Kyrill
>
> [1]
> https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/
>
>
> gcc/
> 2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * config/aarch64/aarch64.md (mov<mode>): Don't call
>      aarch64_split_dimode_const_store on volatile MEM.
>
> gcc/testsuite/
> 2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.
>
Kyrill Tkachov Sept. 24, 2019, 1:40 p.m. UTC | #2
Hi all,

On 8/22/19 10:16 AM, Kyrill Tkachov wrote:
> Hi all,
>
> The optimisation to optimise:
>    typedef unsigned long long u64;
>
>    void bar(u64 *x)
>    {
>      *x = 0xabcdef10abcdef10;
>    }
>
> from:
>         mov     x1, 61200
>         movk    x1, 0xabcd, lsl 16
>         movk    x1, 0xef10, lsl 32
>         movk    x1, 0xabcd, lsl 48
>         str     x1, [x0]
>
> into:
>         mov     w1, 61200
>         movk    w1, 0xabcd, lsl 16
>         stp     w1, w1, [x0]
>
> ends up producing two distinct stores if the destination is volatile:
>   void bar(u64 *x)
>   {
>     *(volatile u64 *)x = 0xabcdef10abcdef10;
>   }
>         mov     w1, 61200
>         movk    w1, 0xabcd, lsl 16
>         str     w1, [x0]
>         str     w1, [x0, 4]
>
> because we end up not merging the strs into an stp. It's questionable 
> whether the use of STP is valid for volatile in the first place.
> To avoid unnecessary pain in a context where it's unlikely to be 
> performance critical [1] (use of volatile), this patch avoids this
> transformation for volatile destinations, so we produce the original 
> single STR-X.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk (and eventual backports)?
>
This has been approved by James offline.

Committed to trunk with r276098.

Thanks,

Kyrill

> Thanks,
> Kyrill
>
> [1] 
> https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/
>
>
> gcc/
> 2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/aarch64.md (mov<mode>): Don't call
>     aarch64_split_dimode_const_store on volatile MEM.
>
> gcc/testsuite/
> 2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.
>
James Greenhalgh Sept. 25, 2019, 9:24 p.m. UTC | #3
On Tue, Sep 24, 2019 at 02:40:20PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> On 8/22/19 10:16 AM, Kyrill Tkachov wrote:
> > Hi all,
> >
> > The optimisation to optimise:
> >    typedef unsigned long long u64;
> >
> >    void bar(u64 *x)
> >    {
> >      *x = 0xabcdef10abcdef10;
> >    }
> >
> > from:
> >         mov     x1, 61200
> >         movk    x1, 0xabcd, lsl 16
> >         movk    x1, 0xef10, lsl 32
> >         movk    x1, 0xabcd, lsl 48
> >         str     x1, [x0]
> >
> > into:
> >         mov     w1, 61200
> >         movk    w1, 0xabcd, lsl 16
> >         stp     w1, w1, [x0]
> >
> > ends up producing two distinct stores if the destination is volatile:
> >   void bar(u64 *x)
> >   {
> >     *(volatile u64 *)x = 0xabcdef10abcdef10;
> >   }
> >         mov     w1, 61200
> >         movk    w1, 0xabcd, lsl 16
> >         str     w1, [x0]
> >         str     w1, [x0, 4]
> >
> > because we end up not merging the strs into an stp. It's questionable 
> > whether the use of STP is valid for volatile in the first place.
> > To avoid unnecessary pain in a context where it's unlikely to be 
> > performance critical [1] (use of volatile), this patch avoids this
> > transformation for volatile destinations, so we produce the original 
> > single STR-X.
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu.
> >
> > Ok for trunk (and eventual backports)?
> >
> This has been approved by James offline.
> 
> Committed to trunk with r276098.

Does this need backporting?

Thanks,
James

> 
> Thanks,
> 
> Kyrill
> 
> > Thanks,
> > Kyrill
> >
> > [1] 
> > https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/
> >
> >
> > gcc/
> > 2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> >
> >     * config/aarch64/aarch64.md (mov<mode>): Don't call
> >     aarch64_split_dimode_const_store on volatile MEM.
> >
> > gcc/testsuite/
> > 2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> >
> >     * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.
> >
Kyrill Tkachov Sept. 26, 2019, 8:02 a.m. UTC | #4
On 9/25/19 10:24 PM, James Greenhalgh wrote:
> On Tue, Sep 24, 2019 at 02:40:20PM +0100, Kyrill Tkachov wrote:
>> Hi all,
>>
>> On 8/22/19 10:16 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> The optimisation to optimise:
>>>     typedef unsigned long long u64;
>>>
>>>     void bar(u64 *x)
>>>     {
>>>       *x = 0xabcdef10abcdef10;
>>>     }
>>>
>>> from:
>>>          mov     x1, 61200
>>>          movk    x1, 0xabcd, lsl 16
>>>          movk    x1, 0xef10, lsl 32
>>>          movk    x1, 0xabcd, lsl 48
>>>          str     x1, [x0]
>>>
>>> into:
>>>          mov     w1, 61200
>>>          movk    w1, 0xabcd, lsl 16
>>>          stp     w1, w1, [x0]
>>>
>>> ends up producing two distinct stores if the destination is volatile:
>>>    void bar(u64 *x)
>>>    {
>>>      *(volatile u64 *)x = 0xabcdef10abcdef10;
>>>    }
>>>          mov     w1, 61200
>>>          movk    w1, 0xabcd, lsl 16
>>>          str     w1, [x0]
>>>          str     w1, [x0, 4]
>>>
>>> because we end up not merging the strs into an stp. It's questionable
>>> whether the use of STP is valid for volatile in the first place.
>>> To avoid unnecessary pain in a context where it's unlikely to be
>>> performance critical [1] (use of volatile), this patch avoids this
>>> transformation for volatile destinations, so we produce the original
>>> single STR-X.
>>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>
>>> Ok for trunk (and eventual backports)?
>>>
>> This has been approved by James offline.
>>
>> Committed to trunk with r276098.
> Does this need backporting?

Yeah, the optimisation has been in since 2016. I've tested the patch on
GCC 8 and 9 and will backport there and to 7 in due time.

Kyrill


>
> Thanks,
> James
>
>> Thanks,
>>
>> Kyrill
>>
>>> Thanks,
>>> Kyrill
>>>
>>> [1]
>>> https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/
>>>
>>>
>>> gcc/
>>> 2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>>      * config/aarch64/aarch64.md (mov<mode>): Don't call
>>>      aarch64_split_dimode_const_store on volatile MEM.
>>>
>>> gcc/testsuite/
>>> 2019-08-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>>      * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.
>>>

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 753c7978700d441c71cf97d5ae1e11f160b270af..cb91da796b3e9bfd6f19e714ccee7791b9cf3714 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1108,8 +1108,8 @@ 
 	(match_operand:GPI 1 "general_operand"))]
   ""
   "
-    if (MEM_P (operands[0]) && CONST_INT_P (operands[1])
-	&& <MODE>mode == DImode
+    if (MEM_P (operands[0]) && !MEM_VOLATILE_P (operands[0])
+	&& CONST_INT_P (operands[1]) && <MODE>mode == DImode
 	&& aarch64_split_dimode_const_store (operands[0], operands[1]))
       DONE;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/nosplit-di-const-volatile_1.c b/gcc/testsuite/gcc.target/aarch64/nosplit-di-const-volatile_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..da5975ad1652c00a3b69b3dc19e5c09c77526de7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/nosplit-di-const-volatile_1.c
@@ -0,0 +1,15 @@ 
+/* Check that storing the 64-bit immediate to a volatile location is done
+   with a single store.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned long long u64;
+
+void bar (u64 *x)
+{
+  *(volatile u64 *)x = 0xabcdef10abcdef10ULL;
+}
+
+/* { dg-final { scan-assembler-times "str\tx..?, .*" 1 } } */
+/* { dg-final { scan-assembler-not "str\tw..?, .*" } } */