diff mbox series

[aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

Message ID 1505499759.2286.61.camel@cavium.com
State New
Headers show
Series [aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb | expand

Commit Message

Steve Ellcey Sept. 15, 2017, 6:22 p.m. UTC
PR 81356 points out that doing a __builtin_strcpy of an empty string on
aarch64 does a copy from memory instead of just writing out a zero byte.
In looking at this I found that it was because of
aarch64_use_by_pieces_infrastructure_p, which returns false for
STORE_BY_PIECES.  The comment says:

  /* STORE_BY_PIECES can be used when copying a constant string, but
     in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
     For now we always fail this and let the move_by_pieces code copy
     the string from read-only memory.  */

But this doesn't seem to be the case anymore.  When I remove this function
and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
for __builtin_strcpy of a constant string seems to be either better or the
same.  The only time I got more instructions after removing this function
was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
instructions to create the source followed by a store instead of doing a
load/store of 8 bytes.  The comment may have been applicable for
-mstrict-align at one time but it doesn't seem to be the case now.  I still
get better code without this routine under that option as well.

Bootstrapped and tested without regressions, OK to checkin?

Steve Ellcey
sellcey@cavium.com



2017-09-15  Steve Ellcey  <sellcey@cavium.com>

	PR target/81356
	* config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p):
	Remove.
	(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.


2017-09-15  Steve Ellcey  <sellcey@cavium.com>

	* gcc.target/aarch64/pr81356.c: New test.

Comments

Steve Ellcey Sept. 25, 2017, 5:36 p.m. UTC | #1
Ping.

Steve Ellcey
sellcey@cavium.com


On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
> PR 81356 points out that doing a __builtin_strcpy of an empty string on
> aarch64 does a copy from memory instead of just writing out a zero byte.
> In looking at this I found that it was because of
> aarch64_use_by_pieces_infrastructure_p, which returns false for
> STORE_BY_PIECES.  The comment says:
> 
>   /* STORE_BY_PIECES can be used when copying a constant string, but
>      in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>      For now we always fail this and let the move_by_pieces code copy
>      the string from read-only memory.  */
> 
> But this doesn't seem to be the case anymore.  When I remove this function
> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
> for __builtin_strcpy of a constant string seems to be either better or the
> same.  The only time I got more instructions after removing this function
> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
> instructions to create the source followed by a store instead of doing a
> load/store of 8 bytes.  The comment may have been applicable for
> -mstrict-align at one time but it doesn't seem to be the case now.  I still
> get better code without this routine under that option as well.
> 
> Bootstrapped and tested without regressions, OK to checkin?
> 
> Steve Ellcey
> sellcey@cavium.com
> 
> 
> 
> 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
> 
> 	PR target/81356
> 	* config/aarch64/aarch64.c
> (aarch64_use_by_pieces_infrastructure_p):
> 	Remove.
> 	(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
> 
> 
> 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
> 
> 	* gcc.target/aarch64/pr81356.c: New test.
Qing Zhao Oct. 3, 2017, 7:40 p.m. UTC | #2
I think the change is good.  But I don’t have the permission for approval…

Qing
> On Sep 25, 2017, at 12:36 PM, Steve Ellcey <sellcey@cavium.com> wrote:
> 
> Ping.
> 
> Steve Ellcey
> sellcey@cavium.com
> 
> 
> On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
>> PR 81356 points out that doing a __builtin_strcpy of an empty string on
>> aarch64 does a copy from memory instead of just writing out a zero byte.
>> In looking at this I found that it was because of
>> aarch64_use_by_pieces_infrastructure_p, which returns false for
>> STORE_BY_PIECES.  The comment says:
>> 
>>   /* STORE_BY_PIECES can be used when copying a constant string, but
>>      in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>>      For now we always fail this and let the move_by_pieces code copy
>>      the string from read-only memory.  */
>> 
>> But this doesn't seem to be the case anymore.  When I remove this function
>> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
>> for __builtin_strcpy of a constant string seems to be either better or the
>> same.  The only time I got more instructions after removing this function
>> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
>> instructions to create the source followed by a store instead of doing a
>> load/store of 8 bytes.  The comment may have been applicable for
>> -mstrict-align at one time but it doesn't seem to be the case now.  I still
>> get better code without this routine under that option as well.
>> 
>> Bootstrapped and tested without regressions, OK to checkin?
>> 
>> Steve Ellcey
>> sellcey@cavium.com
>> 
>> 
>> 
>> 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
>> 
>> 	PR target/81356
>> 	* config/aarch64/aarch64.c
>> (aarch64_use_by_pieces_infrastructure_p):
>> 	Remove.
>> 	(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
>> 
>> 
>> 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
>> 
>> 	* gcc.target/aarch64/pr81356.c: New test.
Steve Ellcey Oct. 24, 2017, 6:16 p.m. UTC | #3
Re-ping.

Steve Ellcey
sellcey@cavium.com

On Mon, 2017-09-25 at 10:36 -0700, Steve Ellcey wrote:
> Ping.
> 
> Steve Ellcey
> sellcey@cavium.com
> 
> 
> On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
> > 
> > PR 81356 points out that doing a __builtin_strcpy of an empty string on
> > aarch64 does a copy from memory instead of just writing out a zero byte.
> > In looking at this I found that it was because of
> > aarch64_use_by_pieces_infrastructure_p, which returns false for
> > STORE_BY_PIECES.  The comment says:
> > 
> >   /* STORE_BY_PIECES can be used when copying a constant string, but
> >      in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
> >      For now we always fail this and let the move_by_pieces code copy
> >      the string from read-only memory.  */
> > 
> > But this doesn't seem to be the case anymore.  When I remove this function
> > and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
> > for __builtin_strcpy of a constant string seems to be either better or the
> > same.  The only time I got more instructions after removing this function
> > was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
> > instructions to create the source followed by a store instead of doing a
> > load/store of 8 bytes.  The comment may have been applicable for
> > -mstrict-align at one time but it doesn't seem to be the case now.  I still
> > get better code without this routine under that option as well.
> > 
> > Bootstrapped and tested without regressions, OK to checkin?
> > 
> > Steve Ellcey
> > sellcey@cavium.com
> > 
> > 
> > 
> > 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
> > 
> > 	PR target/81356
> > 	* config/aarch64/aarch64.c
> > (aarch64_use_by_pieces_infrastructure_p):
> > 	Remove.
> > 	(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
> > 
> > 
> > 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
> > 
> > 	* gcc.target/aarch64/pr81356.c: New test.
Steve Ellcey Nov. 15, 2017, 10:48 p.m. UTC | #4
re-re-ping.

Steve Ellcey
sellcey@cavium.com

On Tue, 2017-10-24 at 11:16 -0700, Steve Ellcey wrote:
> Re-ping.
> 
> Steve Ellcey
> sellcey@cavium.com
> 
> On Mon, 2017-09-25 at 10:36 -0700, Steve Ellcey wrote:
> > 
> > Ping.
> > 
> > Steve Ellcey
> > sellcey@cavium.com
> > 
> > 
> > On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
> > > 
> > > 
> > > PR 81356 points out that doing a __builtin_strcpy of an empty
> > > string on
> > > aarch64 does a copy from memory instead of just writing out a
> > > zero byte.
> > > In looking at this I found that it was because of
> > > aarch64_use_by_pieces_infrastructure_p, which returns false for
> > > STORE_BY_PIECES.  The comment says:
> > > 
> > >   /* STORE_BY_PIECES can be used when copying a constant string,
> > > but
> > >      in that case each 64-bit chunk takes 5 insns instead of 2
> > > (LDR/STR).
> > >      For now we always fail this and let the move_by_pieces code
> > > copy
> > >      the string from read-only memory.  */
> > > 
> > > But this doesn't seem to be the case anymore.  When I remove this
> > > function
> > > and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it
> > > the code
> > > for __builtin_strcpy of a constant string seems to be either
> > > better or the
> > > same.  The only time I got more instructions after removing this
> > > function
> > > was on an 8 byte __builtin_strcpy where we now generate a mov and
> > > 3 movk
> > > instructions to create the source followed by a store instead of
> > > doing a
> > > load/store of 8 bytes.  The comment may have been applicable for
> > > -mstrict-align at one time but it doesn't seem to be the case
> > > now.  I still
> > > get better code without this routine under that option as well.
> > > 
> > > Bootstrapped and tested without regressions, OK to checkin?
> > > 
> > > Steve Ellcey
> > > sellcey@cavium.com
> > > 
> > > 
> > > 
> > > 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
> > > 
> > > 	PR target/81356
> > > 	* config/aarch64/aarch64.c
> > > (aarch64_use_by_pieces_infrastructure_p):
> > > 	Remove.
> > > 	(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
> > > 
> > > 
> > > 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
> > > 
> > > 	* gcc.target/aarch64/pr81356.c: New test.
James Greenhalgh Nov. 17, 2017, 9:41 p.m. UTC | #5
On Fri, Sep 15, 2017 at 07:22:39PM +0100, Steve Ellcey wrote:
> PR 81356 points out that doing a __builtin_strcpy of an empty string on
> aarch64 does a copy from memory instead of just writing out a zero byte.
> In looking at this I found that it was because of
> aarch64_use_by_pieces_infrastructure_p, which returns false for
> STORE_BY_PIECES.  The comment says:
> 
>   /* STORE_BY_PIECES can be used when copying a constant string, but
>      in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>      For now we always fail this and let the move_by_pieces code copy
>      the string from read-only memory.  */
> 
> But this doesn't seem to be the case anymore.  When I remove this function
> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
> for __builtin_strcpy of a constant string seems to be either better or the
> same.  The only time I got more instructions after removing this function
> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
> instructions to create the source followed by a store instead of doing a
> load/store of 8 bytes.  The comment may have been applicable for
> -mstrict-align at one time but it doesn't seem to be the case now.  I still
> get better code without this routine under that option as well.

I've been trying to remember why this is the way it is, It looks like it dates
back to the initial port, and the only note I have on it points at an
incorrect PR number. So, I think this is probably a safe and sensible
choice.

OK.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>

Thanks,
James

> 
> Bootstrapped and tested without regressions, OK to checkin?
> 
> Steve Ellcey
> sellcey@cavium.com
> 
> 
> 
> 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
> 
> 	PR target/81356
> 	* config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p):
> 	Remove.
> 	(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
> 
> 
> 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
> 
> 	* gcc.target/aarch64/pr81356.c: New test.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1c14008..fc72236 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void)
>    return (HOST_WIDE_INT_1 << 36);
>  }
>  
> -static bool
> -aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
> -					unsigned int align,
> -					enum by_pieces_operation op,
> -					bool speed_p)
> -{
> -  /* STORE_BY_PIECES can be used when copying a constant string, but
> -     in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
> -     For now we always fail this and let the move_by_pieces code copy
> -     the string from read-only memory.  */
> -  if (op == STORE_BY_PIECES)
> -    return false;
> -
> -  return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
> -}
> -
>  static rtx
>  aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
>  			int code, tree treeop0, tree treeop1)
> @@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_LEGITIMIZE_ADDRESS
>  #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address
>  
> -#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
> -#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
> -  aarch64_use_by_pieces_infrastructure_p
> -
>  #undef TARGET_SCHED_CAN_SPECULATE_INSN
>  #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
>  

> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c b/gcc/testsuite/gcc.target/aarch64/pr81356.c
> index e69de29..9fd6baa 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr81356.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void f(char *a)
> +{
> +  __builtin_strcpy (a, "");
> +}
> +
> +/* { dg-final { scan-assembler-not "ldrb" } } */
Christophe Lyon Nov. 20, 2017, 2:24 p.m. UTC | #6
On 17 November 2017 at 22:41, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> On Fri, Sep 15, 2017 at 07:22:39PM +0100, Steve Ellcey wrote:
>> PR 81356 points out that doing a __builtin_strcpy of an empty string on
>> aarch64 does a copy from memory instead of just writing out a zero byte.
>> In looking at this I found that it was because of
>> aarch64_use_by_pieces_infrastructure_p, which returns false for
>> STORE_BY_PIECES.  The comment says:
>>
>>   /* STORE_BY_PIECES can be used when copying a constant string, but
>>      in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>>      For now we always fail this and let the move_by_pieces code copy
>>      the string from read-only memory.  */
>>
>> But this doesn't seem to be the case anymore.  When I remove this function
>> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
>> for __builtin_strcpy of a constant string seems to be either better or the
>> same.  The only time I got more instructions after removing this function
>> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
>> instructions to create the source followed by a store instead of doing a
>> load/store of 8 bytes.  The comment may have been applicable for
>> -mstrict-align at one time but it doesn't seem to be the case now.  I still
>> get better code without this routine under that option as well.
>
> I've been trying to remember why this is the way it is, It looks like it dates
> back to the initial port, and the only note I have on it points at an
> incorrect PR number. So, I think this is probably a safe and sensible
> choice.
>
> OK.
>

As a result of this patch, we now have:
XPASS: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2 "memset" 0
instead of:
XFAIL: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2
"memset" 0 (found 2 times)

Is it expected?

Christophe


> Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
>
> Thanks,
> James
>
>>
>> Bootstrapped and tested without regressions, OK to checkin?
>>
>> Steve Ellcey
>> sellcey@cavium.com
>>
>>
>>
>> 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
>>
>>       PR target/81356
>>       * config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p):
>>       Remove.
>>       (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
>>
>>
>> 2017-09-15  Steve Ellcey  <sellcey@cavium.com>
>>
>>       * gcc.target/aarch64/pr81356.c: New test.
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 1c14008..fc72236 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void)
>>    return (HOST_WIDE_INT_1 << 36);
>>  }
>>
>> -static bool
>> -aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
>> -                                     unsigned int align,
>> -                                     enum by_pieces_operation op,
>> -                                     bool speed_p)
>> -{
>> -  /* STORE_BY_PIECES can be used when copying a constant string, but
>> -     in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>> -     For now we always fail this and let the move_by_pieces code copy
>> -     the string from read-only memory.  */
>> -  if (op == STORE_BY_PIECES)
>> -    return false;
>> -
>> -  return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
>> -}
>> -
>>  static rtx
>>  aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
>>                       int code, tree treeop0, tree treeop1)
>> @@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p
>>  #undef TARGET_LEGITIMIZE_ADDRESS
>>  #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address
>>
>> -#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
>> -#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>> -  aarch64_use_by_pieces_infrastructure_p
>> -
>>  #undef TARGET_SCHED_CAN_SPECULATE_INSN
>>  #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
>>
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c b/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> index e69de29..9fd6baa 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +void f(char *a)
>> +{
>> +  __builtin_strcpy (a, "");
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "ldrb" } } */
>
Steve Ellcey Nov. 20, 2017, 11:50 p.m. UTC | #7
On Mon, 2017-11-20 at 15:24 +0100, Christophe Lyon wrote:

> As a result of this patch, we now have:
> XPASS: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2
> "memset" 0
> instead of:
> XFAIL: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2
> "memset" 0 (found 2 times)
> 
> Is it expected?
> 
> Christophe

I think it is.  I hadn't seen this failure before but I may not have
tested my change with Fortran.  I am going to consider fixing this an
obvious change and will check in the fix (removing aarch64 from the
list of xfails) today.

Steve Ellcey
sellcey@cavium.com
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c b/gcc/testsuite/gcc.target/aarch64/pr81356.c
index e69de29..9fd6baa 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr81356.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void f(char *a)
+{
+  __builtin_strcpy (a, "");
+}
+
+/* { dg-final { scan-assembler-not "ldrb" } } */