[v3,aarch64] Correct the maximum shift amount for shifted operands.

Message ID 20181126195017.1718-1-christoph.muellner@theobroma-systems.com
State New
Series [v3,aarch64] Correct the maximum shift amount for shifted operands.

Commit Message

Christoph Muellner Nov. 26, 2018, 7:50 p.m. UTC
The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:

 * ADD (extend register)
 * ADDS (extended register)
 * SUB (extended register)

The result of this patch can be seen when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
  return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

0000000000000000 <myadd>:
   0:	d37c1c21 	ubfiz	x1, x1, #4, #8
   4:	8b000020 	add	x0, x1, x0
   8:	d65f03c0 	ret

With the patch the ubfiz will be merged into the add instruction:

0000000000000000 <myadd>:
   0:	8b211000 	add	x0, x0, w1, uxtb #4
   4:	d65f03c0 	ret
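
For contrast, a shift amount of 5 falls outside the imm3 range, so the
extension and shift cannot be folded into a single extended-register add.
A hypothetical variant of the example above (not part of this patch or the
testsuite) illustrates the boundary; even with the patch, a separate
ubfiz/shift is still expected here:

#include <stdint.h>

uint64_t myadd5(uint64_t a, uint64_t b)
{
  /* A shift of 5 exceeds the architectural maximum of 4, so no
     "uxtb #5" operand form exists and the shift stays separate.  */
  return a + (((uint8_t)b) << 5);
}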

Tested with "make check" and no regressions found.

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
	    Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>

	* config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
	shift amount for shifted operands.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
	    Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>

	* gcc.target/aarch64/extend.c: Adjust the testcases to cover
	the changed shift amount.

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
---
 gcc/config/aarch64/aarch64.c              |  2 +-
 gcc/testsuite/gcc.target/aarch64/extend.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

Comments

Sam Tebbs Nov. 27, 2018, 1:06 p.m. UTC | #1
On 11/26/18 7:50 PM, Christoph Muellner wrote:
> The aarch64 ISA specification allows a left shift amount to be applied
> after extension in the range of 0 to 4 (encoded in the imm3 field).
>
> This is true for at least the following instructions:
>
> * ADD (extend register)
> * ADDS (extended register)
> * SUB (extended register)
>
> The result of this patch can be seen, when compiling the following code:
>
> uint64_t myadd(uint64_t a, uint64_t b)
> {
> return a+(((uint8_t)b)<<4);
> }
>
> Without the patch the following sequence will be generated:
>
> 0000000000000000 <myadd>:
> 0: d37c1c21 ubfiz x1, x1, #4, #8
> 4: 8b000020 add x0, x1, x0
> 8: d65f03c0 ret
>
> With the patch the ubfiz will be merged into the add instruction:
>
> 0000000000000000 <myadd>:
> 0: 8b211000 add x0, x0, w1, uxtb #4
> 4: d65f03c0 ret

Hi Christoph,

Thanks for this, I'm not a maintainer but this looks good to me. A good 
point to mention would be how it affects shifts smaller than 4 bits, 
since I don't see any testing for that in the files you have changed.

> Tested with "make check" and no regressions found.
Could you perhaps elaborate on your testing? For example, which triplets you 
tested, whether you bootstrapped successfully, etc.
Philipp Tomsich Nov. 27, 2018, 1:18 p.m. UTC | #2
Sam,

> On 27.11.2018, at 14:06, Sam Tebbs <Sam.Tebbs@arm.com> wrote:
> 
> 
> On 11/26/18 7:50 PM, Christoph Muellner wrote:
>> The aarch64 ISA specification allows a left shift amount to be applied
>> after extension in the range of 0 to 4 (encoded in the imm3 field).
>> 
>> This is true for at least the following instructions:
>> 
>> * ADD (extend register)
>> * ADDS (extended register)
>> * SUB (extended register)
>> 
>> The result of this patch can be seen, when compiling the following code:
>> 
>> uint64_t myadd(uint64_t a, uint64_t b)
>> {
>> return a+(((uint8_t)b)<<4);
>> }
>> 
>> Without the patch the following sequence will be generated:
>> 
>> 0000000000000000 <myadd>:
>> 0: d37c1c21 ubfiz x1, x1, #4, #8
>> 4: 8b000020 add x0, x1, x0
>> 8: d65f03c0 ret
>> 
>> With the patch the ubfiz will be merged into the add instruction:
>> 
>> 0000000000000000 <myadd>:
>> 0: 8b211000 add x0, x0, w1, uxtb #4
>> 4: d65f03c0 ret
> 
> Hi Christoph,
> 
> Thanks for this, I'm not a maintainer but this looks good to me. A good 
> point to mention would be how it affects shifts smaller than 4 bits, 
> since I don't see any testing for that in the files you have changed.
> 
>> Tested with "make check" and no regressions found.
> Could you perhaps elaborate on your testing? So what triplets you 
> tested, if you bootstrapped successfully etc.


Just one bit of background info...
We’ve had this change in production since 2014, both in AppliedMicro’s
(now Ampere’s) compiler branch and in an Ubuntu PPA, across various
workloads.

Thanks,
Philipp.
Christoph Muellner Nov. 27, 2018, 8:53 p.m. UTC | #3
> On 27.11.2018, at 14:04, Sam Tebbs <Sam.Tebbs@arm.com> wrote:
> 
> 
> On 11/26/18 7:50 PM, Christoph Muellner wrote:
>> The aarch64 ISA specification allows a left shift amount to be applied
>> after extension in the range of 0 to 4 (encoded in the imm3 field).
>> 
>> This is true for at least the following instructions:
>> 
>>  * ADD (extend register)
>>  * ADDS (extended register)
>>  * SUB (extended register)
>> 
>> The result of this patch can be seen, when compiling the following code:
>> 
>> uint64_t myadd(uint64_t a, uint64_t b)
>> {
>>   return a+(((uint8_t)b)<<4);
>> }
>> 
>> Without the patch the following sequence will be generated:
>> 
>> 0000000000000000 <myadd>:
>>    0: d37c1c21  ubfiz  x1, x1, #4, #8
>>    4: 8b000020  add    x0, x1, x0
>>    8: d65f03c0  ret
>> 
>> With the patch the ubfiz will be merged into the add instruction:
>> 
>> 0000000000000000 <myadd>:
>>    0: 8b211000  add    x0, x0, w1, uxtb #4
>>    4: d65f03c0  ret
> 
> Hi Christoph,
> 
> Thanks for this, I'm not a maintainer but this looks good to me. A good
> point to mention would be if it affects shifts smaller than 4 bits,
> since I don't see any testing for that in the files you have changed.
> 

Hi Sam,

shifts smaller than 4 bits are already tested in gcc/testsuite/gcc.target/aarch64/extend.c.
E.g. subdi_sxtw() does so for shift-by-3.

All existing test cases were executed and did not show any regressions.
In other words: shifts smaller than 4 bits are not affected.
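
For illustration, such a small-shift case follows the same dg-final pattern as
the new shift-by-4 tests. A sketch along those lines (hypothetical function
name, not a verbatim copy of the extend.c body) looks like:

long long
subdi_sxtw3 (long long a, int i)
{
  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?3" } } */
  return a - ((long long)i << 3);
}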

>> Tested with "make check" and no regressions found.
> Could you perhaps elaborate on your testing? So what triplets you
> tested, if you bootstrapped successfully etc.

For the "make check" regression test, I compiled native on an AArch64 machine
running CentOS 7.5. Configure flags were "--enable-bootstrap".
I did so with and without the patch and compared the results for differences.
I think that's the essential requirement to get an OK for this patch.

Besides that, we've had this change in our aarch64-linux-gnu toolchain since 2014.
This toolchain has been used to compile a wide range of OSS projects, benchmarks,
and proprietary code over the years.

For example, we ran the SPEC CPU2000, CPU2006, and CPU2017 benchmarks, and
built Linux kernels (at least from 4.4 to 4.19), glibc (from 2.17 to 2.28), binutils
(ancient versions up to 2.30), TCMalloc (2.6 and 2.7), jemalloc (4 and 5), buildroot (2017, 2018),
U-Boot (2015-2018), TianoCore, HHVM, OpenSSL, MySQL, ...

Our work involved interworking with existing libraries (e.g. HHVM and its dependencies
on 100 external shared libraries) on several operating systems (Ubuntu, Fedora,
CentOS, OpenSuse, Oracle Linux, Debian, ...).

Besides LP64 we also used (and still use) the ILP32 ABI.

The binaries created by the compilers using that change were running
at least on APM X-Gene processors, Ampere Computing's eMAG processors,
as well as on Arm Cortex-A53 and Cortex-A72 cores.

If you want me to test something specific, then just let me know.

Thanks,
Christoph
Richard Earnshaw (lists) Nov. 28, 2018, 12:10 p.m. UTC | #4
On 26/11/2018 19:50, Christoph Muellner wrote:
> The aarch64 ISA specification allows a left shift amount to be applied
> after extension in the range of 0 to 4 (encoded in the imm3 field).
> 
> This is true for at least the following instructions:
> 
>  * ADD (extend register)
>  * ADDS (extended register)
>  * SUB (extended register)
> 
> The result of this patch can be seen, when compiling the following code:
> 
> uint64_t myadd(uint64_t a, uint64_t b)
> {
>   return a+(((uint8_t)b)<<4);
> }
> 
> Without the patch the following sequence will be generated:
> 
> 0000000000000000 <myadd>:
>    0:	d37c1c21 	ubfiz	x1, x1, #4, #8
>    4:	8b000020 	add	x0, x1, x0
>    8:	d65f03c0 	ret
> 
> With the patch the ubfiz will be merged into the add instruction:
> 
> 0000000000000000 <myadd>:
>    0:	8b211000 	add	x0, x0, w1, uxtb #4
>    4:	d65f03c0 	ret
> 
> Tested with "make check" and no regressions found.
> 
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
> 	    Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
> 
> 	* config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
> 	shift amount for shifted operands.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
> 	    Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
> 
> 	* gcc.target/aarch64/extend.c: Adjust the testcases to cover
> 	the changed shift amount.
> 

This is OK.  Thanks.

R.

PS, I was sufficiently surprised by this that I went and checked the
original commit (it's not an obvious off-by-one error).  But it does
appear that it's been this way since the code was originally added
(prior to the initial publication of the port) and there's no obvious
reason why.
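
As the hunk below suggests, aarch64_uxt_size returns the extension width
(8, 16 or 32 bits) when the mask argument is a contiguous low-order mask of
that width shifted left by the given shift amount, and 0 otherwise. A minimal
standalone sketch of that check (written against the visible hunk rather than
copied from aarch64.c, with HOST_WIDE_INT replaced by long long, so the real
body may differ in detail) looks roughly like this:

/* Sketch only: return the extension width if MASK is a low-order mask of
   8, 16 or 32 bits shifted left by SHIFT, and 0 otherwise.  */
static int
uxt_size_sketch (int shift, long long mask)
{
  if (shift >= 0 && shift <= 4)            /* The imm3 range after the patch.  */
    {
      int size;
      for (size = 8; size <= 32; size *= 2)
        {
          long long bits = ((long long) 1 << size) - 1;
          if (mask == (bits << shift))
            return size;                   /* e.g. mask 0xff0, shift 4 -> 8.  */
        }
    }
  return 0;
}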


> Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
> ---
>  gcc/config/aarch64/aarch64.c              |  2 +-
>  gcc/testsuite/gcc.target/aarch64/extend.c | 28 ++++++++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 90bbc57..27b53f4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -8226,7 +8226,7 @@ aarch64_output_casesi (rtx *operands)
>  int
>  aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
>  {
> -  if (shift >= 0 && shift <= 3)
> +  if (shift >= 0 && shift <= 4)
>      {
>        int size;
>        for (size = 8; size <= 32; size *= 2)
> diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c b/gcc/testsuite/gcc.target/aarch64/extend.c
> index f399e55..19b120d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/extend.c
> +++ b/gcc/testsuite/gcc.target/aarch64/extend.c
> @@ -37,6 +37,13 @@ adddi_uxtw (unsigned long long a, unsigned int i)
>  }
>  
>  unsigned long long
> +adddi_uxtw4 (unsigned long long a, unsigned int i)
> +{
> +  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?4" } } */
> +  return a + ((unsigned long long)i << 4);
> +}
> +
> +unsigned long long
>  adddi_uxtw0 (unsigned long long a, unsigned int i)
>  {
>    /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw\n" } } */
> @@ -51,6 +58,13 @@ adddi_sxtw (long long a, int i)
>  }
>  
>  long long
> +adddi_sxtw4 (long long a, int i)
> +{
> +  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?4" } } */
> +  return a + ((long long)i << 4);
> +}
> +
> +long long
>  adddi_sxtw0 (long long a, int i)
>  {
>    /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw\n" } } */
> @@ -65,6 +79,13 @@ subdi_uxtw (unsigned long long a, unsigned int i)
>  }
>  
>  unsigned long long
> +subdi_uxtw4 (unsigned long long a, unsigned int i)
> +{
> +  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?4" } } */
> +  return a - ((unsigned long long)i << 4);
> +}
> +
> +unsigned long long
>  subdi_uxtw0 (unsigned long long a, unsigned int i)
>  {
>    /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw\n" } } */
> @@ -79,6 +100,13 @@ subdi_sxtw (long long a, int i)
>  }
>  
>  long long
> +subdi_sxtw4 (long long a, int i)
> +{
> +  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?4" } } */
> +  return a - ((long long)i << 4);
> +}
> +
> +long long
>  subdi_sxtw0 (long long a, int i)
>  {
>    /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw\n" } } */
>
Philipp Tomsich Nov. 28, 2018, 1:08 p.m. UTC | #5
> On 28.11.2018, at 13:10, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 26/11/2018 19:50, Christoph Muellner wrote:
>> The aarch64 ISA specification allows a left shift amount to be applied
>> after extension in the range of 0 to 4 (encoded in the imm3 field).
>> 
>> This is true for at least the following instructions:
>> 
>> * ADD (extend register)
>> * ADDS (extended register)
>> * SUB (extended register)
>> 
>> The result of this patch can be seen, when compiling the following code:
>> 
>> uint64_t myadd(uint64_t a, uint64_t b)
>> {
>>  return a+(((uint8_t)b)<<4);
>> }
>> 
>> Without the patch the following sequence will be generated:
>> 
>> 0000000000000000 <myadd>:
>>   0:	d37c1c21 	ubfiz	x1, x1, #4, #8
>>   4:	8b000020 	add	x0, x1, x0
>>   8:	d65f03c0 	ret
>> 
>> With the patch the ubfiz will be merged into the add instruction:
>> 
>> 0000000000000000 <myadd>:
>>   0:	8b211000 	add	x0, x0, w1, uxtb #4
>>   4:	d65f03c0 	ret
>> 
>> Tested with "make check" and no regressions found.
>> 
>> *** gcc/ChangeLog ***
>> 
>> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
>> 	    Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
>> 
>> 	* config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
>> 	shift amount for shifted operands.
>> 
>> *** gcc/testsuite/ChangeLog ***
>> 
>> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
>> 	    Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
>> 
>> 	* gcc.target/aarch64/extend.c: Adjust the testcases to cover
>> 	the changed shift amount.
>> 
> 
> This is OK.  Thanks.
> 
> R.
> 
> PS, I was sufficiently surprised by this that I went and checked the
> original commit (it's not an obvious off-by-one error).  But it does
> appear that it's been this way since the code was originally added
> (prior to the initial publication of the port) and there's no obvious
> reason why.

Since we don’t have any Reported-by: tags in GCC, the credit for initially finding and
reporting this goes to AppliedMicro's original chip verification team for the XGene1.

Cheers,
Philipp.
James Greenhalgh Nov. 28, 2018, 4:31 p.m. UTC | #6
On Wed, Nov 28, 2018 at 07:08:02AM -0600, Philipp Tomsich wrote:
> 
> 
> On 28.11.2018, at 13:10, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com<mailto:Richard.Earnshaw@arm.com>> wrote:
> 
> On 26/11/2018 19:50, Christoph Muellner wrote:
> The aarch64 ISA specification allows a left shift amount to be applied
> after extension in the range of 0 to 4 (encoded in the imm3 field).
> 
> This is true for at least the following instructions:
> 
> * ADD (extend register)
> * ADDS (extended register)
> * SUB (extended register)
> 
> The result of this patch can be seen, when compiling the following code:
> 
> uint64_t myadd(uint64_t a, uint64_t b)
> {
>  return a+(((uint8_t)b)<<4);
> }
> 
> Without the patch the following sequence will be generated:
> 
> 0000000000000000 <myadd>:
>   0: d37c1c21  ubfiz x1, x1, #4, #8
>   4: 8b000020  add x0, x1, x0
>   8: d65f03c0  ret
> 
> With the patch the ubfiz will be merged into the add instruction:
> 
> 0000000000000000 <myadd>:
>   0: 8b211000  add x0, x0, w1, uxtb #4
>   4: d65f03c0  ret
> 
> Tested with "make check" and no regressions found.
> 
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com<mailto:christoph.muellner@theobroma-systems.com>>
>     Philipp Tomsich  <philipp.tomsich@theobroma-systems.com<mailto:philipp.tomsich@theobroma-systems.com>>
> 
> * config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
> shift amount for shifted operands.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com<mailto:christoph.muellner@theobroma-systems.com>>
>     Philipp Tomsich  <philipp.tomsich@theobroma-systems.com<mailto:philipp.tomsich@theobroma-systems.com>>
> 
> * gcc.target/aarch64/extend.c: Adjust the testcases to cover
> the changed shift amount.
> 
> 
> This is OK.  Thanks.
> 
> R.
> 
> PS, I was sufficiently surprised by this that I went and checked the
> original commit (it's not an obvious off-by-one error).  But it does
> appear that it's been this way since the code was originally added
> (prior to the initial publication of the port) and there's no obvious
> reason why.
> 
> Since we don’t have any Reported-by: tags in GCC: the credit for initially finding and
> reporting this goes to AppliedMicro's original chip verification team for the XGene1.

Good spot that team! This also took me down a rabbit hole of Architecture
Reference Manuals and old source trees wondering how we got off by one. My
best guess is that 0-3 just feels like the more natural range than 0-4...

Thanks for the patch and the distraction!

James

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 90bbc57..27b53f4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8226,7 +8226,7 @@  aarch64_output_casesi (rtx *operands)
 int
 aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
 {
-  if (shift >= 0 && shift <= 3)
+  if (shift >= 0 && shift <= 4)
     {
       int size;
       for (size = 8; size <= 32; size *= 2)
diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c b/gcc/testsuite/gcc.target/aarch64/extend.c
index f399e55..19b120d 100644
--- a/gcc/testsuite/gcc.target/aarch64/extend.c
+++ b/gcc/testsuite/gcc.target/aarch64/extend.c
@@ -37,6 +37,13 @@  adddi_uxtw (unsigned long long a, unsigned int i)
 }
 
 unsigned long long
+adddi_uxtw4 (unsigned long long a, unsigned int i)
+{
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a + ((unsigned long long)i << 4);
+}
+
+unsigned long long
 adddi_uxtw0 (unsigned long long a, unsigned int i)
 {
   /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw\n" } } */
@@ -51,6 +58,13 @@  adddi_sxtw (long long a, int i)
 }
 
 long long
+adddi_sxtw4 (long long a, int i)
+{
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a + ((long long)i << 4);
+}
+
+long long
 adddi_sxtw0 (long long a, int i)
 {
   /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw\n" } } */
@@ -65,6 +79,13 @@  subdi_uxtw (unsigned long long a, unsigned int i)
 }
 
 unsigned long long
+subdi_uxtw4 (unsigned long long a, unsigned int i)
+{
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a - ((unsigned long long)i << 4);
+}
+
+unsigned long long
 subdi_uxtw0 (unsigned long long a, unsigned int i)
 {
   /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw\n" } } */
@@ -79,6 +100,13 @@  subdi_sxtw (long long a, int i)
 }
 
 long long
+subdi_sxtw4 (long long a, int i)
+{
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a - ((long long)i << 4);
+}
+
+long long
 subdi_sxtw0 (long long a, int i)
 {
   /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw\n" } } */