Message ID: 20181126195017.1718-1-christoph.muellner@theobroma-systems.com
State: New
Series: [v3,aarch64] Correct the maximum shift amount for shifted operands.
On 11/26/18 7:50 PM, Christoph Muellner wrote:
> The aarch64 ISA specification allows a left shift amount to be applied
> after extension in the range of 0 to 4 (encoded in the imm3 field).
>
> This is true for at least the following instructions:
>
>  * ADD (extended register)
>  * ADDS (extended register)
>  * SUB (extended register)
>
> The result of this patch can be seen when compiling the following code:
>
> uint64_t myadd(uint64_t a, uint64_t b)
> {
>   return a+(((uint8_t)b)<<4);
> }
>
> Without the patch the following sequence will be generated:
>
> 0000000000000000 <myadd>:
>    0: d37c1c21  ubfiz x1, x1, #4, #8
>    4: 8b000020  add x0, x1, x0
>    8: d65f03c0  ret
>
> With the patch the ubfiz will be merged into the add instruction:
>
> 0000000000000000 <myadd>:
>    0: 8b211000  add x0, x0, w1, uxtb #4
>    4: d65f03c0  ret

Hi Christoph,

Thanks for this, I'm not a maintainer but this looks good to me. A good
point to mention would be how it affects shifts smaller than 4 bits,
since I don't see any testing for that in the files you have changed.

> Tested with "make check" and no regressions found.

Could you perhaps elaborate on your testing? So what triplets you
tested, if you bootstrapped successfully etc.
Sam,

> On 27.11.2018, at 14:06, Sam Tebbs <Sam.Tebbs@arm.com> wrote:
>
> On 11/26/18 7:50 PM, Christoph Muellner wrote:
>> The aarch64 ISA specification allows a left shift amount to be applied
>> after extension in the range of 0 to 4 (encoded in the imm3 field).
>>
>> This is true for at least the following instructions:
>>
>>  * ADD (extended register)
>>  * ADDS (extended register)
>>  * SUB (extended register)
>>
>> The result of this patch can be seen when compiling the following code:
>>
>> uint64_t myadd(uint64_t a, uint64_t b)
>> {
>>   return a+(((uint8_t)b)<<4);
>> }
>>
>> Without the patch the following sequence will be generated:
>>
>> 0000000000000000 <myadd>:
>>    0: d37c1c21  ubfiz x1, x1, #4, #8
>>    4: 8b000020  add x0, x1, x0
>>    8: d65f03c0  ret
>>
>> With the patch the ubfiz will be merged into the add instruction:
>>
>> 0000000000000000 <myadd>:
>>    0: 8b211000  add x0, x0, w1, uxtb #4
>>    4: d65f03c0  ret
>
> Hi Christoph,
>
> Thanks for this, I'm not a maintainer but this looks good to me. A good
> point to mention would be how it affects shifts smaller than 4 bits,
> since I don't see any testing for that in the files you have changed.
>
>> Tested with "make check" and no regressions found.
>
> Could you perhaps elaborate on your testing? So what triplets you
> tested, if you bootstrapped successfully etc.

Just one bit of background info...

We've had this change in production since 2014, both in AppliedMicro's
(now Ampere's) compiler branch as well as in an Ubuntu PPA, for various
workloads.

Thanks,
Philipp.
> On 27.11.2018, at 14:04, Sam Tebbs <Sam.Tebbs@arm.com> wrote:
>
> On 11/26/18 7:50 PM, Christoph Muellner wrote:
>> The aarch64 ISA specification allows a left shift amount to be applied
>> after extension in the range of 0 to 4 (encoded in the imm3 field).
>>
>> This is true for at least the following instructions:
>>
>>  * ADD (extended register)
>>  * ADDS (extended register)
>>  * SUB (extended register)
>>
>> The result of this patch can be seen when compiling the following code:
>>
>> uint64_t myadd(uint64_t a, uint64_t b)
>> {
>>   return a+(((uint8_t)b)<<4);
>> }
>>
>> Without the patch the following sequence will be generated:
>>
>> 0000000000000000 <myadd>:
>>    0: d37c1c21  ubfiz x1, x1, #4, #8
>>    4: 8b000020  add x0, x1, x0
>>    8: d65f03c0  ret
>>
>> With the patch the ubfiz will be merged into the add instruction:
>>
>> 0000000000000000 <myadd>:
>>    0: 8b211000  add x0, x0, w1, uxtb #4
>>    4: d65f03c0  ret
>
> Hi Christoph,
>
> Thanks for this, I'm not a maintainer but this looks good to me. A good
> point to mention would be if it affects shifts smaller than 4 bits,
> since I don't see any testing for that in the files you have changed.

Hi Sam,

shifts smaller than 4 bits are already tested in
gcc/testsuite/gcc.target/aarch64/extend.c.
E.g. subdi_sxtw() does so for shift-by-3.
All existing test cases were executed and did not show any regressions.
In other words: shifts smaller than 4 bits are not affected.

>> Tested with "make check" and no regressions found.
>
> Could you perhaps elaborate on your testing? So what triplets you
> tested, if you bootstrapped successfully etc.

For the "make check" regression test, I compiled natively on an AArch64
machine running CentOS 7.5. Configure flags were "--enable-bootstrap".
I did so with and without the patch and compared the results for
differences. I think that's the essential requirement to get an OK for
this patch.

Besides that, we've had this change in our aarch64-linux-gnu toolchain
since 2014. This toolchain has been used to compile a wide range of OSS
projects and benchmarks, as well as proprietary code, over the years.
For example we ran the SPEC CPU2000, CPU2006, and CPU2017 benchmarks,
built Linux kernels (at least from 4.4 to 4.19), glibc (from 2.17 to
2.28), binutils (ancient times till 2.30), TCMalloc (2.6 and 2.7),
jemalloc (4 and 5), buildroot (2017, 2018), U-Boot (2015-2018),
Tianocore, HHVM, OpenSSL, MySQL, ...

Our work involved interworking with existing libraries (e.g. HHVM and
its dependencies on 100 external shared libraries) on several operating
systems (Ubuntu, Fedora, CentOS, OpenSuse, Oracle Linux, Debian, ...).
Besides LP64 we also used (and still use) the ILP32 ABI.

The binaries created by the compilers using that change were running at
least on APM X-Gene processors and Ampere Computing's eMAG processors,
as well as on ARM's Cortex-A53 and Cortex-A72 cores.

If you want me to test something specific, then just let me know.

Thanks,
Christoph
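[Editorial note: the shift-by-3 coverage Christoph points to above can be sanity-checked with a small standalone program. The function names below mirror the style of extend.c (subdi_sxtw3/subdi_sxtw4 are hypothetical helpers, not the in-tree test names, and the real tests verify the generated assembly via dg-final scan-assembler directives rather than runtime results):]

```c
#include <stdint.h>

/* A subtract whose operand is widened from 32 bits and shifted left
   by 3.  With the shift amount <= 3 this pattern folded into an
   extended-register sub even before the patch.  */
long long
subdi_sxtw3 (long long a, int i)
{
  return a - ((long long) i << 3);
}

/* The new boundary case the patch enables: shift by 4, the maximum
   the imm3 field can encode.  */
long long
subdi_sxtw4 (long long a, int i)
{
  return a - ((long long) i << 4);
}
```

Both shift amounts must of course compute the same values before and after the patch; only the instruction selection changes.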
On 26/11/2018 19:50, Christoph Muellner wrote:
> The aarch64 ISA specification allows a left shift amount to be applied
> after extension in the range of 0 to 4 (encoded in the imm3 field).
>
> This is true for at least the following instructions:
>
>  * ADD (extended register)
>  * ADDS (extended register)
>  * SUB (extended register)
>
> The result of this patch can be seen when compiling the following code:
>
> uint64_t myadd(uint64_t a, uint64_t b)
> {
>   return a+(((uint8_t)b)<<4);
> }
>
> Without the patch the following sequence will be generated:
>
> 0000000000000000 <myadd>:
>    0: d37c1c21  ubfiz x1, x1, #4, #8
>    4: 8b000020  add x0, x1, x0
>    8: d65f03c0  ret
>
> With the patch the ubfiz will be merged into the add instruction:
>
> 0000000000000000 <myadd>:
>    0: 8b211000  add x0, x0, w1, uxtb #4
>    4: d65f03c0  ret
>
> Tested with "make check" and no regressions found.
>
> *** gcc/ChangeLog ***
>
> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
>             Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
>
>     * config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
>     shift amount for shifted operands.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
>             Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
>
>     * gcc.target/aarch64/extend.c: Adjust the testcases to cover
>     the changed shift amount.

This is OK.  Thanks.

R.

PS, I was sufficiently surprised by this that I went and checked the
original commit (it's not an obvious off-by-one error). But it does
appear that it's been this way since the code was originally added
(prior to the initial publication of the port) and there's no obvious
reason why.
> Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
> ---
>  gcc/config/aarch64/aarch64.c              |  2 +-
>  gcc/testsuite/gcc.target/aarch64/extend.c | 28 ++++++++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 90bbc57..27b53f4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -8226,7 +8226,7 @@ aarch64_output_casesi (rtx *operands)
>  int
>  aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
>  {
> -  if (shift >= 0 && shift <= 3)
> +  if (shift >= 0 && shift <= 4)
>      {
>        int size;
>        for (size = 8; size <= 32; size *= 2)
> diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c b/gcc/testsuite/gcc.target/aarch64/extend.c
> index f399e55..19b120d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/extend.c
> +++ b/gcc/testsuite/gcc.target/aarch64/extend.c
> @@ -37,6 +37,13 @@ adddi_uxtw (unsigned long long a, unsigned int i)
>  }
>
>  unsigned long long
> +adddi_uxtw4 (unsigned long long a, unsigned int i)
> +{
> +  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?4" } } */
> +  return a + ((unsigned long long)i << 4);
> +}
> +
> +unsigned long long
>  adddi_uxtw0 (unsigned long long a, unsigned int i)
>  {
>    /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw\n" } } */
> @@ -51,6 +58,13 @@ adddi_sxtw (long long a, int i)
>  }
>
>  long long
> +adddi_sxtw4 (long long a, int i)
> +{
> +  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?4" } } */
> +  return a + ((long long)i << 4);
> +}
> +
> +long long
>  adddi_sxtw0 (long long a, int i)
>  {
>    /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw\n" } } */
> @@ -65,6 +79,13 @@ subdi_uxtw (unsigned long long a, unsigned int i)
>  }
>
>  unsigned long long
> +subdi_uxtw4 (unsigned long long a, unsigned int i)
> +{
> +  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?4" } } */
> +  return a - ((unsigned long long)i << 4);
> +}
> +
> +unsigned long long
>  subdi_uxtw0 (unsigned long long a, unsigned int i)
>  {
>    /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw\n" } } */
> @@ -79,6 +100,13 @@ subdi_sxtw (long long a, int i)
>  }
>
>  long long
> +subdi_sxtw4 (long long a, int i)
> +{
> +  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?4" } } */
> +  return a - ((long long)i << 4);
> +}
> +
> +long long
>  subdi_sxtw0 (long long a, int i)
>  {
>    /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw\n" } } */
> On 28.11.2018, at 13:10, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>
> On 26/11/2018 19:50, Christoph Muellner wrote:
>> The aarch64 ISA specification allows a left shift amount to be applied
>> after extension in the range of 0 to 4 (encoded in the imm3 field).
>>
>> This is true for at least the following instructions:
>>
>>  * ADD (extended register)
>>  * ADDS (extended register)
>>  * SUB (extended register)
>>
>> The result of this patch can be seen when compiling the following code:
>>
>> uint64_t myadd(uint64_t a, uint64_t b)
>> {
>>   return a+(((uint8_t)b)<<4);
>> }
>>
>> Without the patch the following sequence will be generated:
>>
>> 0000000000000000 <myadd>:
>>    0: d37c1c21  ubfiz x1, x1, #4, #8
>>    4: 8b000020  add x0, x1, x0
>>    8: d65f03c0  ret
>>
>> With the patch the ubfiz will be merged into the add instruction:
>>
>> 0000000000000000 <myadd>:
>>    0: 8b211000  add x0, x0, w1, uxtb #4
>>    4: d65f03c0  ret
>>
>> Tested with "make check" and no regressions found.
>>
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
>>             Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
>>
>>     * config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
>>     shift amount for shifted operands.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
>>             Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
>>
>>     * gcc.target/aarch64/extend.c: Adjust the testcases to cover
>>     the changed shift amount.
>
> This is OK.  Thanks.
>
> R.
>
> PS, I was sufficiently surprised by this that I went and checked the
> original commit (it's not an obvious off-by-one error). But it does
> appear that it's been this way since the code was originally added
> (prior to the initial publication of the port) and there's no obvious
> reason why.

Since we don't have any Reported-by: tags in GCC: the credit for
initially finding and reporting this goes to AppliedMicro's original
chip verification team for the X-Gene 1.

Cheers,
Philipp.
On Wed, Nov 28, 2018 at 07:08:02AM -0600, Philipp Tomsich wrote:
>> On 28.11.2018, at 13:10, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>
>> On 26/11/2018 19:50, Christoph Muellner wrote:
>>> The aarch64 ISA specification allows a left shift amount to be applied
>>> after extension in the range of 0 to 4 (encoded in the imm3 field).
>>>
>>> This is true for at least the following instructions:
>>>
>>>  * ADD (extended register)
>>>  * ADDS (extended register)
>>>  * SUB (extended register)
>>>
>>> The result of this patch can be seen when compiling the following code:
>>>
>>> uint64_t myadd(uint64_t a, uint64_t b)
>>> {
>>>   return a+(((uint8_t)b)<<4);
>>> }
>>>
>>> Without the patch the following sequence will be generated:
>>>
>>> 0000000000000000 <myadd>:
>>>    0: d37c1c21  ubfiz x1, x1, #4, #8
>>>    4: 8b000020  add x0, x1, x0
>>>    8: d65f03c0  ret
>>>
>>> With the patch the ubfiz will be merged into the add instruction:
>>>
>>> 0000000000000000 <myadd>:
>>>    0: 8b211000  add x0, x0, w1, uxtb #4
>>>    4: d65f03c0  ret
>>>
>>> Tested with "make check" and no regressions found.
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
>>>             Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
>>>
>>>     * config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
>>>     shift amount for shifted operands.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
>>>             Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>
>>>
>>>     * gcc.target/aarch64/extend.c: Adjust the testcases to cover
>>>     the changed shift amount.
>>
>> This is OK.  Thanks.
>>
>> R.
>>
>> PS, I was sufficiently surprised by this that I went and checked the
>> original commit (it's not an obvious off-by-one error). But it does
>> appear that it's been this way since the code was originally added
>> (prior to the initial publication of the port) and there's no obvious
>> reason why.
>
> Since we don't have any Reported-by: tags in GCC: the credit for
> initially finding and reporting this goes to AppliedMicro's original
> chip verification team for the X-Gene 1.

Good spot that team!

This also took me down a rabbit hole of Architecture Reference Manuals
and old source trees wondering how we got off by one. My best guess is
that 0-3 just feels like the more natural range than 0-4...

Thanks for the patch and the distraction!

James
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 90bbc57..27b53f4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8226,7 +8226,7 @@ aarch64_output_casesi (rtx *operands)
 int
 aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
 {
-  if (shift >= 0 && shift <= 3)
+  if (shift >= 0 && shift <= 4)
     {
       int size;
       for (size = 8; size <= 32; size *= 2)
diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c b/gcc/testsuite/gcc.target/aarch64/extend.c
index f399e55..19b120d 100644
--- a/gcc/testsuite/gcc.target/aarch64/extend.c
+++ b/gcc/testsuite/gcc.target/aarch64/extend.c
@@ -37,6 +37,13 @@ adddi_uxtw (unsigned long long a, unsigned int i)
 }
 
 unsigned long long
+adddi_uxtw4 (unsigned long long a, unsigned int i)
+{
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a + ((unsigned long long)i << 4);
+}
+
+unsigned long long
 adddi_uxtw0 (unsigned long long a, unsigned int i)
 {
   /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw\n" } } */
@@ -51,6 +58,13 @@ adddi_sxtw (long long a, int i)
 }
 
 long long
+adddi_sxtw4 (long long a, int i)
+{
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a + ((long long)i << 4);
+}
+
+long long
 adddi_sxtw0 (long long a, int i)
 {
   /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw\n" } } */
@@ -65,6 +79,13 @@ subdi_uxtw (unsigned long long a, unsigned int i)
 }
 
 unsigned long long
+subdi_uxtw4 (unsigned long long a, unsigned int i)
+{
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a - ((unsigned long long)i << 4);
+}
+
+unsigned long long
 subdi_uxtw0 (unsigned long long a, unsigned int i)
 {
   /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw\n" } } */
@@ -79,6 +100,13 @@ subdi_sxtw (long long a, int i)
 }
 
 long long
+subdi_sxtw4 (long long a, int i)
+{
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a - ((long long)i << 4);
+}
+
+long long
 subdi_sxtw0 (long long a, int i)
 {
   /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw\n" } } */
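[Editorial note: read against the aarch64.c hunk above, the post-patch behaviour of aarch64_uxt_size can be modelled in plain C. Only the range check and the loop header are visible in the hunk, so the mask comparison inside the loop is a plausible reconstruction, not the verbatim GCC code; the real function also takes a HOST_WIDE_INT mask rather than long long:]

```c
/* Model of aarch64_uxt_size after the patch: return the source width
   in bits if masking with `mask` and shifting left by `shift` can be
   implemented as an extended-register operand (uxtb/uxth/uxtw with an
   imm3 shift of 0..4), or 0 if it cannot.  */
static int
uxt_size_model (int shift, long long mask)
{
  if (shift >= 0 && shift <= 4)   /* was "shift <= 3" before the fix */
    {
      int size;
      for (size = 8; size <= 32; size *= 2)
        {
          /* The mask must select exactly the low `size` bits.  */
          if (mask == (1LL << size) - 1)
            return size;
        }
    }
  return 0;
}
```

Under this model the one-character fix is exactly what unlocks the example in the thread: a uxtb (mask 0xff) combined with a shift of 4 now reports size 8 instead of failing the range check.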
The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:

 * ADD (extended register)
 * ADDS (extended register)
 * SUB (extended register)

The result of this patch can be seen when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
  return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

0000000000000000 <myadd>:
   0: d37c1c21  ubfiz x1, x1, #4, #8
   4: 8b000020  add x0, x1, x0
   8: d65f03c0  ret

With the patch the ubfiz will be merged into the add instruction:

0000000000000000 <myadd>:
   0: 8b211000  add x0, x0, w1, uxtb #4
   4: d65f03c0  ret

Tested with "make check" and no regressions found.

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
            Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>

    * config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
    shift amount for shifted operands.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Christoph Muellner  <christoph.muellner@theobroma-systems.com>
            Philipp Tomsich  <philipp.tomsich@theobroma-systems.com>

    * gcc.target/aarch64/extend.c: Adjust the testcases to cover
    the changed shift amount.

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
---
 gcc/config/aarch64/aarch64.c              |  2 +-
 gcc/testsuite/gcc.target/aarch64/extend.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)
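[Editorial note: as a semantic cross-check of the transformation in the commit message above, the pre-patch ubfiz+add sequence and the post-patch single add with "uxtb #4" must compute the same value, since uxtb zero-extends the low byte of the register before the left-shift-by-4 is applied. A minimal C model (myadd is from the commit message; myadd_uxtb4 is an illustrative name for the extended-register semantics, not code from the patch):]

```c
#include <stdint.h>

/* What the source computes (and what the pre-patch ubfiz+add emits).  */
uint64_t
myadd (uint64_t a, uint64_t b)
{
  return a + (((uint8_t) b) << 4);
}

/* What "add x0, x0, w1, uxtb #4" computes: zero-extend the low byte
   of b, shift the result left by 4, then add.  */
uint64_t
myadd_uxtb4 (uint64_t a, uint64_t b)
{
  return a + ((b & 0xffULL) << 4);
}
```

Because the extension happens before the shift, the maximum intermediate value is 0xff << 4 = 0xff0, so no information is lost and the two forms agree for all inputs.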