diff mbox

[ARM] stop changing signedness in PROMOTE_MODE

Message ID 56C1B74D.4070009@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov Feb. 15, 2016, 11:32 a.m. UTC
On 04/02/16 08:58, Ramana Radhakrishnan wrote:
> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>> This is my suggested fix for PR 65932, which is a linux kernel
>> miscompile with gcc-5.1.
>>
>> The problem here is caused by a chain of events.  The first is that
>> the relatively new eipa_sra pass creates fake parameters that behave
>> slightly differently than normal parameters.  The second is that the
>> optimizer creates phi nodes that copy local variables to fake
>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>> assumes that it can emit simple move instructions for these phi nodes.
>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>> forces QImode and HImode to unsigned, but a
>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>> short parameters have different in register representations than local
>> variables, and require a conversion when copying between them, a
>> conversion that the out-of-ssa pass can't easily emit.
>>
>> Ultimately, I think this is a problem in the arm backend.  It should
>> not have a PROMOTE_MODE macro that is changing the sign of char and
>> short local variables.  I also think that we should merge the
>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>> prevent this from happening again.
>>
>> I see four general problems with the current ARM PROMOTE_MODE definition.
>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>> instruction was added.  It is a lose for armv6 and later.
>> 2) Unsigned short was only faster for targets that don't support
>> unaligned accesses.  Support for these targets was removed a while
>> ago, and this PROMODE_MODE hunk should have been removed at the same
>> time.  It was accidentally left behind.
>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>> converted to a function, the PROMOTE_MODE code was copied without the
>> UNSIGNEDP changes.  Thus it is only an accident that
>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>> changes to resolve the difference are safe.
>> 4) There is a general principle that you should only change signedness
>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>> in extra conversion instructions that make code slower.  The mips64
>> hardware for instance requires that 32-bit values be sign-extended
>> regardless of type, and instructions may trap if this is not true.
>> However, it has a set of 32-bit instructions that operate on these
>> values, and hence no conversions are required.  There is no similar
>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>> can be seen in the testcases where gcc emits both a zero-extend and a
>> sign-extend inside a loop, as the sign-extend is required for a
>> compare, and the zero-extend is required by PROMOTE_MODE.
> Given Kyrill's testing with the patch and the reasonably detailed
> check of the effects of code generation changes - The arm.h hunk is ok
> - I do think we should make this explicit in the documentation that
> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
> better still maybe put in a checking assert for the same in the
> mid-end but that could be the subject of a follow-up patch.
>
> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
> the testsuite fallout separately.
Hi all,

I'd like to backport the arm.h from this ( r233130) to the GCC 5
branch. As the CSE patch from my series had some fallout on x86_64
due to a deficiency in the AVX patterns that is too invasive to fix
at this stage (and presumably backport), I'd like to just backport
this arm.h fix and adjust the tests to XFAIL the fallout that comes
with not applying the CSE patch. The attached patch does that.

The code quality fallout on code outside the testsuite is not
that gread. The SPEC benchmarks are not affected by not applying
the CSE change, and only a single sequence in a popular embedded benchmark
shows some degradation for -mtune=cortex-a9 in the same way as the
wmul-1.c and wmul-2.c tests.

I think that's a fair tradeoff for fixing the wrong code bug on that branch.

Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?

Thanks,
Kyrill

2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65932
     * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
     xfail the scan-assembler test.
     * gcc.target/arm/wmul-2.c: Likewise.
     * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.



>
> regards
> Ramana
>
>
>
>
>> My change was tested with an arm bootstrap, make check, and SPEC
>> CPU2000 run.  The original poster verified that this gives a linux
>> kernel that boots correctly.
>>
>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>> are tests to verify that smulbb and/or smlabb are generated.
>> Eliminating the unnecessary sign conversions causes us to get better
>> code that doesn't include the smulbb and smlabb instructions.  I had
>> to modify the testcases to get them to emit the desired instructions.
>> With the testcase changes there are no additional testsuite failures,
>> though I'm concerned that these testcases with the changes may be
>> fragile, and future changes may break them again.
>
>
>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>> then maybe we should try to add new patterns to get the instructions
>> emitted again for the unmodified testcases.
>>
>> Jim

Comments

Ramana Radhakrishnan Feb. 16, 2016, 10:44 a.m. UTC | #1
On Mon, Feb 15, 2016 at 11:32 AM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.

I would hope that someone looks to fix the AVX port for GCC 7  - as
the CSE patch is something that is useful on ARM for performance in
real terms and could have benefits elsewhere.

OK to apply if no regressions - thanks for pushing this through.

Thanks,
Ramana

>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as the
> wmul-1.c and wmul-2.c tests.
>
> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>
> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>
> Thanks,
> Kyrill
>
> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR target/65932
>     * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>     xfail the scan-assembler test.
>     * gcc.target/arm/wmul-2.c: Likewise.
>     * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>
>
>
>
>>
>> regards
>> Ramana
>>
>>
>>
>>
>>> My change was tested with an arm bootstrap, make check, and SPEC
>>> CPU2000 run.  The original poster verified that this gives a linux
>>> kernel that boots correctly.
>>>
>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>> are tests to verify that smulbb and/or smlabb are generated.
>>> Eliminating the unnecessary sign conversions causes us to get better
>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>> to modify the testcases to get them to emit the desired instructions.
>>> With the testcase changes there are no additional testsuite failures,
>>> though I'm concerned that these testcases with the changes may be
>>> fragile, and future changes may break them again.
>>
>>
>>
>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>> then maybe we should try to add new patterns to get the instructions
>>> emitted again for the unmodified testcases.
>>>
>>> Jim
>
>
Christophe Lyon Feb. 17, 2016, 10:03 a.m. UTC | #2
On 15 February 2016 at 12:32, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.
>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as the
> wmul-1.c and wmul-2.c tests.
>
> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>
> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>
> Thanks,
> Kyrill
>
> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR target/65932
>     * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>     xfail the scan-assembler test.
>     * gcc.target/arm/wmul-2.c: Likewise.
>     * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>
>
Hi Kyrill,

I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
configuration to:
--with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
(target arm-none-linux-gnueabihf)

The generated code is:
        sxth    r0, r0
        sxth    r1, r1
        mul     r0, r1, r0
instead of
        smulbb  r0, r1, r0
on trunk.

I guess we don't worry?

>
>
>>
>> regards
>> Ramana
>>
>>
>>
>>
>>> My change was tested with an arm bootstrap, make check, and SPEC
>>> CPU2000 run.  The original poster verified that this gives a linux
>>> kernel that boots correctly.
>>>
>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>> are tests to verify that smulbb and/or smlabb are generated.
>>> Eliminating the unnecessary sign conversions causes us to get better
>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>> to modify the testcases to get them to emit the desired instructions.
>>> With the testcase changes there are no additional testsuite failures,
>>> though I'm concerned that these testcases with the changes may be
>>> fragile, and future changes may break them again.
>>
>>
>>
>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>> then maybe we should try to add new patterns to get the instructions
>>> emitted again for the unmodified testcases.
>>>
>>> Jim
>
>
Kyrill Tkachov Feb. 17, 2016, 10:05 a.m. UTC | #3
On 17/02/16 10:03, Christophe Lyon wrote:
> On 15 February 2016 at 12:32, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>> miscompile with gcc-5.1.
>>>>
>>>> The problem here is caused by a chain of events.  The first is that
>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>> slightly differently than normal parameters.  The second is that the
>>>> optimizer creates phi nodes that copy local variables to fake
>>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>>> assumes that it can emit simple move instructions for these phi nodes.
>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>> forces QImode and HImode to unsigned, but a
>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>>> short parameters have different in register representations than local
>>>> variables, and require a conversion when copying between them, a
>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>
>>>> Ultimately, I think this is a problem in the arm backend.  It should
>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>> short local variables.  I also think that we should merge the
>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>> prevent this from happening again.
>>>>
>>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>>> instruction was added.  It is a lose for armv6 and later.
>>>> 2) Unsigned short was only faster for targets that don't support
>>>> unaligned accesses.  Support for these targets was removed a while
>>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>>> time.  It was accidentally left behind.
>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>> UNSIGNEDP changes.  Thus it is only an accident that
>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>> changes to resolve the difference are safe.
>>>> 4) There is a general principle that you should only change signedness
>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>> in extra conversion instructions that make code slower.  The mips64
>>>> hardware for instance requires that 32-bit values be sign-extended
>>>> regardless of type, and instructions may trap if this is not true.
>>>> However, it has a set of 32-bit instructions that operate on these
>>>> values, and hence no conversions are required.  There is no similar
>>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>> Given Kyrill's testing with the patch and the reasonably detailed
>>> check of the effects of code generation changes - The arm.h hunk is ok
>>> - I do think we should make this explicit in the documentation that
>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>> better still maybe put in a checking assert for the same in the
>>> mid-end but that could be the subject of a follow-up patch.
>>>
>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>> the testsuite fallout separately.
>> Hi all,
>>
>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>> branch. As the CSE patch from my series had some fallout on x86_64
>> due to a deficiency in the AVX patterns that is too invasive to fix
>> at this stage (and presumably backport), I'd like to just backport
>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>> with not applying the CSE patch. The attached patch does that.
>>
>> The code quality fallout on code outside the testsuite is not
>> that gread. The SPEC benchmarks are not affected by not applying
>> the CSE change, and only a single sequence in a popular embedded benchmark
>> shows some degradation for -mtune=cortex-a9 in the same way as the
>> wmul-1.c and wmul-2.c tests.
>>
>> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>>
>> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>>
>> Thanks,
>> Kyrill
>>
>> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR target/65932
>>      * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>      xfail the scan-assembler test.
>>      * gcc.target/arm/wmul-2.c: Likewise.
>>      * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>>
>>
> Hi Kyrill,
>
> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
> configuration to:
> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
> (target arm-none-linux-gnueabihf)
>
> The generated code is:
>          sxth    r0, r0
>          sxth    r1, r1
>          mul     r0, r1, r0
> instead of
>          smulbb  r0, r1, r0
> on trunk.
>
> I guess we don't worry?

Hi Christophe,
Hmmm, I suspect we might want to backport https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
to fix backend the costing logic of smulbb.
Could you please try that patch to see if it helps?

Thanks,
Kyrill

>>
>>> regards
>>> Ramana
>>>
>>>
>>>
>>>
>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>> CPU2000 run.  The original poster verified that this gives a linux
>>>> kernel that boots correctly.
>>>>
>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>>> to modify the testcases to get them to emit the desired instructions.
>>>> With the testcase changes there are no additional testsuite failures,
>>>> though I'm concerned that these testcases with the changes may be
>>>> fragile, and future changes may break them again.
>>>
>>>
>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>> then maybe we should try to add new patterns to get the instructions
>>>> emitted again for the unmodified testcases.
>>>>
>>>> Jim
>>
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.target/arm/wmul-1.c b/gcc/testsuite/gcc.target/arm/wmul-1.c
index ddddd509fe645ea98877753773e7bcf9b6787897..ce14769c570537f16f022c8cde35843ab0695f74 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-1.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-1.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 int mac(const short *a, const short *b, int sqr, int *sum)
 {
@@ -16,4 +16,4 @@  int mac(const short *a, const short *b, int sqr, int *sum)
   return sqr;
 }
 
-/* { dg-final { scan-assembler-times "smlabb" 2 } } */
+/* { dg-final { scan-assembler-times "smlabb" 2  { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/wmul-2.c b/gcc/testsuite/gcc.target/arm/wmul-2.c
index 2ea55f9fbe12f74f38754cb72be791fd6e9495f4..a74d81b195b210650fc1bc1da598ac7c1fe82ac2 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-2.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-2.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 void vec_mpy(int y[], const short x[], short scaler)
 {
@@ -10,4 +10,4 @@  void vec_mpy(int y[], const short x[], short scaler)
    y[i] += ((scaler * x[i]) >> 31);
 }
 
-/* { dg-final { scan-assembler-times "smulbb" 1 } } */
+/* { dg-final { scan-assembler-times "smulbb" 1 { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/wmul-3.c b/gcc/testsuite/gcc.target/arm/wmul-3.c
index 144b553082e6158701639f05929987de01e7125a..87eba740142a80a1dc1979b4e79d9272a839e7b2 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-3.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-3.c
@@ -1,19 +1,11 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O" } */
 
-int mac(const short *a, const short *b, int sqr, int *sum)
+int
+foo (int a, int b)
 {
-  int i;
-  int dotp = *sum;
-
-  for (i = 0; i < 150; i++) {
-    dotp -= b[i] * a[i];
-    sqr -= b[i] * b[i];
-  }
-
-  *sum = dotp;
-  return sqr;
+  return (short) a * (short) b;
 }
 
-/* { dg-final { scan-assembler-times "smulbb" 2 } } */
+/* { dg-final { scan-assembler-times "smulbb" 1 } } */