diff mbox

[ARM,RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus

Message ID 56BA2014.1020708@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov Feb. 9, 2016, 5:21 p.m. UTC
Hi all,

In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when targeting an fpu
with only single-precision capabilities.

bar is a function returing a double. For non-LTO compilation the caller of bar reads the return value
from it from the s0 and s1 VFP registers like expected, but for -flto the caller seems to expect the
return value from the r0 and r1 regs.  The RTL dumps show that too.

Debugging the calls to arm_function_value show that in the -flto compilation the function bar is deemed
to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, whereas for the non-LTO (and non-breaking)
compilation it uses the ARM_PCS_AAPCS_VFP variant.

Further down in use_vfp_abi when deciding whether to use VFP registers for the result there is a bit of
logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant with a double precision value
on an FPU that is not TARGET_VFP_DOUBLE.

This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that the function doesn't escape
the translation unit and we can thus use whatever variant we want. From what I understand we want to use the
VFP regs when possible for FP values.

So this patch removes that restriction and for the testcase the caller of bar correctly reads the return
value of bar from the VFP registers and everything works.

This patch has been bootstrapped and tested on arm-none-linux-gnueabihf configured with --with-fpu=fpv4-sp-d16.
The bootstrapped was performed with LTO.
I didn't see any regressions.

It seems that this logic was put there in 2009 with r154034 as part of a large patch to enable support for half-precision
floating point.

I'm not very familiar with this part of the code, so is this a safe patch to do?
The patch should only ever change behaviour for single-precision-only fpus and only for static functions
that don't get called outside their translation units (or during LTO I suppose) so there shouldn't
be any ABI problems, I think.

Is this ok for trunk?

Thanks,
Kyrill

2016-02-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65578
     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
     Don't check for is_double and TARGET_VFP_DOUBLE.
     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
     (aapcs_vfp_is_return_candidate): Likewise.
     (aapcs_vfp_is_call_candidate): Likewise.
     (aapcs_vfp_allocate_return_reg): Likewise.

Comments

Kyrill Tkachov Feb. 9, 2016, 5:24 p.m. UTC | #1
On 09/02/16 17:21, Kyrill Tkachov wrote:
> Hi all,
>
> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when targeting an fpu
> with only single-precision capabilities.
>
> bar is a function returing a double. For non-LTO compilation the caller of bar reads the return value
> from it from the s0 and s1 VFP registers like expected, but for -flto the caller seems to expect the
> return value from the r0 and r1 regs.  The RTL dumps show that too.
>
> Debugging the calls to arm_function_value show that in the -flto compilation the function bar is deemed
> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, whereas for the non-LTO (and non-breaking)
> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>
> Further down in use_vfp_abi when deciding whether to use VFP registers for the result there is a bit of
> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant with a double precision value
> on an FPU that is not TARGET_VFP_DOUBLE.
>
> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that the function doesn't escape
> the translation unit and we can thus use whatever variant we want. From what I understand we want to use the
> VFP regs when possible for FP values.
>
> So this patch removes that restriction and for the testcase the caller of bar correctly reads the return
> value of bar from the VFP registers and everything works.
>
> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf configured with --with-fpu=fpv4-sp-d16.
> The bootstrapped was performed with LTO.
> I didn't see any regressions.
>
> It seems that this logic was put there in 2009 with r154034 as part of a large patch to enable support for half-precision
> floating point.
>
> I'm not very familiar with this part of the code, so is this a safe patch to do?
> The patch should only ever change behaviour for single-precision-only fpus and only for static functions
> that don't get called outside their translation units (or during LTO I suppose) so there shouldn't
> be any ABI problems, I think.
>
> Is this ok for trunk?
>
> Thanks,
> Kyrill
>

Huh, I just realised I wrote completely the wrong PR number on this.
The PR I'm talking about here is PR target/69538

Sorry for the confusion.

Kyrill


> 2016-02-09 Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR target/65578
>     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>     Don't check for is_double and TARGET_VFP_DOUBLE.
>     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>     (aapcs_vfp_is_return_candidate): Likewise.
>     (aapcs_vfp_is_call_candidate): Likewise.
>     (aapcs_vfp_allocate_return_reg): Likewise.
Kyrill Tkachov Feb. 17, 2016, 10:12 a.m. UTC | #2
Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00634.html

As mentioned before, this is actually a fix for PR target/69538.
I got confused when writing the cover letter and ChangeLog...

Thanks,
Kyrill

On 09/02/16 17:24, Kyrill Tkachov wrote:
>
> On 09/02/16 17:21, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when targeting an fpu
>> with only single-precision capabilities.
>>
>> bar is a function returing a double. For non-LTO compilation the caller of bar reads the return value
>> from it from the s0 and s1 VFP registers like expected, but for -flto the caller seems to expect the
>> return value from the r0 and r1 regs.  The RTL dumps show that too.
>>
>> Debugging the calls to arm_function_value show that in the -flto compilation the function bar is deemed
>> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, whereas for the non-LTO (and non-breaking)
>> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>>
>> Further down in use_vfp_abi when deciding whether to use VFP registers for the result there is a bit of
>> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant with a double precision value
>> on an FPU that is not TARGET_VFP_DOUBLE.
>>
>> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that the function doesn't escape
>> the translation unit and we can thus use whatever variant we want. From what I understand we want to use the
>> VFP regs when possible for FP values.
>>
>> So this patch removes that restriction and for the testcase the caller of bar correctly reads the return
>> value of bar from the VFP registers and everything works.
>>
>> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf configured with --with-fpu=fpv4-sp-d16.
>> The bootstrapped was performed with LTO.
>> I didn't see any regressions.
>>
>> It seems that this logic was put there in 2009 with r154034 as part of a large patch to enable support for half-precision
>> floating point.
>>
>> I'm not very familiar with this part of the code, so is this a safe patch to do?
>> The patch should only ever change behaviour for single-precision-only fpus and only for static functions
>> that don't get called outside their translation units (or during LTO I suppose) so there shouldn't
>> be any ABI problems, I think.
>>
>> Is this ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>
> Huh, I just realised I wrote completely the wrong PR number on this.
> The PR I'm talking about here is PR target/69538
>
> Sorry for the confusion.
>
> Kyrill
>
>
>> 2016-02-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>     PR target/65578
>>     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>>     Don't check for is_double and TARGET_VFP_DOUBLE.
>>     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>>     (aapcs_vfp_is_return_candidate): Likewise.
>>     (aapcs_vfp_is_call_candidate): Likewise.
>>     (aapcs_vfp_allocate_return_reg): Likewise.
>
Kyrill Tkachov Feb. 24, 2016, 1:48 p.m. UTC | #3
Ping*2

Thanks,
Kyrill

On 17/02/16 10:12, Kyrill Tkachov wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00634.html
>
> As mentioned before, this is actually a fix for PR target/69538.
> I got confused when writing the cover letter and ChangeLog...
>
> Thanks,
> Kyrill
>
> On 09/02/16 17:24, Kyrill Tkachov wrote:
>>
>> On 09/02/16 17:21, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when targeting an fpu
>>> with only single-precision capabilities.
>>>
>>> bar is a function returing a double. For non-LTO compilation the caller of bar reads the return value
>>> from it from the s0 and s1 VFP registers like expected, but for -flto the caller seems to expect the
>>> return value from the r0 and r1 regs.  The RTL dumps show that too.
>>>
>>> Debugging the calls to arm_function_value show that in the -flto compilation the function bar is deemed
>>> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, whereas for the non-LTO (and non-breaking)
>>> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>>>
>>> Further down in use_vfp_abi when deciding whether to use VFP registers for the result there is a bit of
>>> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant with a double precision value
>>> on an FPU that is not TARGET_VFP_DOUBLE.
>>>
>>> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that the function doesn't escape
>>> the translation unit and we can thus use whatever variant we want. From what I understand we want to use the
>>> VFP regs when possible for FP values.
>>>
>>> So this patch removes that restriction and for the testcase the caller of bar correctly reads the return
>>> value of bar from the VFP registers and everything works.
>>>
>>> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf configured with --with-fpu=fpv4-sp-d16.
>>> The bootstrapped was performed with LTO.
>>> I didn't see any regressions.
>>>
>>> It seems that this logic was put there in 2009 with r154034 as part of a large patch to enable support for half-precision
>>> floating point.
>>>
>>> I'm not very familiar with this part of the code, so is this a safe patch to do?
>>> The patch should only ever change behaviour for single-precision-only fpus and only for static functions
>>> that don't get called outside their translation units (or during LTO I suppose) so there shouldn't
>>> be any ABI problems, I think.
>>>
>>> Is this ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>
>> Huh, I just realised I wrote completely the wrong PR number on this.
>> The PR I'm talking about here is PR target/69538
>>
>> Sorry for the confusion.
>>
>> Kyrill
>>
>>
>>> 2016-02-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>>     PR target/65578
>>>     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>>>     Don't check for is_double and TARGET_VFP_DOUBLE.
>>>     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>>>     (aapcs_vfp_is_return_candidate): Likewise.
>>>     (aapcs_vfp_is_call_candidate): Likewise.
>>>     (aapcs_vfp_allocate_return_reg): Likewise.
>>
>
Kyrill Tkachov March 2, 2016, 1:46 p.m. UTC | #4
Ping*3.

Thanks,
Kyrill
On 24/02/16 13:48, Kyrill Tkachov wrote:
> Ping*2
>
> Thanks,
> Kyrill
>
> On 17/02/16 10:12, Kyrill Tkachov wrote:
>> Ping.
>> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00634.html
>>
>> As mentioned before, this is actually a fix for PR target/69538.
>> I got confused when writing the cover letter and ChangeLog...
>>
>> Thanks,
>> Kyrill
>>
>> On 09/02/16 17:24, Kyrill Tkachov wrote:
>>>
>>> On 09/02/16 17:21, Kyrill Tkachov wrote:
>>>> Hi all,
>>>>
>>>> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when targeting an fpu
>>>> with only single-precision capabilities.
>>>>
>>>> bar is a function returing a double. For non-LTO compilation the caller of bar reads the return value
>>>> from it from the s0 and s1 VFP registers like expected, but for -flto the caller seems to expect the
>>>> return value from the r0 and r1 regs.  The RTL dumps show that too.
>>>>
>>>> Debugging the calls to arm_function_value show that in the -flto compilation the function bar is deemed
>>>> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, whereas for the non-LTO (and non-breaking)
>>>> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>>>>
>>>> Further down in use_vfp_abi when deciding whether to use VFP registers for the result there is a bit of
>>>> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant with a double precision value
>>>> on an FPU that is not TARGET_VFP_DOUBLE.
>>>>
>>>> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that the function doesn't escape
>>>> the translation unit and we can thus use whatever variant we want. From what I understand we want to use the
>>>> VFP regs when possible for FP values.
>>>>
>>>> So this patch removes that restriction and for the testcase the caller of bar correctly reads the return
>>>> value of bar from the VFP registers and everything works.
>>>>
>>>> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf configured with --with-fpu=fpv4-sp-d16.
>>>> The bootstrapped was performed with LTO.
>>>> I didn't see any regressions.
>>>>
>>>> It seems that this logic was put there in 2009 with r154034 as part of a large patch to enable support for half-precision
>>>> floating point.
>>>>
>>>> I'm not very familiar with this part of the code, so is this a safe patch to do?
>>>> The patch should only ever change behaviour for single-precision-only fpus and only for static functions
>>>> that don't get called outside their translation units (or during LTO I suppose) so there shouldn't
>>>> be any ABI problems, I think.
>>>>
>>>> Is this ok for trunk?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>
>>> Huh, I just realised I wrote completely the wrong PR number on this.
>>> The PR I'm talking about here is PR target/69538
>>>
>>> Sorry for the confusion.
>>>
>>> Kyrill
>>>
>>>
>>>> 2016-02-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>
>>>>     PR target/65578
>>>>     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>>>>     Don't check for is_double and TARGET_VFP_DOUBLE.
>>>>     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>>>>     (aapcs_vfp_is_return_candidate): Likewise.
>>>>     (aapcs_vfp_is_call_candidate): Likewise.
>>>>     (aapcs_vfp_allocate_return_reg): Likewise.
>>>
>>
>
Kyrill Tkachov March 9, 2016, 9:32 a.m. UTC | #5
Ping*4.

Thanks,
Kyrill
On 02/03/16 13:46, Kyrill Tkachov wrote:
> Ping*3.
>
> Thanks,
> Kyrill
> On 24/02/16 13:48, Kyrill Tkachov wrote:
>> Ping*2
>>
>> Thanks,
>> Kyrill
>>
>> On 17/02/16 10:12, Kyrill Tkachov wrote:
>>> Ping.
>>> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00634.html
>>>
>>> As mentioned before, this is actually a fix for PR target/69538.
>>> I got confused when writing the cover letter and ChangeLog...
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> On 09/02/16 17:24, Kyrill Tkachov wrote:
>>>>
>>>> On 09/02/16 17:21, Kyrill Tkachov wrote:
>>>>> Hi all,
>>>>>
>>>>> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when targeting an fpu
>>>>> with only single-precision capabilities.
>>>>>
>>>>> bar is a function returing a double. For non-LTO compilation the caller of bar reads the return value
>>>>> from it from the s0 and s1 VFP registers like expected, but for -flto the caller seems to expect the
>>>>> return value from the r0 and r1 regs.  The RTL dumps show that too.
>>>>>
>>>>> Debugging the calls to arm_function_value show that in the -flto compilation the function bar is deemed
>>>>> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, whereas for the non-LTO (and non-breaking)
>>>>> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>>>>>
>>>>> Further down in use_vfp_abi when deciding whether to use VFP registers for the result there is a bit of
>>>>> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant with a double precision value
>>>>> on an FPU that is not TARGET_VFP_DOUBLE.
>>>>>
>>>>> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that the function doesn't escape
>>>>> the translation unit and we can thus use whatever variant we want. From what I understand we want to use the
>>>>> VFP regs when possible for FP values.
>>>>>
>>>>> So this patch removes that restriction and for the testcase the caller of bar correctly reads the return
>>>>> value of bar from the VFP registers and everything works.
>>>>>
>>>>> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf configured with --with-fpu=fpv4-sp-d16.
>>>>> The bootstrapped was performed with LTO.
>>>>> I didn't see any regressions.
>>>>>
>>>>> It seems that this logic was put there in 2009 with r154034 as part of a large patch to enable support for half-precision
>>>>> floating point.
>>>>>
>>>>> I'm not very familiar with this part of the code, so is this a safe patch to do?
>>>>> The patch should only ever change behaviour for single-precision-only fpus and only for static functions
>>>>> that don't get called outside their translation units (or during LTO I suppose) so there shouldn't
>>>>> be any ABI problems, I think.
>>>>>
>>>>> Is this ok for trunk?
>>>>>
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>
>>>> Huh, I just realised I wrote completely the wrong PR number on this.
>>>> The PR I'm talking about here is PR target/69538
>>>>
>>>> Sorry for the confusion.
>>>>
>>>> Kyrill
>>>>
>>>>
>>>>> 2016-02-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>>
>>>>>     PR target/65578
>>>>>     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>>>>>     Don't check for is_double and TARGET_VFP_DOUBLE.
>>>>>     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>>>>>     (aapcs_vfp_is_return_candidate): Likewise.
>>>>>     (aapcs_vfp_is_call_candidate): Likewise.
>>>>>     (aapcs_vfp_allocate_return_reg): Likewise.
>>>>
>>>
>>
>
Ramana Radhakrishnan March 23, 2016, 12:09 p.m. UTC | #6
On Tue, Feb 9, 2016 at 5:21 PM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only
> when targeting an fpu
> with only single-precision capabilities.
>
> bar is a function returing a double. For non-LTO compilation the caller of
> bar reads the return value
> from it from the s0 and s1 VFP registers like expected, but for -flto the
> caller seems to expect the
> return value from the r0 and r1 regs.  The RTL dumps show that too.
>
> Debugging the calls to arm_function_value show that in the -flto compilation
> the function bar is deemed
> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS
> variant, whereas for the non-LTO (and non-breaking)
> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>
> Further down in use_vfp_abi when deciding whether to use VFP registers for
> the result there is a bit of
> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL
> variant with a double precision value
> on an FPU that is not TARGET_VFP_DOUBLE.
>
> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means
> that the function doesn't escape
> the translation unit and we can thus use whatever variant we want. From what
> I understand we want to use the
> VFP regs when possible for FP values.
>
> So this patch removes that restriction and for the testcase the caller of
> bar correctly reads the return
> value of bar from the VFP registers and everything works.
>
> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf
> configured with --with-fpu=fpv4-sp-d16.
> The bootstrapped was performed with LTO.
> I didn't see any regressions.
>
> It seems that this logic was put there in 2009 with r154034 as part of a
> large patch to enable support for half-precision
> floating point.
>
> I'm not very familiar with this part of the code, so is this a safe patch to
> do?
> The patch should only ever change behaviour for single-precision-only fpus
> and only for static functions
> that don't get called outside their translation units (or during LTO I
> suppose) so there shouldn't
> be any ABI problems, I think.
>
> Is this ok for trunk?

I spent sometime this morning reading through this patch and it does
look reasonably ok. The AAPCS tests if run for hardfloat should catch
any regressions. However given the stage we are in I'd like this
tested through compat.exp and struct-layout.exp across the range of
ABIs and FPU options to ensure we haven't missed anything. Richard ,
could you also give this a once over ?


Ramana







>
> Thanks,
> Kyrill
>
> 2016-02-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR target/65578
>     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>     Don't check for is_double and TARGET_VFP_DOUBLE.
>     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>     (aapcs_vfp_is_return_candidate): Likewise.
>     (aapcs_vfp_is_call_candidate): Likewise.
>     (aapcs_vfp_allocate_return_reg): Likewise.
Kyrill Tkachov April 7, 2016, 12:32 p.m. UTC | #7
Hi Ramana,

On 23/03/16 12:09, Ramana Radhakrishnan wrote:
> On Tue, Feb 9, 2016 at 5:21 PM, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi all,
>>
>> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only
>> when targeting an fpu
>> with only single-precision capabilities.
>>
>> bar is a function returing a double. For non-LTO compilation the caller of
>> bar reads the return value
>> from it from the s0 and s1 VFP registers like expected, but for -flto the
>> caller seems to expect the
>> return value from the r0 and r1 regs.  The RTL dumps show that too.
>>
>> Debugging the calls to arm_function_value show that in the -flto compilation
>> the function bar is deemed
>> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS
>> variant, whereas for the non-LTO (and non-breaking)
>> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>>
>> Further down in use_vfp_abi when deciding whether to use VFP registers for
>> the result there is a bit of
>> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL
>> variant with a double precision value
>> on an FPU that is not TARGET_VFP_DOUBLE.
>>
>> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means
>> that the function doesn't escape
>> the translation unit and we can thus use whatever variant we want. From what
>> I understand we want to use the
>> VFP regs when possible for FP values.
>>
>> So this patch removes that restriction and for the testcase the caller of
>> bar correctly reads the return
>> value of bar from the VFP registers and everything works.
>>
>> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf
>> configured with --with-fpu=fpv4-sp-d16.
>> The bootstrapped was performed with LTO.
>> I didn't see any regressions.
>>
>> It seems that this logic was put there in 2009 with r154034 as part of a
>> large patch to enable support for half-precision
>> floating point.
>>
>> I'm not very familiar with this part of the code, so is this a safe patch to
>> do?
>> The patch should only ever change behaviour for single-precision-only fpus
>> and only for static functions
>> that don't get called outside their translation units (or during LTO I
>> suppose) so there shouldn't
>> be any ABI problems, I think.
>>
>> Is this ok for trunk?
> I spent sometime this morning reading through this patch and it does
> look reasonably ok. The AAPCS tests if run for hardfloat should catch
> any regressions. However given the stage we are in I'd like this
> tested through compat.exp and struct-layout.exp across the range of
> ABIs and FPU options to ensure we haven't missed anything. Richard ,
> could you also give this a once over ?

I've ran compat.exp and struct-layout-1.exp against GCC 5
and a trunk compiler without this patch and it didn't expose
any failures with any /-mfpu options that I tried.

Thanks,
Kyrill

>
> Ramana
>
>
>
>
>
>
>
>> Thanks,
>> Kyrill
>>
>> 2016-02-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR target/65578
>>      * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>>      Don't check for is_double and TARGET_VFP_DOUBLE.
>>      (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>>      (aapcs_vfp_is_return_candidate): Likewise.
>>      (aapcs_vfp_is_call_candidate): Likewise.
>>      (aapcs_vfp_allocate_return_reg): Likewise.
Richard Earnshaw (lists) April 8, 2016, 10:10 a.m. UTC | #8
On 09/02/16 17:21, Kyrill Tkachov wrote:
> Hi all,
> 
> In this wrong-code PR the builtin-apply-4.c test fails with -flto but
> only when targeting an fpu
> with only single-precision capabilities.
> 
> bar is a function returing a double. For non-LTO compilation the caller
> of bar reads the return value
> from it from the s0 and s1 VFP registers like expected, but for -flto
> the caller seems to expect the
> return value from the r0 and r1 regs.  The RTL dumps show that too.
> 
> Debugging the calls to arm_function_value show that in the -flto
> compilation the function bar is deemed
> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS
> variant, whereas for the non-LTO (and non-breaking)
> compilation it uses the ARM_PCS_AAPCS_VFP variant.
> 
> Further down in use_vfp_abi when deciding whether to use VFP registers
> for the result there is a bit of
> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL
> variant with a double precision value
> on an FPU that is not TARGET_VFP_DOUBLE.
> 
> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL
> means that the function doesn't escape
> the translation unit and we can thus use whatever variant we want. From
> what I understand we want to use the
> VFP regs when possible for FP values.
> 

I'm not sure I agree with you here.  The problem is that a core with a
single-precision only FPU is required by the hard-float ABI to pass
double precision values in VFP registers, but it can do nothing useful
with them while they are in that register bank, so before any processing
can be done on a double precision value it has to copy it back to a core
register; copying values between register banks is often relatively
expensive.  Furthermore, if that value is used as a parameter or
function call result it has to put it back again.  That adds further to
the inefficiency.  Don't forget also that the EABI soft-float libcalls
don't follow the normal ABI rules and always take parameters and pass
results in the core registers.

So I think the idea here is that we can avoid some copying overhead when
we know that the function in question does not escape from the current
compilation unit.  If that's going wrong with LTO then there's a bug
somewhere in how we assess whether a function escapes.

R.

> So this patch removes that restriction and for the testcase the caller
> of bar correctly reads the return
> value of bar from the VFP registers and everything works.
> 
> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf
> configured with --with-fpu=fpv4-sp-d16.
> The bootstrapped was performed with LTO.
> I didn't see any regressions.
> 
> It seems that this logic was put there in 2009 with r154034 as part of a
> large patch to enable support for half-precision
> floating point.
> 
> I'm not very familiar with this part of the code, so is this a safe
> patch to do?
> The patch should only ever change behaviour for single-precision-only
> fpus and only for static functions
> that don't get called outside their translation units (or during LTO I
> suppose) so there shouldn't
> be any ABI problems, I think.
> 
> Is this ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-02-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     PR target/65578
>     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>     Don't check for is_double and TARGET_VFP_DOUBLE.
>     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>     (aapcs_vfp_is_return_candidate): Likewise.
>     (aapcs_vfp_is_call_candidate): Likewise.
>     (aapcs_vfp_allocate_return_reg): Likewise.
diff mbox

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e71c9f56cbe846fdddf2e42a9f4575bacee570c1..e1404c74f74d01eb9c3362c7250e2b30ba5e47e7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5696,7 +5696,7 @@  aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
 
 /* Return true if PCS_VARIANT should use VFP registers.  */
 static bool
-use_vfp_abi (enum arm_pcs pcs_variant, bool is_double)
+use_vfp_abi (enum arm_pcs pcs_variant)
 {
   if (pcs_variant == ARM_PCS_AAPCS_VFP)
     {
@@ -5715,8 +5715,7 @@  use_vfp_abi (enum arm_pcs pcs_variant, bool is_double)
   if (pcs_variant != ARM_PCS_AAPCS_LOCAL)
     return false;
 
-  return (TARGET_32BIT && TARGET_VFP && TARGET_HARD_FLOAT &&
-	  (TARGET_VFP_DOUBLE || !is_double));
+  return (TARGET_32BIT && TARGET_VFP && TARGET_HARD_FLOAT);
 }
 
 /* Return true if an argument whose type is TYPE, or mode is MODE, is
@@ -5758,7 +5757,7 @@  aapcs_vfp_is_call_or_return_candidate (enum arm_pcs pcs_variant,
     return false;
 
 
-  if (!use_vfp_abi (pcs_variant, ARM_NUM_REGS (new_mode) > 1))
+  if (!use_vfp_abi (pcs_variant))
     return false;
 
   *base_mode = new_mode;
@@ -5772,7 +5771,7 @@  aapcs_vfp_is_return_candidate (enum arm_pcs pcs_variant,
   int count ATTRIBUTE_UNUSED;
   machine_mode ag_mode ATTRIBUTE_UNUSED;
 
-  if (!use_vfp_abi (pcs_variant, false))
+  if (!use_vfp_abi (pcs_variant))
     return false;
   return aapcs_vfp_is_call_or_return_candidate (pcs_variant, mode, type,
 						&ag_mode, &count);
@@ -5782,7 +5781,7 @@  static bool
 aapcs_vfp_is_call_candidate (CUMULATIVE_ARGS *pcum, machine_mode mode,
 			     const_tree type)
 {
-  if (!use_vfp_abi (pcum->pcs_variant, false))
+  if (!use_vfp_abi (pcum->pcs_variant))
     return false;
 
   return aapcs_vfp_is_call_or_return_candidate (pcum->pcs_variant, mode, type,
@@ -5848,7 +5847,7 @@  aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
 			       machine_mode mode,
 			       const_tree type ATTRIBUTE_UNUSED)
 {
-  if (!use_vfp_abi (pcs_variant, false))
+  if (!use_vfp_abi (pcs_variant))
     return NULL;
 
   if (mode == BLKmode