diff mbox

[ARM] PR71607: New approach to arm_disable_literal_pool

Message ID 588A1C45.4000607@arm.com
State New
Headers show

Commit Message

Andre Vieira (lists) Jan. 26, 2017, 3:56 p.m. UTC
On 20/01/17 14:08, Ramana Radhakrishnan wrote:
> On Wed, Dec 28, 2016 at 9:58 AM, Andre Vieira (lists)
> <Andre.SimoesDiasVieira@arm.com> wrote:
>> On 29/11/16 09:45, Andre Vieira (lists) wrote:
>>> On 17/11/16 10:00, Ramana Radhakrishnan wrote:
>>>> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists)
>>>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>>>> Hello,
>>>>>
>>>>> This patch tackles the issue reported in PR71607. This patch takes a
>>>>> different approach for disabling the creation of literal pools. Instead
>>>>> of disabling the patterns that would normally transform the rtl into
>>>>> actual literal pools, it disables the creation of this literal pool rtl
>>>>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
>>>>> arm_disable_literal_pool is true. I added patterns to split floating
>>>>> point constants for both SF and DFmode. A pattern to handle the
>>>>> addressing of label_refs had to be included as well since all
>>>>> "memory_operand" patterns are disabled when
>>>>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
>>>>> splitting 32-bit immediates had to be changed, it was not accepting
>>>>> unsigned 32-bit unsigned integers with the MSB set. I believe
>>>>> const_int_operand expects the mode of the operand to be set to VOIDmode
>>>>> and not SImode. I have only changed it in the patterns that were
>>>>> affecting this code, though I suggest looking into changing it in the
>>>>> rest of the ARM backend.
>>>>>
>>>>> I added more test cases. No regressions for arm-none-eabi with
>>>>> Cortex-M0, Cortex-M3 and Cortex-M7.
>>>>>
>>>>> Is this OK for trunk?
>>>>
>>>> Including -mslow-flash-data in your multilib flags ? If no regressions
>>>> with that ok .
>>>>
>>>>
>>>> regards
>>>> Ramana
>>>>
>>>>>
>>>
>>> Hello,
>>>
>>> I found some new ICE's with the -mslow-flash-data testing so I had to
>>> rework this patch. I took the opportunity to rebase it as well.
>>>
>>> The problem was with the way the old version of the patch handled label
>>> references.  After some digging I found I wasn't using the right target
>>> hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for
>>> ARM.  This target hook determines whether a literal pool ends up in an
>>> 'object_block' structure. So I reverted the changes made in the old
>>> version of the patch to the ARM implementation of the
>>> 'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on
>>> 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM
>>> implementation for this hook that returns false if
>>> 'arm_disable_literal_pool' is set to true and true otherwise.
>>>
>>> This version of the patch also reverts back to using the check for
>>> 'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the
>>> last version, this code is required to place the label references in
>>> rodata sections.
>>>
>>> Another thing this patch does is revert the changes made to the 32-bit
>>> constant split in arm.md. The reason this was needed before was because
>>> 'real_to_target' returns a long array and does not sign-extend values in
>>> it, which would make sense on hosts with 64-bit longs. To fix this the
>>> value is now casted to 'int' first.  It would probably be a good idea to
>>> change the 'real_to_target' function to return an array with
>>> 'HOST_WIDE_INT' elements instead and either use all 64-bits or
>>> sign-extend them.  Something for the future?
>>>
>>> I added more test cases in this patch and reran regression tests for:
>>> Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a
>>> bootstrap+regressions on arm-none-linux-gnueabihf.
>>>
>>> Is this OK for trunk?
>>>
>>> Cheers,
>>> Andre
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>
>>>     PR target/71607
>>>     * config/arm/arm.md (use_literal_pool): Removes.
>>>     (64-bit immediate split): No longer takes cost into consideration
>>>     if 'arm_disable_literal_pool' is enabled.
>>>     * config/arm/arm.c (arm_use_blocks_for_constant_p): New.
>>>     (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
>>>     (arm_max_const_double_inline_cost): Remove use of
>>> arm_disable_literal_pool.
>>>     * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>>>     (no_literal_pool_sf_immediate): New.
>>>
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>             Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>
>>>     PR target/71607
>>>     * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
>>>     * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>>>     * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>>>     * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>>>     * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
>>>     * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
>>>
>> Hello,
>>
>> As I have mentioned in my commit message for the fix on embedded-6  (see
>> https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01304.html) I found an
>> issue with this code when testing its backport on the embedded-6-branch.
>>
>> I misread the definition of the target hook
>> TARGET_USE_BLOCKS_FOR_CONSTANT_P and it seems the way I implemented it
>> before only changed code generation if the -mslow-flash-data option
>> wasn't passed. I.e. I don't need to implement it to solve this issue.
>> The other changes in this patch are sufficient...
>>
>> I reran regressions for arm-none-eabi-gcc with the following targets:
>> Cortex-M0, Cortex-M3, Cortex-M7 and ARMv8-M Baseline. I also ran
>> bootstraps and full regressions for arm-none-linux-gnueabihf both in ARM
>> and THUMB mode. All green!
> 
> And presumably with -mslow-flash-data as a multilib again ?
> 
>>
>> Is this OK for trunk?
> 
> We should also be adding to arm_reorg
> 
> 
> if (arm_disable_literal_pool)
>   return ;
> 
> after we split all the insns.
> 
> 
> After all if the splitters added had done their job, there's no reason
> to run the minipool placement code at all - is there  :) ?
> 
> Furthermore is it guaranteed that we will not have any references to
> the literal pool post reload - IIRC , reload in its older days could
> easily just push any old constant into a standard constant pool entry
> and make things work . Is LRA likely not to produce any constant pool
> entries in this form ? The bad thing here would be producing literal
> pool entries when the user had asked the compiler to do so - thus
> having more belts and braces for this would be good (thus , why is a
> change to arm_cannot_force_const_mem to return false in the face of
> arm_disable_literal_pool not appropriate ? ) Nick also had a couple of
> good points on making the test more usable across all testers in
> private conversation - so please make those changes.
> 
> 
> 
> 
> regards
> Ramana
> 
> 
>>
>> Cheers,
>> Andre
>>
>> gcc/ChangeLog:
>>
>> 2016-12-28  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>
>>     PR target/71607
>>     * config/arm/arm.md (use_literal_pool): Removes.
>>     (64-bit immediate split): No longer takes cost into consideration
>>     if 'arm_disable_literal_pool' is enabled.
>>     * config/arm/arm.c (arm_max_const_double_inline_cost): Remove use
>>     of arm_disable_literal_pool.
>>     * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>>     (no_literal_pool_sf_immediate): New.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-12-28  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>             Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>
>>     PR target/71607
>>     * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
>>     * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>>     * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>>     * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>>     * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
>>     * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
Hello,

I forgot to mention I ran the full regression tests for -mcpu=cortex-m3
with the -mslow-flash-data option, this showed up clean.

After I disabled the creation of literal pools in arm_reorg if
'arm_disable_literal' pool is true the same test showed regressions with
TLS tests. The compiler generates assembly that tries to load directly
from a gott table symbol which is invalid assembly.  We do not have the
required AArch32 TLS relocations for instructions.

However, it might be worth mentioning that the GNU ARM Embedded
Toolchain is configured to use TLS emulation and this code is thus not
generated for toolchains configured with --disable-threads and
--disable-tls.  This wrong code generation would thus only occur in the
"niche" case of configuring a toolchain with threads and TLS, for a
M-profile device whilst compiling code containing TLS variables using
-mslow-flash-data.

As for your question whether a change to 'arm_cannot_force_const_mem' to
return false in the face of 'arm_disable_literal_pool'.  I think you
meant to say 'true' here.  I had tried this before, but this leads to
arm_legitimate_constant_p always returning false, which prevents the use
of constants altogether, even when synthesized into movw and movt's.

The way we currently use the target hooks around literal pools should
perhaps be reviewed, but that is a much bigger task. For instance, we
should perhaps be using 'TARGET_USE_BLOCKS_FOR_CONSTANT_P', which reads:
"This hook should return true if pool entries for constant x can be
placed in an object_block structure.". We currently use the default
implementation for this hook which returns false.

I reran regressions for arm-none-eabi-gcc with the following targets:
Cortex-M3, Cortex-M4 and Cortex-M7. I did not reran bootstraps as the
change in this new version can only affect M-profile targets and could
never change code-generation for other targets. Some of these tests are
still running, I will not commit anything before they are done. Though I
don't expect anything to change, since the -mslow-flash-data regression
testing, see bellow, worked fine.

I also reran the -mslow-flash-data for arm-none-eabi with Cortex-M3, and
as I mentioned earlier depending on how GCC is configured it could
regress for TLS tests.

Is this OK for trunk?

Cheers,
Andre

Comments

Ramana Radhakrishnan Jan. 27, 2017, 12:13 p.m. UTC | #1
On Thu, Jan 26, 2017 at 3:56 PM, Andre Vieira (lists)
<Andre.SimoesDiasVieira@arm.com> wrote:
> On 20/01/17 14:08, Ramana Radhakrishnan wrote:
>> On Wed, Dec 28, 2016 at 9:58 AM, Andre Vieira (lists)
>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>> On 29/11/16 09:45, Andre Vieira (lists) wrote:
>>>> On 17/11/16 10:00, Ramana Radhakrishnan wrote:
>>>>> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists)
>>>>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> This patch tackles the issue reported in PR71607. This patch takes a
>>>>>> different approach for disabling the creation of literal pools. Instead
>>>>>> of disabling the patterns that would normally transform the rtl into
>>>>>> actual literal pools, it disables the creation of this literal pool rtl
>>>>>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
>>>>>> arm_disable_literal_pool is true. I added patterns to split floating
>>>>>> point constants for both SF and DFmode. A pattern to handle the
>>>>>> addressing of label_refs had to be included as well since all
>>>>>> "memory_operand" patterns are disabled when
>>>>>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
>>>>>> splitting 32-bit immediates had to be changed, it was not accepting
>>>>>> unsigned 32-bit unsigned integers with the MSB set. I believe
>>>>>> const_int_operand expects the mode of the operand to be set to VOIDmode
>>>>>> and not SImode. I have only changed it in the patterns that were
>>>>>> affecting this code, though I suggest looking into changing it in the
>>>>>> rest of the ARM backend.
>>>>>>
>>>>>> I added more test cases. No regressions for arm-none-eabi with
>>>>>> Cortex-M0, Cortex-M3 and Cortex-M7.
>>>>>>
>>>>>> Is this OK for trunk?
>>>>>
>>>>> Including -mslow-flash-data in your multilib flags ? If no regressions
>>>>> with that ok .
>>>>>
>>>>>
>>>>> regards
>>>>> Ramana
>>>>>
>>>>>>
>>>>
>>>> Hello,
>>>>
>>>> I found some new ICE's with the -mslow-flash-data testing so I had to
>>>> rework this patch. I took the opportunity to rebase it as well.
>>>>
>>>> The problem was with the way the old version of the patch handled label
>>>> references.  After some digging I found I wasn't using the right target
>>>> hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for
>>>> ARM.  This target hook determines whether a literal pool ends up in an
>>>> 'object_block' structure. So I reverted the changes made in the old
>>>> version of the patch to the ARM implementation of the
>>>> 'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on
>>>> 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM
>>>> implementation for this hook that returns false if
>>>> 'arm_disable_literal_pool' is set to true and true otherwise.
>>>>
>>>> This version of the patch also reverts back to using the check for
>>>> 'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the
>>>> last version, this code is required to place the label references in
>>>> rodata sections.
>>>>
>>>> Another thing this patch does is revert the changes made to the 32-bit
>>>> constant split in arm.md. The reason this was needed before was because
>>>> 'real_to_target' returns a long array and does not sign-extend values in
>>>> it, which would make sense on hosts with 64-bit longs. To fix this the
>>>> value is now casted to 'int' first.  It would probably be a good idea to
>>>> change the 'real_to_target' function to return an array with
>>>> 'HOST_WIDE_INT' elements instead and either use all 64-bits or
>>>> sign-extend them.  Something for the future?
>>>>
>>>> I added more test cases in this patch and reran regression tests for:
>>>> Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a
>>>> bootstrap+regressions on arm-none-linux-gnueabihf.
>>>>
>>>> Is this OK for trunk?
>>>>
>>>> Cheers,
>>>> Andre
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>
>>>>     PR target/71607
>>>>     * config/arm/arm.md (use_literal_pool): Removes.
>>>>     (64-bit immediate split): No longer takes cost into consideration
>>>>     if 'arm_disable_literal_pool' is enabled.
>>>>     * config/arm/arm.c (arm_use_blocks_for_constant_p): New.
>>>>     (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
>>>>     (arm_max_const_double_inline_cost): Remove use of
>>>> arm_disable_literal_pool.
>>>>     * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>>>>     (no_literal_pool_sf_immediate): New.
>>>>
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>             Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>>
>>>>     PR target/71607
>>>>     * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
>>>>     * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>>>>     * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>>>>     * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>>>>     * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
>>>>     * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
>>>>
>>> Hello,
>>>
>>> As I have mentioned in my commit message for the fix on embedded-6  (see
>>> https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01304.html) I found an
>>> issue with this code when testing its backport on the embedded-6-branch.
>>>
>>> I misread the definition of the target hook
>>> TARGET_USE_BLOCKS_FOR_CONSTANT_P and it seems the way I implemented it
>>> before only changed code generation if the -mslow-flash-data option
>>> wasn't passed. I.e. I don't need to implement it to solve this issue.
>>> The other changes in this patch are sufficient...
>>>
>>> I reran regressions for arm-none-eabi-gcc with the following targets:
>>> Cortex-M0, Cortex-M3, Cortex-M7 and ARMv8-M Baseline. I also ran
>>> bootstraps and full regressions for arm-none-linux-gnueabihf both in ARM
>>> and THUMB mode. All green!
>>
>> And presumably with -mslow-flash-data as a multilib again ?
>>
>>>
>>> Is this OK for trunk?
>>
>> We should also be adding to arm_reorg
>>
>>
>> if (arm_disable_literal_pool)
>>   return ;
>>
>> after we split all the insns.
>>
>>
>> After all if the splitters added had done their job, there's no reason
>> to run the minipool placement code at all - is there  :) ?
>>
>> Furthermore is it guaranteed that we will not have any references to
>> the literal pool post reload - IIRC , reload in its older days could
>> easily just push any old constant into a standard constant pool entry
>> and make things work . Is LRA likely not to produce any constant pool
>> entries in this form ? The bad thing here would be producing literal
>> pool entries when the user had asked the compiler to do so - thus
>> having more belts and braces for this would be good (thus , why is a
>> change to arm_cannot_force_const_mem to return false in the face of
>> arm_disable_literal_pool not appropriate ? ) Nick also had a couple of
>> good points on making the test more usable across all testers in
>> private conversation - so please make those changes.
>>
>>
>>
>>
>> regards
>> Ramana
>>
>>
> Hello,
>
> I forgot to mention I ran the full regression tests for -mcpu=cortex-m3
> with the -mslow-flash-data option, this showed up clean.
>
> After I disabled the creation of literal pools in arm_reorg if
> 'arm_disable_literal' pool is true the same test showed regressions with
> TLS tests. The compiler generates assembly that tries to load directly
> from a gott table symbol which is invalid assembly.  We do not have the
> required AArch32 TLS relocations for instructions.

Ok. That sounds reasonable.


>
> However, it might be worth mentioning that the GNU ARM Embedded
> Toolchain is configured to use TLS emulation and this code is thus not
> generated for toolchains configured with --disable-threads and
> --disable-tls.  This wrong code generation would thus only occur in the
> "niche" case of configuring a toolchain with threads and TLS, for a
> M-profile device whilst compiling code containing TLS variables using
> -mslow-flash-data.

Agreed -


I have seen other requests for a similar feature even on A profile,
thus documenting this in the form of comments in the source
would  be very welcome.

>
> As for your question whether a change to 'arm_cannot_force_const_mem' to
> return false in the face of 'arm_disable_literal_pool'.  I think you
> meant to say 'true' here.  I had tried this before, but this leads to
> arm_legitimate_constant_p always returning false, which prevents the use
> of constants altogether, even when synthesized into movw and movt's.
>

Yes I meant to say true  , jetlag ...

Did you try refactoring the existing logic into a helper function that
does what it does today and only return true for
target_slow_flash_data in the arm_cannot_force_const_mem interface ?

i.e.

arm_cannot_force_const_mem => arm_cannot_force_const_mem_1

arm_legitimate_constant_p calls arm_cannot_force_const_mem_1 ..
and

New definition of arm_cannot_force_const_mem returns true for
constants for target_flash_slow_data.

and then handles arm_cannot_force_const_mem_1 as it exists today

Can you experiment with that please ?

I'm happy with that as a follow-up patch.  I'm really keen on making
sure that the compiler does *not* produce any literal pools at all for
slow-flash-data and we fail hard in case we can't handle it. The
requirement is no literal pools and the compiler should enforce that
to the best of its ability and not paper over problems.



> The way we currently use the target hooks around literal pools should
> perhaps be reviewed, but that is a much bigger task. For instance, we
> should perhaps be using 'TARGET_USE_BLOCKS_FOR_CONSTANT_P', which reads:
> "This hook should return true if pool entries for constant x can be
> placed in an object_block structure.". We currently use the default
> implementation for this hook which returns false.

I'm not sure if that is appropriate here as this tries to put things
into constant pool blocks that are shared probably between many
functions which implies code is able to access the constant pool
anywhere. There is some tie in to section anchors if I remember
correctly. I don't think we can use this in the ARM backend because we
really need the minipool placement to work for constant pools. I think
that is orthogonal to this.

>
> I reran regressions for arm-none-eabi-gcc with the following targets:
> Cortex-M3, Cortex-M4 and Cortex-M7. I did not reran bootstraps as the
> change in this new version can only affect M-profile targets and could
> never change code-generation for other targets. Some of these tests are
> still running, I will not commit anything before they are done. Though I
> don't expect anything to change, since the -mslow-flash-data regression
> testing, see bellow, worked fine.
>
> I also reran the -mslow-flash-data for arm-none-eabi with Cortex-M3, and
> as I mentioned earlier depending on how GCC is configured it could
> regress for TLS tests.


Given that maybe we should just disable the use of TLS in
-mslow-flash-data and give an error ?

Ok with all the caveats as listed above.


Ramana


>
> Is this OK for trunk?


>
> Cheers,
> Andre
Andre Vieira (lists) Jan. 30, 2017, 5:14 p.m. UTC | #2
On 27/01/17 12:13, Ramana Radhakrishnan wrote:
> On Thu, Jan 26, 2017 at 3:56 PM, Andre Vieira (lists)
> <Andre.SimoesDiasVieira@arm.com> wrote:
>> On 20/01/17 14:08, Ramana Radhakrishnan wrote:
>>> On Wed, Dec 28, 2016 at 9:58 AM, Andre Vieira (lists)
>>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>>> On 29/11/16 09:45, Andre Vieira (lists) wrote:
>>>>> On 17/11/16 10:00, Ramana Radhakrishnan wrote:
>>>>>> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists)
>>>>>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> This patch tackles the issue reported in PR71607. This patch takes a
>>>>>>> different approach for disabling the creation of literal pools. Instead
>>>>>>> of disabling the patterns that would normally transform the rtl into
>>>>>>> actual literal pools, it disables the creation of this literal pool rtl
>>>>>>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
>>>>>>> arm_disable_literal_pool is true. I added patterns to split floating
>>>>>>> point constants for both SF and DFmode. A pattern to handle the
>>>>>>> addressing of label_refs had to be included as well since all
>>>>>>> "memory_operand" patterns are disabled when
>>>>>>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
>>>>>>> splitting 32-bit immediates had to be changed, it was not accepting
>>>>>>> unsigned 32-bit unsigned integers with the MSB set. I believe
>>>>>>> const_int_operand expects the mode of the operand to be set to VOIDmode
>>>>>>> and not SImode. I have only changed it in the patterns that were
>>>>>>> affecting this code, though I suggest looking into changing it in the
>>>>>>> rest of the ARM backend.
>>>>>>>
>>>>>>> I added more test cases. No regressions for arm-none-eabi with
>>>>>>> Cortex-M0, Cortex-M3 and Cortex-M7.
>>>>>>>
>>>>>>> Is this OK for trunk?
>>>>>>
>>>>>> Including -mslow-flash-data in your multilib flags ? If no regressions
>>>>>> with that ok .
>>>>>>
>>>>>>
>>>>>> regards
>>>>>> Ramana
>>>>>>
>>>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> I found some new ICE's with the -mslow-flash-data testing so I had to
>>>>> rework this patch. I took the opportunity to rebase it as well.
>>>>>
>>>>> The problem was with the way the old version of the patch handled label
>>>>> references.  After some digging I found I wasn't using the right target
>>>>> hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for
>>>>> ARM.  This target hook determines whether a literal pool ends up in an
>>>>> 'object_block' structure. So I reverted the changes made in the old
>>>>> version of the patch to the ARM implementation of the
>>>>> 'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on
>>>>> 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM
>>>>> implementation for this hook that returns false if
>>>>> 'arm_disable_literal_pool' is set to true and true otherwise.
>>>>>
>>>>> This version of the patch also reverts back to using the check for
>>>>> 'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the
>>>>> last version, this code is required to place the label references in
>>>>> rodata sections.
>>>>>
>>>>> Another thing this patch does is revert the changes made to the 32-bit
>>>>> constant split in arm.md. The reason this was needed before was because
>>>>> 'real_to_target' returns a long array and does not sign-extend values in
>>>>> it, which would make sense on hosts with 64-bit longs. To fix this the
>>>>> value is now casted to 'int' first.  It would probably be a good idea to
>>>>> change the 'real_to_target' function to return an array with
>>>>> 'HOST_WIDE_INT' elements instead and either use all 64-bits or
>>>>> sign-extend them.  Something for the future?
>>>>>
>>>>> I added more test cases in this patch and reran regression tests for:
>>>>> Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a
>>>>> bootstrap+regressions on arm-none-linux-gnueabihf.
>>>>>
>>>>> Is this OK for trunk?
>>>>>
>>>>> Cheers,
>>>>> Andre
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>>
>>>>>     PR target/71607
>>>>>     * config/arm/arm.md (use_literal_pool): Removes.
>>>>>     (64-bit immediate split): No longer takes cost into consideration
>>>>>     if 'arm_disable_literal_pool' is enabled.
>>>>>     * config/arm/arm.c (arm_use_blocks_for_constant_p): New.
>>>>>     (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
>>>>>     (arm_max_const_double_inline_cost): Remove use of
>>>>> arm_disable_literal_pool.
>>>>>     * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>>>>>     (no_literal_pool_sf_immediate): New.
>>>>>
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>>             Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>>>
>>>>>     PR target/71607
>>>>>     * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
>>>>>
>>>> Hello,
>>>>
>>>> As I have mentioned in my commit message for the fix on embedded-6  (see
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01304.html) I found an
>>>> issue with this code when testing its backport on the embedded-6-branch.
>>>>
>>>> I misread the definition of the target hook
>>>> TARGET_USE_BLOCKS_FOR_CONSTANT_P and it seems the way I implemented it
>>>> before only changed code generation if the -mslow-flash-data option
>>>> wasn't passed. I.e. I don't need to implement it to solve this issue.
>>>> The other changes in this patch are sufficient...
>>>>
>>>> I reran regressions for arm-none-eabi-gcc with the following targets:
>>>> Cortex-M0, Cortex-M3, Cortex-M7 and ARMv8-M Baseline. I also ran
>>>> bootstraps and full regressions for arm-none-linux-gnueabihf both in ARM
>>>> and THUMB mode. All green!
>>>
>>> And presumably with -mslow-flash-data as a multilib again ?
>>>
>>>>
>>>> Is this OK for trunk?
>>>
>>> We should also be adding to arm_reorg
>>>
>>>
>>> if (arm_disable_literal_pool)
>>>   return ;
>>>
>>> after we split all the insns.
>>>
>>>
>>> After all if the splitters added had done their job, there's no reason
>>> to run the minipool placement code at all - is there  :) ?
>>>
>>> Furthermore is it guaranteed that we will not have any references to
>>> the literal pool post reload - IIRC , reload in its older days could
>>> easily just push any old constant into a standard constant pool entry
>>> and make things work . Is LRA likely not to produce any constant pool
>>> entries in this form ? The bad thing here would be producing literal
>>> pool entries when the user had asked the compiler to do so - thus
>>> having more belts and braces for this would be good (thus , why is a
>>> change to arm_cannot_force_const_mem to return false in the face of
>>> arm_disable_literal_pool not appropriate ? ) Nick also had a couple of
>>> good points on making the test more usable across all testers in
>>> private conversation - so please make those changes.
>>>
>>>
>>>
>>>
>>> regards
>>> Ramana
>>>
>>>
>> Hello,
>>
>> I forgot to mention I ran the full regression tests for -mcpu=cortex-m3
>> with the -mslow-flash-data option, this showed up clean.
>>
>> After I disabled the creation of literal pools in arm_reorg if
>> 'arm_disable_literal' pool is true the same test showed regressions with
>> TLS tests. The compiler generates assembly that tries to load directly
>> from a gott table symbol which is invalid assembly.  We do not have the
>> required AArch32 TLS relocations for instructions.
> 
> Ok. That sounds reasonable.
> 
> 
>>
>> However, it might be worth mentioning that the GNU ARM Embedded
>> Toolchain is configured to use TLS emulation and this code is thus not
>> generated for toolchains configured with --disable-threads and
>> --disable-tls.  This wrong code generation would thus only occur in the
>> "niche" case of configuring a toolchain with threads and TLS, for a
>> M-profile device whilst compiling code containing TLS variables using
>> -mslow-flash-data.
> 
> Agreed -
> 
> 
> I have seen other requests for a similar feature even on A profile,
> thus documenting this in the form of comments in the source
> would  be very welcome.

I will add a comment to the arm_reorg change with:
  /* FIXME: This will cause the generation of invalid assembly for code
using TLS '__thread' variables.  ARM currently does not provide
relocations to encode these into AArch32 instructions, only data, so
there is no way to currently implement these if a literal pool is
disabled.  */


> 
>>
>> As for your question whether a change to 'arm_cannot_force_const_mem' to
>> return false in the face of 'arm_disable_literal_pool'.  I think you
>> meant to say 'true' here.  I had tried this before, but this leads to
>> arm_legitimate_constant_p always returning false, which prevents the use
>> of constants altogether, even when synthesized into movw and movt's.
>>
> 
> Yes I meant to say true  , jetlag ...
> 
> Did you try refactoring the existing logic into a helper function that
> does what it does today and only return true for
> target_slow_flash_data in the arm_cannot_force_const_mem interface ?
> 
> i.e.
> 
> arm_cannot_force_const_mem => arm_cannot_force_const_mem_1
> 
> arm_legitimate_constant_p calls arm_cannot_force_const_mem_1 ..
> and
> 
> New definition of arm_cannot_force_const_mem returns true for
> constants for target_flash_slow_data.
> 
> and then handles arm_cannot_force_const_mem_1 as it exists today
> 
> Can you experiment with that please ?

Ah I had sort of... and I remembered there was something with label's
and this exposes it.  If I do this, I end up with the following ICE for
thumb2-slow-flash-data-1.c:
error: unrecognizable insn:
 }
 ^
(insn 7 4 8 2 (set (reg/f:SI 114)
        (label_ref:SI 45)) "t.c":55 -1
     (insn_list:REG_LABEL_OPERAND 45 (nil)))

I believe this is because 'force_const_mem' in varasm.c. It is called
from the 'emit_move_insn' expander and it will now keep the 'label_ref'
rather than transform it into something in the following format:
(mem/u/c:SI (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [5  S4 A32])

From reading the comments in emit_move_insn I suspect this might be a
matter of a missing expander for ARM?  The generation of the memory
wrapper symbol_ref does seem to happen in force_const_mem, so I'm not
entirely sure duplicating this is a good idea...

If I do not make the change to arm_cannot_foce_const_mem, the current
code will generate a .rodata section containing the label address, which
is pointed to by .LC1 (the one inside the symbol_ref).

> 
> I'm happy with that as a follow-up patch.  I'm really keen on making
> sure that the compiler does *not* produce any literal pools at all for
> slow-flash-data and we fail hard in case we can't handle it. The
> requirement is no literal pools and the compiler should enforce that
> to the best of its ability and not paper over problems.
> 
> 
> 
>> The way we currently use the target hooks around literal pools should
>> perhaps be reviewed, but that is a much bigger task. For instance, we
>> should perhaps be using 'TARGET_USE_BLOCKS_FOR_CONSTANT_P', which reads:
>> "This hook should return true if pool entries for constant x can be
>> placed in an object_block structure.". We currently use the default
>> implementation for this hook which returns false.
> 
> I'm not sure if that is appropriate here as this tries to put things
> into constant pool blocks that are shared probably between many
> functions which implies code is able to access the constant pool
> anywhere. There is some tie in to section anchors if I remember
> correctly. I don't think we can use this in the ARM backend because we
> really need the minipool placement to work for constant pools. I think
> that is orthogonal to this.
> 
>>
>> I reran regressions for arm-none-eabi-gcc with the following targets:
>> Cortex-M3, Cortex-M4 and Cortex-M7. I did not reran bootstraps as the
>> change in this new version can only affect M-profile targets and could
>> never change code-generation for other targets. Some of these tests are
>> still running, I will not commit anything before they are done. Though I
>> don't expect anything to change, since the -mslow-flash-data regression
>> testing, see bellow, worked fine.
>>
>> I also reran the -mslow-flash-data for arm-none-eabi with Cortex-M3, and
>> as I mentioned earlier depending on how GCC is configured it could
>> regress for TLS tests.
> 
> 
> Given that maybe we should just disable the use of TLS in
> -mslow-flash-data and give an error ?

How can I check for the use of TLS given -mslow-flash-data? Best I could
think of would be to disable the patterns that handle the unspecs for
PIC and TLS. Though AFAICT its the thumb2_movsi_insn pattern that's
generating the wrong assembly...
> 
> Ok with all the caveats as listed above.
> 
> 
> Ramana
> 
> 
>>
>> Is this OK for trunk?
> 
> 
>>
>> Cheers,
>> Andre
diff mbox

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bbf10f23987e58a1e066715e2168772da0245ff1..ab4a9059c000417037d0a9c70aebc6987ddae235 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16373,10 +16373,6 @@  push_minipool_fix (rtx_insn *insn, HOST_WIDE_INT address, rtx *loc,
 int
 arm_max_const_double_inline_cost ()
 {
-  /* Let the value get synthesized to avoid the use of literal pools.  */
-  if (arm_disable_literal_pool)
-    return 99;
-
   return ((optimize_size || arm_ld_sched) ? 3 : 4);
 }
 
@@ -17323,6 +17319,11 @@  arm_reorg (void)
   if (!optimize)
     split_all_insns_noflow ();
 
+  /* Make sure we do not attempt to create a literal pool even though it should
+     no longer be necessary to create any.  */
+  if (arm_disable_literal_pool)
+    return ;
+
   minipool_fix_head = minipool_fix_tail = NULL;
 
   /* The first insn must always be a note, or the code below won't
@@ -30887,5 +30888,4 @@  arm_expand_divmod_libfunc (rtx libfunc, machine_mode mode,
   *quot_p = quotient;
   *rem_p = remainder;
 }
-
 #include "gt-arm.h"
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ff1f565b850d1b9e7378461c5a808e7d486936bc..7d16e412765ccff4c527871b6b5d6142aa088b56 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -233,10 +233,6 @@ 
 	       (match_test "arm_restrict_it"))
 	  (const_string "no")
 
-	  (and (eq_attr "use_literal_pool" "yes")
-	       (match_test "arm_disable_literal_pool"))
-	  (const_string "no")
-
 	  (eq_attr "arch_enabled" "no")
 	  (const_string "no")]
 	 (const_string "yes")))
@@ -5871,8 +5867,9 @@ 
 	(match_operand:ANY64 1 "immediate_operand" ""))]
   "TARGET_32BIT
    && reload_completed
-   && (arm_const_double_inline_cost (operands[1])
-       <= arm_max_const_double_inline_cost ())"
+   && (arm_disable_literal_pool
+       || (arm_const_double_inline_cost (operands[1])
+	   <= arm_max_const_double_inline_cost ()))"
   [(const_int 0)]
   "
   arm_split_constant (SET, SImode, curr_insn,
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index f83dc9b130e4288faa49e40ee8285a0cd7d325b1..b8db34a607e4fac75789fa6a7c09b3b6104f45d3 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -2079,3 +2079,40 @@ 
 ;; fmdhr et al (VFPv1)
 ;; Support for xD (single precision only) variants.
 ;; fmrrs, fmsrr
+
+;; Split an immediate DF move to two immediate SI moves.
+(define_insn_and_split "no_literal_pool_df_immediate"
+  [(set (match_operand 0 "s_register_operand" "")
+	(match_operand:DF 1 "const_double_operand" ""))]
+  "TARGET_THUMB2 && arm_disable_literal_pool
+  && !(TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE
+       && vfp3_const_double_rtx (operands[1]))"
+  "#"
+  "&& !reload_completed"
+  [(set (subreg:SI (match_dup 1) 0) (match_dup 2))
+   (set (subreg:SI (match_dup 1) 4) (match_dup 3))
+   (set (match_dup 0) (match_dup 1))]
+  "
+  long buf[2];
+  real_to_target (buf, CONST_DOUBLE_REAL_VALUE (operands[1]), DFmode);
+  operands[2] = GEN_INT ((int) buf[0]);
+  operands[3] = GEN_INT ((int) buf[1]);
+  operands[1] = gen_reg_rtx (DFmode);
+  ")
+
+;; Split an immediate SF move to one immediate SI move.
+(define_insn_and_split "no_literal_pool_sf_immediate"
+  [(set (match_operand 0 "s_register_operand" "")
+	(match_operand:SF 1 "const_double_operand" ""))]
+  "TARGET_THUMB2 && arm_disable_literal_pool
+  && !(TARGET_HARD_FLOAT && vfp3_const_double_rtx (operands[1]))"
+  "#"
+  "&& !reload_completed"
+  [(set (subreg:SI (match_dup 1) 0) (match_dup 2))
+   (set (match_dup 0) (match_dup 1))]
+  "
+  long buf;
+  real_to_target (&buf, CONST_DOUBLE_REAL_VALUE (operands[1]), SFmode);
+  operands[2] = GEN_INT ((int) buf);
+  operands[1] = gen_reg_rtx (SFmode);
+  ")
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-2.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..8caf090e20a144b65ac50eea65c665c264c908f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-2.c
@@ -0,0 +1,26 @@ 
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-m4" "-mcpu=cortex-m7" } } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */
+/* { dg-options "-march=armv7e-m -mfloat-abi=hard -O2 -mthumb -mslow-flash-data" } */
+
+float f (float);
+
+const float max = 0.01f;
+
+int
+g (float in)
+{
+  if (f (in) + f (in) < max)
+    return 0;
+  return 1;
+}
+
+double foo (void)
+{
+  return 0xF1EC7A5239123AF;
+}
+
+double bar (void)
+{
+  return 0.0f;
+}
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..54392866c55f5df25d54b8ad6ee3a8a46f22855f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-m4" "-mcpu=cortex-m7" } } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */
+/* { dg-options "-march=armv7e-m -mfloat-abi=hard -mthumb -mslow-flash-data" } */
+
+/* From PR71607 */
+
+float b;
+void fn1 ();
+
+float
+fn2 ()
+{
+  return 1.1f;
+}
+
+void
+fn3 ()
+{
+  float a[2];
+  a[1] = b;
+  fn1 (a);
+}
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-4.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..4a2b8f7949c9ef14a977e929b6555b9e983543c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-4.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-m4" "-mcpu=cortex-m7" } } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */
+/* { dg-options "-march=armv7e-m -mfloat-abi=hard -O2 -mthumb -mslow-flash-data" } */
+
+double __attribute__ ((target ("fpu=fpv5-d16")))
+foo (void)
+{
+  return 1.0f;
+}
+
+float __attribute__ ((target ("fpu=fpv5-d16")))
+bar (void)
+{
+  return 1.0f;
+}
+
+float __attribute__ ((target ("fpu=fpv5-sp-d16")))
+baz (void)
+{
+  return 1.0f;
+}
+
+/* { dg-final { scan-assembler-times "#1\\.0e\\+0" 3 } } */
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-5.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..96ac45791c77622fc9071b551287a810b5bea75a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-5.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-m4" "-mcpu=cortex-m7" } } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */
+/* { dg-options "-march=armv7e-m -mfloat-abi=hard -O2 -mthumb -mslow-flash-data" } */
+
+double __attribute__ ((target ("fpu=fpv5-sp-d16")))
+foo (void)
+{
+  return 1.0f;
+}
+
+/* { dg-final { scan-assembler-not "#1\\.0e\\+0" } } */
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
rename to gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c