[ARM] attribute target (thumb,arm)

Message ID 54293C82.9040607@st.com
State New

Commit Message

Christian Bruel Sept. 29, 2014, 11:03 a.m. UTC
Hi Ramana, Richard,

This patch implements the target attribute (and the corresponding pragma)
to allow function-based ARM/Thumb interworking.

As described in the updated documentation, the syntax is:

 __attribute__((target("thumb"))) int foo()
Forces Thumb mode for function foo only. If the file was compiled with
-mthumb, it has no effect.

Similarly,

 __attribute__((target("arm"))) int foo()
Forces ARM mode for function foo. It has no effect if the file was not
compiled with -mthumb.

Regions of code can be grouped together with

#pragma GCC target ("thumb")
or
#pragma GCC target ("arm")
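
A minimal sketch of how the attribute and pragma forms might be combined in one C file. It assumes an ARM GCC that accepts target("arm") and target("thumb"); guarding on __arm__, GCC 6 or later, and a core with both ARM and Thumb2 state is our assumption about where that support is usable. The ARM_MODE, THUMB_MODE and HAVE_ARM_THUMB_TARGET macros and the function names are hypothetical; on other compilers the macros expand to nothing, so the file still builds as ordinary C.

```c
#include <assert.h>

/* Assumption: target("arm")/target("thumb") is accepted by ARM GCC 6+
   on cores with both ARM and Thumb2 state.  Elsewhere the mode macros
   expand to nothing.  */
#if defined(__arm__) && defined(__GNUC__) && __GNUC__ >= 6 \
    && defined(__ARM_ARCH_ISA_ARM) && defined(__ARM_ARCH_ISA_THUMB) \
    && __ARM_ARCH_ISA_THUMB >= 2
#define HAVE_ARM_THUMB_TARGET 1
#define ARM_MODE   __attribute__((target("arm")))
#define THUMB_MODE __attribute__((target("thumb")))
#else
#define HAVE_ARM_THUMB_TARGET 0
#define ARM_MODE
#define THUMB_MODE
#endif

/* Requested in Thumb state; no effect if the file is built with -mthumb.  */
THUMB_MODE int add_thumb(int a, int b) { return a + b; }

/* Requested in ARM state; no effect unless the file is built with -mthumb.  */
ARM_MODE int add_arm(int a, int b) { return a + b; }

/* A whole region switched with the pragma; push/pop restores the
   command-line mode for the rest of the file.  */
#if HAVE_ARM_THUMB_TARGET
#pragma GCC push_options
#pragma GCC target ("thumb")
#endif
int mul_region(int a, int b) { return a * b; }
int sub_region(int a, int b) { return a - b; }
#if HAVE_ARM_THUMB_TARGET
#pragma GCC pop_options
#endif
```

The push_options/pop_options pair is a design choice in this sketch, so that the pragma region cannot leak into functions defined later in the file.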

A few notes:
- Inlining is allowed only between functions of the same mode (whether
the mode comes from the compilation switch, #pragma or attribute).
- 'arm_option_override' is now reorganized around
'arm_option_override_internal' for the Thumb-related macros.
- I kept TARGET_UNIFIED_ASM to minimize changes, although removing it
would avoid switching between unified and divided asm syntax and would
simplify arm_declare_function_name. This should be considered at some
point.
- It is only available for Thumb2 variants (Thumb1 was left out for lack
of interest and because of a few complications I was unable to test,
although I think it could easily be added if needed).

Tested with no regressions for arm-none-eabi [, --with-arch=armv7-a]

  OK for trunk?

many thanks,

Christian

Comments

Ramana Radhakrishnan Oct. 8, 2014, 1:05 p.m. UTC | #1
Hi Christian,

	Thanks for looking at this. I will need to read the code in detail, but
this is a first top-level review.

On 09/29/14 12:03, Christian Bruel wrote:
> Hi Ramana, Richard,
>
> This patch implements the attribute target (and pragma) to allow
> function based interworking.
>
> as in the updated documentation, the syntax is:
>
>   __attribute__((target("thumb"))) int foo()
> Forces thumb mode for function foo only. If the file was compiled with
> -mthumb, it has no effect.

Indeed


>
> Similarly
>
>   __attribute__((target("arm"))) int foo()
> Forces arm mode for function foo. It has no effect if the file was not
> compiled with -mthumb.

Indeed.

>
> and regions can be grouped together with
>
> #pragma GCC target ("thumb")
> or
> #pragma GCC target ("arm")
>
> a few notes
> - Inlining is allowed between functions of the same mode (compilation
> switch, #pragma and attribute)

Why shouldn't we allow inlining between ARM-mode and Thumb-mode
functions? After all, the choice of mode is irrelevant at the time of
inlining (except possibly for inline assembler).

Perhaps one option is to disable inlining in the presence of inline
assembler or, failing that, to gate it behind a command-line option.

> - 'arm_option_override' is now reorganized around
> 'arm_option_override_internal' for thumb related macros

Looks like a reasonable start. We need a couple of tests to make sure
that __attribute__((target("arm"))) in a file compiled for the M profile
results in an error: v7(e)-m is Thumb2 only.

For bonus points, it would be great to get __attribute__((target(...)))
working properly in the backend. I suspect a number of the tuning flags
and the global architecture state need to be moved into this as well,
so that __attribute__((target("arm"))) used with M-profile options is
rejected with an error.

> - I kept TARGET_UNIFIED_ASM to minimize changes. Although removing it
> would avoid to switch between unified/divided asms

I know Terry's been trying to get Thumb1 to also switch to unified asm
by default, so I think a lot of the logic with "emit_thumb" could just
go away. Maybe we should just consider switching ARM state to unified
syntax; that would be as simple as changing TARGET_UNIFIED_ASM in
arm.h to be TARGET_32BIT. Long overdue IMHO.

The only gotcha here is inline assembler, but GAS is so permissive that
I'm not too worried about it in ARM or Thumb2 state. I'm a bit more
worried about Thumb1.


>    and simplify arm_declare_function_name. Should be considered at some
> point.

I think that can be done for a lot of newer cores; some of that logic
is dated now, IIUC.

I remember why my original project failed: I couldn't get enough of the
backend into shape for the state to be saved and restored, and then I
moved on to other, more interesting things. So whatever is done here
needs to make sure that all ISA mode state is saved and restored.

One thing I experimented with while doing this work was adding something
like the -mflip-mips16 option and then running the testsuite with it, to
make sure enough of the backend state is saved and restored properly.

This should give more test coverage to the switching logic for the
attributes and, hopefully, expose any issues with saving and restoring
state. The problem you'll probably find is that in some of the
gcc.target/arm tests the flipping of state will cause various
interesting failures. The other side is making sure that all global
state is now captured and reinitialized every time we switch context
between ARM and Thumb.
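
To make the save/restore requirement concrete, here is a toy model (not GCC code; every name in it is hypothetical). All state that depends on the ARM/Thumb choice lives in one struct that is recomputed from the mode whenever the next function is compiled, and a flip flag alternates the mode on every other function, in the spirit of -mflip-mips16. Any field left out of set_isa_mode would keep a stale value from the previous function, which is the class of bug such a flip switch is meant to expose.

```c
#include <assert.h>
#include <stdbool.h>

enum isa_mode { MODE_ARM, MODE_THUMB };

/* Hypothetical mode-dependent state that would otherwise be loose globals.  */
struct isa_state {
  enum isa_mode mode;
  bool unified_asm;   /* stand-in for the asm syntax choice */
  int insn_align;     /* minimum instruction alignment in bytes */
};

static struct isa_state current;

/* Recompute every mode-dependent field from the mode alone, so that no
   value can leak from the previously compiled function.  */
static void set_isa_mode(enum isa_mode m)
{
  current.mode = m;
  current.unified_asm = (m == MODE_THUMB);
  current.insn_align = (m == MODE_THUMB) ? 2 : 4;
}

/* Toy "compile" step: use the default mode for function number IDX or,
   when FLIP is set, the opposite mode on every odd-numbered function.
   Returns the instruction alignment the function was emitted with.  */
static int compile_function(int idx, enum isa_mode dflt, bool flip)
{
  enum isa_mode m = dflt;
  if (flip && (idx % 2) == 1)
    m = (dflt == MODE_ARM) ? MODE_THUMB : MODE_ARM;
  set_isa_mode(m);
  return current.insn_align;
}
```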

I'm not sure how best to test the pragma switching logic, but given that
the two hang off each other, it should Just Work (TM) if one is right. In
any case, adding some testcases that test this directly would be useful.


> - It is only available for Thumb2 variants (for thumb1 lack of interest
> and a few complications I was unable to test, although this could be
> added easily if needed, I think)


I don't think we should restrict this to Thumb1 or Thumb2; it should be
allowed whenever the architecture allows it.

For example, __attribute__((target("thumb"))) on a function compiled with
-march=armv5te -mfloat-abi=softfp -mfpu=vfpv3 should give an error, as
this is not supported: there are no VFP instructions in Thumb1. Explicit
tests for this would be appreciated.

IIRC all other cases should be accepted.
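
A directed test for the Thumb1/VFP case could be sketched as a DejaGnu compile test along these lines. This is an outline under assumptions, not a finished test: the function name is hypothetical, the empty dg-error regexp stands in for whatever diagnostic text the implementation settles on (and the line it is reported on may differ), and a real test would need extra guards so it is skipped on multilibs with conflicting options.

```c
/* { dg-do compile } */
/* { dg-options "-march=armv5te -mfloat-abi=softfp -mfpu=vfpv3" } */

/* Thumb1 has no VFP instructions, so requesting Thumb state for a
   function in a unit compiled with a VFP FPU is expected to be rejected.  */
float __attribute__((target("thumb")))
scale (float x) /* { dg-error "" } */
{
  return x * 2.0f;
}
```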


>
> Tested for no regression for arm-none-eabi [,-with-arch=armv7-a]
>
>    OK for trunk ?

Sorry, not yet.

I would also like some documentation added to extend.texi for these 
attributes.

So, in summary:

1. Some documentation in extend.texi please.
2. TARGET_UNIFIED_ASM to be turned on for ARM state too.
3. Testcases for this, and some testing with an -mflip-thumbness switch
(added only for testing).
4. Investigate further lifting the restriction on Thumb1.
5. Investigate lifting the inlining restriction for
__attribute__((target("arm"))) or __attribute__((target("thumb"))), or
maybe gate it behind a command-line option.
6. Split this patch into smaller logical chunks to help with review,
i.e. separate the restructuring from the attribute and pragma addition,
and from the testcases.
7. Tests for the diagnostics and error cases, i.e.
__attribute__((target("arm"))) used with the -march=armv7e-m or
-march=armv7-m command-line options.
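
For item 7, a test for the M-profile case might look like this DejaGnu sketch. It is an outline under assumptions: the function name is hypothetical, the empty dg-error regexp stands in for the diagnostic text (whose wording and reported line are not specified here), a real test would need multilib guards, and a sibling test would repeat it with -march=armv7e-m.

```c
/* { dg-do compile } */
/* { dg-options "-march=armv7-m" } */

/* ARMv7-M is Thumb-only, so a request for ARM state must be diagnosed.  */
int __attribute__((target("arm")))
foo (void) /* { dg-error "" } */
{
  return 0;
}
```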


regards
Ramana
Christian Bruel Oct. 8, 2014, 2:38 p.m. UTC | #2
Hi Ramana,

Thanks for your feedback. Just a few comments while you continue the review.

1) About the documentation in extend.texi: it was already in the patch.
Did I miss a part?
      * doc/extend.texi (arm, thumb): Document target attributes.
      * doc/invoke.texi (arm, thumb): Mention target attributes.

2) About supporting Thumb1:
   OK, I'll remove this limitation. But I only covered testing for
Thumb2, as I don't have a Thumb1 platform; if that's OK with you, Thumb1
will only be covered by "visual checking". Can you help test this mode?
   
3) About inlining:
  I dislike inlining across different modes. Conceptually, a user
might want to switch mode only when changing a function's hotness.
Inlining a cold function into a hot one is usually not what the user
asked for when setting different mode attributes on them.

The compiler would be taking a decision that is not what the user wrote.
In addition, if you consider the few instructions needed to modify R15
to switch state, we would end up with more code executed on the critical
path, voiding a possible size or speed gain.

4) About coverage:
   Thanks for your idea of an -mflip-like internal option for the
testsuite. I'll give it a try. Note that in the meantime I made a few
successful tries with LTO, and I'm in the process of running a
combinatorial exploration on a set of larger benchmarks.
   Thanks for your hint about testing the -march=armv7e-m and
-march=armv7-m error cases. This is indeed needed.

5) I'm still not sure what to do about TARGET_UNIFIED_ASM. On one hand,
I'm reluctant to tie to this work an improvement that should be
orthogonal (or a prerequisite); on the other hand, I don't really like
the "emit_thumb" logic. If your recommendation is to make
TARGET_UNIFIED_ASM the default for ARM, that's great, but I'm still
worried about Thumb1. Terry's feedback might be useful here.

I'll resend the patch split into separate parts, with Thumb1 support.
Regarding your inlining concern, do you agree that inlining across
different modes might not be mandatory (or might even be
counter-productive) at this stage?

Best Regards

Christian

Ramana Radhakrishnan Oct. 8, 2014, 4:56 p.m. UTC | #3
Hi Christian,


On 08/10/14 15:38, Christian Bruel wrote:
> Hi Ramana,
>
> Thanks for your feedback. Just a few comments while you continue the review

Sure, all thoughts are welcome. This isn't a trivial project, and I'm
dredging the tertiary storage in my brain and old notes for context on
all the gotchas involved.
>
> 1) about the documentation in extend.texi, it was in the  patch already
> : did I miss a part ?
>        * doc/extend.texi (arm, thumb): Document target attributes.
>        * doc/invoke.texi (arm, thumb): Mention target attributes.
>

Indeed. I don't know how I missed it; as I said, it was a brief review.
I am more interested in discussing the semantics up front.

> 2) about supporting thumb1
>     OK I'll suppress this limitation. But I covered the testing only for
> thumb2 as I don't have a thumb1 platform, if it's OK with you thumb1
> will only be covered by "visual checking". Can you help to test this mode ?
>
> 3) about inlining
>    I dislike inlining different modes, From a conceptual use, a user
> might want to switch mode only when changing a function's hotness.
> Usually inlining a cold function into a hot one is not what the user
> explicitly required when setting different mode attributes for them,

__attribute__((target("thumb"))) should not imply coldness or hotness.
Inlining between cold and hot functions should be decided based on
profile feedback. The choice of compiling in "Thumb1" state for coldness
is a separate one, because that's where the choice needs to be made.

>
> The compiler would take a decision that is not what the user wrote. And
> in addition if you consider the few instructions to modify R15 to switch
> state that would end up with more code executed in the critical path,
> voiding a possible size of speed gain.

I do not expect any additional instructions to be needed to switch
state. If function x is inlined into function y, the state of x would be
lost and the inlined code would be in the state of function y.

Obviously, if the user doesn't want inlining, they would add attributes
to disable it. There are extensions such as __attribute__((noinline))
to give the user that control, and those need to be used in addition.

The attribute then purely reflects the output instruction state of the
function if a copy of its body is laid out separately in the output.

IMHO, the inlining heuristics should be the judge of when one function
should be inlined into another, and we shouldn't be second-guessing that
in the backend.

Only if the compiler emits a separate copy of the function should we
choose its instruction state based on the "target", i.e. arm or thumb.

>
> 4) about coverage.
>     Thanks for your idea about a mflip like internal option for the
> testsuite. I'll give it a try. Note that in the meantime I gave a few
> successful tries with LTO, and I'm in the process of running a
> combinatorial exploration of a set of larger benchmarcks.

It would be interesting to hear how you experimented with LTO.

>     Thanks for you hint about testing the  -march=armv7em -march=armv7m
> error cases. This is indeed needed.

Adding some directed tests for the pragmas would also be required.

>
> 5) I'm still not sure about what to do with TARGET_UNIFIED_ASM. In one
> hand I'm reluctant to bind to this development an improvement that
> should be orthogonal (or a prerequisite), in another hand I don't really
> like the logic with "emit_thumb". If your recommendation is to make
> TARGET_UNIFIED_ASM the default for ARM that great, but I'm still
> worrying for thumb1. Terry's feedback might be useful for this.

I want to think about this carefully too.

>
> I'll resent the patch in different parts and thumb1 support. For your
> inlining concern do you agree that inlining different modes might not be
> mandatory (or even counter-productive) at this stage ?

More practically, and from memory, there may be quite a lot of false
positives in other parts of the testsuite with the -mflip-thumbness
switch if inlining between modes is turned off.

I think I've also explained my reasons for allowing inlining above.

Also, for the target macros that you change when switching between ARM
and Thumb, it would be good to make sure that you capture everything. I
am concerned about the __ARM_ARCH_ISA_ARM and __ARM_ARCH_ISA_THUMB
macros with the M-profile cores. Those probably also need handling.
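
One way to see what these macros report is a small probe built from the ACLE predefined macros. ACLE defines __ARM_ARCH_ISA_ARM only when the ARM instruction set is available (so not for M-profile cores) and sets __ARM_ARCH_ISA_THUMB to 1 for Thumb1 or 2 for Thumb2; GCC defines __thumb__ when compiling in Thumb state. Macros are fixed at preprocessing time, so at most a pragma region, not a function attribute, could change what they report; whether they are updated there is exactly the handling asked for above, so this sketch only reads them. The function names are hypothetical.

```c
#include <assert.h>

/* 1 if the ARM (A32) instruction set is available per ACLE, else 0
   (also 0 on non-ARM hosts).  */
static int isa_arm_available(void)
{
#ifdef __ARM_ARCH_ISA_ARM
  return 1;
#else
  return 0;
#endif
}

/* ACLE Thumb level: 1 = Thumb1 only, 2 = Thumb2; 0 if not defined.  */
static int isa_thumb_level(void)
{
#ifdef __ARM_ARCH_ISA_THUMB
  return __ARM_ARCH_ISA_THUMB;
#else
  return 0;
#endif
}

/* 1 if this code is being compiled in Thumb state.  */
static int compiled_in_thumb(void)
{
#ifdef __thumb__
  return 1;
#else
  return 0;
#endif
}
```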

I'll review this again when the next patch set arrives with the 
different parts.

regards
Ramana

Christian Bruel Oct. 9, 2014, 11:35 a.m. UTC | #4
On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
> Hi Christian,

<< snipped agreed stuff >>
>> 3) about inlining
>>    I dislike inlining different modes, From a conceptual use, a user
>> might want to switch mode only when changing a function's hotness.
>> Usually inlining a cold function into a hot one is not what the user
>> explicitly required when setting different mode attributes for them,
> __attribute__((thumb)) should not imply coldness or hotness. Inlining 
> between cold and hot functions should be done based on profile feedback. 
> The choice of compiling in "Thumb1" state for coldness is a separate one 
> because that's where the choice needs to be made.

Ideally, yes. But I think one user motivation for using the
target("thumb") attribute is to reduce code size even when profile
feedback (PFO) is not available (libraries, kernels, or user
applications whose build specs are packaged without profile data). And
there are cases where static probabilities are not enough and the user
wants their own control, using gprof or oprofile. But in that case,
could we point to __attribute__((cold)) on the function? That would
probably be the best workaround to propose if we recommend this.
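
A sketch of that workaround: the hotness hint is carried explicitly by __attribute__((cold)), while target("thumb") only selects the instruction set. The function names are hypothetical, and the mode macro is guarded as an assumption (ARM GCC 6 or later on a core with both ARM and Thumb2 state); elsewhere it expands to nothing.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: target("thumb") is accepted by ARM GCC 6+ on cores with
   both ARM and Thumb2 state; elsewhere THUMB_MODE expands to nothing.  */
#if defined(__arm__) && defined(__GNUC__) && __GNUC__ >= 6 \
    && defined(__ARM_ARCH_ISA_ARM) && defined(__ARM_ARCH_ISA_THUMB) \
    && __ARM_ARCH_ISA_THUMB >= 2
#define THUMB_MODE __attribute__((target("thumb")))
#else
#define THUMB_MODE
#endif

/* Rarely executed error path: "cold" states the hotness explicitly,
   independently of the instruction set it is emitted in.  */
THUMB_MODE __attribute__((cold)) int report_error(int code)
{
  fprintf(stderr, "error %d\n", code);
  return -code;
}

/* Hot path: division with an error branch into the cold function.  */
int checked_div(int a, int b)
{
  if (b == 0)
    return report_error(22);
  return a / b;
}
```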

But here is another scenario: using attribute target("arm") for exception
entry points is indeed not related to hotness. But consider a typical
Thumb binary with an ARM entry point compiled in C (e.g. an exception
handler, a kernel...). Today, because of the file boundary, the Thumb
part is not inlined into the ARM entry point. (Using -flto is not
possible because the whole GIMPLE would be Thumb.)

Now, if the target attribute is used for the functions other than the
entry point, with your approach they would all be inlined (assuming the
cost model allows it) and we would end up with an ARM binary instead of
a Thumb binary...

But there are still three points:

- At least two other targets that support the target attribute (i386,
PowerPC) disable inlining between modes that are not subsets. I like
homogeneity between targets, and I find it odd to have different
inlining rules...

- Scanning the function body for ASM_INPUT does not look very elegant
(if that matters), because the asm could well be unrelated.

The only case where inlining Thumb into ARM is always a win is when the
cost of the inlined body is less than that of a BX instruction (and even
then, branch prediction mitigates that cost).


>
>> The compiler would take a decision that is not what the user wrote. And
>> in addition if you consider the few instructions to modify R15 to switch
>> state that would end up with more code executed in the critical path,
>> voiding a possible size of speed gain.
> I do not expect there to be any additional instructions needed to switch 
> state. If function x is inlined into function y the state would be lost 
> and the state would be in terms of the state of function x.
Yes, indeed. I was thinking in LCM/mode-switching terms when I wrote
this. In this case the mode is inherited.

> Obviously if the user doesn't want inlining - the user would add 
> attributes to disable inlining. You do have extensions such as 
> __attribute__((noinline)) and __attribute__((never_inline)) to give the 
> user that control and those bits need to be used in addition.

Those attributes are overkill: they would also disable inlining between a
caller and callee of the same mode, which is not what we want.

>
> The attribute then purely reflects then the output instruction state of 
> the function if a copy of it's body is laid out separately in the output.
>
> IMHO, the heuristics for inlining should be the best judge of when 
> functions should be inlined between one and another and we shouldn't be 
> second guessing that in the backend
>
> If there is a copy of the function to be put out by the compiler, only 
> then should we choose this based on the state of the "target" i.e. arm 
> or thumb.
>
Yes,

So to summarize, we can:

  1) not inline between different modes: same behavior as other
targets, and it solves the asm case;
  2) always allow inlining unless the function contains asm statements
(I reject adding a new compilation switch);
  3) always allow inlining, but recommend attribute((noinline)) to
handle asm, or attribute((cold)) / attribute((hot)) in the absence of
profile data.

I obviously prefer 1), which is safe and homogeneous. 3) is the worst,
as it requires additional user action (poor user). 2) is less bad.
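
The three options above can be compared with a toy predicate. This is a sketch, not GCC code: the real decision would be made in a target hook such as TARGET_CAN_INLINE_P, and the types, field names and policy enum here are hypothetical. It assumes "the function" in option 2 means the callee, and it models only the mode question; the usual size and benefit heuristics would still decide whether inlining actually happens.

```c
#include <assert.h>
#include <stdbool.h>

enum isa_mode { MODE_ARM, MODE_THUMB };

/* The three options, numbered as in the list above.  */
enum inline_policy {
  POLICY_SAME_MODE_ONLY = 1, /* 1) never inline across modes */
  POLICY_UNLESS_ASM     = 2, /* 2) cross-mode unless the callee has asm */
  POLICY_ALWAYS         = 3  /* 3) always; users opt out with noinline */
};

/* Hypothetical view of a function as seen by the inliner.  */
struct fn_info {
  enum isa_mode mode;
  bool has_inline_asm;
  bool noinline;             /* __attribute__((noinline)) */
};

/* True if CALLEE may be inlined into CALLER under POLICY.  */
static bool mode_allows_inline(const struct fn_info *caller,
                               const struct fn_info *callee,
                               enum inline_policy policy)
{
  if (callee->noinline)
    return false;
  if (caller->mode == callee->mode)
    return true;             /* same mode: never restricted */
  switch (policy) {
  case POLICY_SAME_MODE_ONLY: return false;
  case POLICY_UNLESS_ASM:     return !callee->has_inline_asm;
  case POLICY_ALWAYS:         return true;
  }
  return false;
}
```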

Thanks for bearing with me :)

Christian
Richard Earnshaw Oct. 9, 2014, 2:11 p.m. UTC | #5
On 09/10/14 12:35, Christian Bruel wrote:
> 
> On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
>> Hi Christian,
> 
> << snipped agreed stuf >>
>> 3) about inlining
>>    I dislike inlining different modes, From a conceptual use, a user
>> might want to switch mode only when changing a function's hotness.
>> Usually inlining a cold function into a hot one is not what the user
>> explicitly required when setting different mode attributes for them,
>> __attribute__((thumb)) should not imply coldness or hotness. Inlining 
>> between cold and hot functions should be done based on profile feedback. 
>> The choice of compiling in "Thumb1" state for coldness is a separate one 
>> because that's where the choice needs to be made.
> 
> Ideally yes. but I think that a user motivation to use target attribute
> (("thumb") is to reduce code size even in the cases where PFO is not
> available (libraries, kernel or user application build spec packaged
> without profile data). And there are cases where static probabilities
> are not enough and that a user wants it own control with gprof or oprofile.
> But in this case, we could point to the __attribute__ (("cold")) on the
> function ? That would probably be the best workaround to propose if we
> recommend this
> 

Hot vs cold is interesting, but arm/thumb shouldn't be used to imply
that.  The days when ARM=fast, thumb=small are in the past now, and
thumb2 code should be both fast and small.  Indeed, smaller thumb2 code
can be faster than larger ARM code simply because you can get more of it
in the cache.  The use of arm vs thumb is likely to be much more subtle now.

> But here is another scenario: Using of attribute (("arm")) for exception
> entry points is indeed not related to hotness. But consider a typical
> thumb binary with an entry point in arm compiled in C (ex handler, a
> kernel...). Today due to the file boundary the thumb part is not inlined
> into the arm point. (Using -flto is not possible because the whole
> gimple would be thumb).
> 

We have noinline attributes for scenarios like that.  I don't think a
specific use case should dominate other cases.
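
A sketch of the noinline approach for the entry-point scenario quoted above: the helper keeps its Thumb attribute and is marked noinline, so it stays out of the ARM-state entry point even if cross-mode inlining is permitted. Function names are hypothetical, and the mode macros are guarded as an assumption (ARM GCC 6 or later on a core with both ARM and Thumb2 state); elsewhere they expand to nothing.

```c
#include <assert.h>

/* Assumption: target("arm")/target("thumb") is accepted by ARM GCC 6+
   on cores with both ARM and Thumb2 state.  */
#if defined(__arm__) && defined(__GNUC__) && __GNUC__ >= 6 \
    && defined(__ARM_ARCH_ISA_ARM) && defined(__ARM_ARCH_ISA_THUMB) \
    && __ARM_ARCH_ISA_THUMB >= 2
#define ARM_MODE   __attribute__((target("arm")))
#define THUMB_MODE __attribute__((target("thumb")))
#else
#define ARM_MODE
#define THUMB_MODE
#endif

/* Size-sensitive work: keep it in Thumb state and out of the entry point.  */
THUMB_MODE __attribute__((noinline)) int handle_event(int ev)
{
  return ev * 3 + 1;
}

/* Entry point that must run in ARM state (e.g. an exception entry).  */
ARM_MODE int entry(int ev)
{
  return handle_event(ev);
}
```

As Christian points out, the cost of this approach is that noinline also blocks inlining handle_event into Thumb-state callers.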

> Now, using attribute (("target")) for the functions others than the
> entry point, with your approach they would all be inlined (assuming the
> cost allow this) and we would end up with a arm binary instead of a
> thumb binary...
> 
> But there are still 3 points:
>
> - At least 2 other targets (i386, PowerPC) that support attribute target
> disable inlining between modes that are not subsets. I like to think
> about homogeneity between targets and I find it odd to have different
> inlining rules...
> 

That's because use of specific instructions must not be allowed to leak
past a gating check that is in the caller.  It would be a disaster if a
function that used a neon register, for example, was allowed to leak
into code that might run on a target with no Neon register file.  The
ARM/thumb distinction shouldn't, by default, be limited in that manner.

I believe inlining could happen from a subset of the architecture into a
function using a superset, just not vice-versa.
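The subset rule can be sketched as a bitmask test: inlining is allowed only when every feature the callee may use is also enabled in the caller. This is similar in spirit to how targets with attribute target compare ISA flags, but the feature names below are hypothetical placeholders, not GCC's internal flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-function feature masks, for illustration only. */
enum {
    FEAT_VFP  = 1u << 0,
    FEAT_NEON = 1u << 1,
    FEAT_CRC  = 1u << 2
};

/* Safe to inline CALLEE into CALLER only if the callee's features are a
   subset of the caller's, so no instruction can leak past a runtime
   feature check performed in the caller. */
static bool can_inline_subset(uint32_t caller_feats, uint32_t callee_feats)
{
    return (callee_feats & ~caller_feats) == 0;
}
```

For example, a VFP-only callee may be inlined into a VFP+Neon caller, but a Neon callee must not be inlined into a VFP-only caller.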

> - Scanning the function body to check for ASM_INPUT does not look very
> elegant (if this matters) because the asm could well be unrelated.
>
> The only case when it will always be a win to inline thumb into arm is
> when the cost of the inlined body is less than a BX instruction (but
> even then, with branch prediction, that cost is mitigated).
> 

One of the problems with not inlining is that the C++ abstraction
penalty is likely to shoot up.  There will be many major lost
optimization opportunities if we start down that path.

So I think inlining should only be disabled if there's some technical
reason why it should be disabled, not because of some 'it might not
always be ideal' feelings.  Furthermore, we should expect users to use
the other attributes consistently when they expect specific behaviours
to occur.

My 2p.

R.

> 
>>
>>> The compiler would take a decision that is not what the user wrote. And
>>> in addition if you consider the few instructions to modify R15 to switch
>>> state that would end up with more code executed in the critical path,
>>> voiding a possible size or speed gain.
>> I do not expect there to be any additional instructions needed to switch 
>> state. If function x is inlined into function y the state would be lost 
>> and the state would be in terms of the state of function x.
> Yes, indeed. I was in a LCM/mode-switching thinking mode when writing
> this. In this case the mode is inherited.
> 
>> Obviously if the user doesn't want inlining - the user would add 
>> attributes to disable inlining. You do have extensions such as 
>> __attribute__((noinline)) and __attribute__((never_inline)) to give the 
>> user that control and those bits need to be used in addition.
> 
> Those attributes are overkill. They would disable inlining between
> callers and callees of the same mode. This is not what we want.
> 
>>
>> The attribute then purely reflects the output instruction state of
>> the function if a copy of its body is laid out separately in the output.
>>
>> IMHO, the heuristics for inlining should be the best judge of when
>> functions should be inlined into one another, and we shouldn't be
>> second-guessing that in the backend.
>>
>> If there is a copy of the function to be put out by the compiler, only 
>> then should we choose this based on the state of the "target" i.e. arm 
>> or thumb.
>>
> Yes,
> 
> So to summarize, we can:
> 
>   1) Don't inline between different modes. Same behavior as other
> targets. Solves the asm case.
>   2) Always inline unless the function contains asm statements. (I
> reject adding a new compilation switch.)
>   3) Always inline, but recommend the use of attribute ((noinline)) to
> handle asm, or attribute ((cold,hot)) in the absence of profile data.
>
> I obviously prefer 1), safe and homogeneous; 3) is the worst as it
> requires additional user action (poor user); 2) is less bad.
> 
> Thanks for supporting me ::)
> 
> Christian
> 
>
Christian Bruel Oct. 10, 2014, 2:18 p.m. UTC | #6
On 10/09/2014 04:11 PM, Richard Earnshaw wrote:
> On 09/10/14 12:35, Christian Bruel wrote:
>> On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
>>> Hi Christian,
>> << snipped agreed stuff >>
>>> 3) about inlining
>>>    I dislike inlining across different modes. From a conceptual point of view, a user
>>> might want to switch mode only when changing a function's hotness.
>>> Usually inlining a cold function into a hot one is not what the user
>>> explicitly required when setting different mode attributes for them,
>>> __attribute__((thumb)) should not imply coldness or hotness. Inlining 
>>> between cold and hot functions should be done based on profile feedback. 
>>> The choice of compiling in "Thumb1" state for coldness is a separate one 
>>> because that's where the choice needs to be made.
>> Ideally yes, but I think that a user's motivation for the target ("thumb")
>> attribute is to reduce code size even in cases where PFO is not
>> available (libraries, kernel or user application build specs packaged
>> without profile data). And there are cases where static probabilities
>> are not enough and the user wants their own control with gprof or oprofile.
>> But in this case, could we point to __attribute__ (("cold")) on the
>> function? That would probably be the best workaround to propose if we
>> recommend this.
>>
> Hot vs cold is interesting, but arm/thumb shouldn't be used to imply
> that.  The days when ARM=fast, thumb=small are in the past now, and
> thumb2 code should be both fast and small.  Indeed, smaller thumb2 code
> can be faster than larger ARM code simply because you can get more of it
> in the cache.  The use of arm vs thumb is likely to be much more subtle now.

I'm also very interested in this. In my last benchmark session, ARM mode
could bring a speedup (from noise to 5-6%) but with a very big size
penalty. So I believe there is room for fine tuning at the application
level, and I agree this is very subtle and difficult; it is another
topic. (That was with GCC 4.8; maybe the gap has narrowed since.)

>
>> But here is another scenario: using attribute (("arm")) for exception
>> entry points is indeed not related to hotness. But consider a typical
>> thumb binary with an arm entry point written in C (e.g. a handler or a
>> kernel). Today, due to the file boundary, the thumb part is not inlined
>> into the arm entry point. (Using -flto is not possible because the whole
>> gimple would be thumb.)
>>
> We have no-inline attributes for scenarios like that.  I don't think a
> specific use case should dominate other cases.

That's severe: the noinline attribute would also disable inlining between functions of the same mode!

>
>> Now, using attribute (("target")) for the functions other than the
>> entry point, with your approach they would all be inlined (assuming the
>> cost allows this) and we would end up with an arm binary instead of a
>> thumb binary...
>>
>> But there are still 3 points:
>>
>> - At least 2 other targets (i386, PowerPC) that support attribute target
>> disable inlining between modes that are not subsets. I like to think
>> about homogeneity between targets and I find it odd to have different
>> inlining rules...
>>
> That's because use of specific instructions must not be allowed to leak
> past a gating check that is in the caller.  It would be a disaster if a
> function that used a neon register, for example, was allowed to leak
> into code that might run on a target with no Neon register file.  The
> ARM/thumb distinction shouldn't, by default, be limited in that manner.
>
> I believe inlining could happen from a subset of the architecture into a
> function using a superset, just not vice-versa.

I'm afraid I misunderstand this. Do you want inlining from Thumb into a
function using ARM because you consider Thumb to be a subset of ARM?
You know better than I do, but I never thought that; or maybe it has
something to do with the unified assembler?

In that case I see the problem in a new light. With unified
assembly, we could indeed inline from any mode as long as there is no
divided-syntax asm inside.

>
>> - Scanning the function body to check for ASM_INPUT does not look very
>> elegant (if this matters) because the asm could well be unrelated.
>>
>> The only case when it will always be a win to inline thumb into arm is
>> when the cost of the inlined body is less than a BX instruction (but
>> even then, with branch prediction, that cost is mitigated).
>>
> One of the problems with not inlining is that the C++ abstraction
> penalty is likely to shoot up.  There will be many major lost
> optimization opportunities if we start down that path.

I would not expect users to use this attribute extensively on
inlined member functions. But I take your point.

> So I think inlining should only be disabled if there's some technical
> reason why it should be disabled, not because of some 'it might not
> always be ideal' feelings.  Furthermore, we should expect users to use
> the other attributes consistently when they expect specific behaviours
> to occur.

Sure, I too would have preferred objective benchmark results, but
it's a little early to have experience with instrumentation of large
code.

Thanks for your input; it's great to see the problem from all directions
before taking a decision.

Christian

> My 2p.
>
> R.
>
>>>> The compiler would take a decision that is not what the user wrote. And
>>>> in addition if you consider the few instructions to modify R15 to switch
>>>> state that would end up with more code executed in the critical path,
>>>> voiding a possible size or speed gain.
>>> I do not expect there to be any additional instructions needed to switch 
>>> state. If function x is inlined into function y the state would be lost 
>>> and the state would be in terms of the state of function x.
>> Yes, indeed. I was in a LCM/mode-switching thinking mode when writing
>> this. In this case the mode is inherited.
>>
>>> Obviously if the user doesn't want inlining - the user would add 
>>> attributes to disable inlining. You do have extensions such as 
>>> __attribute__((noinline)) and __attribute__((never_inline)) to give the 
>>> user that control and those bits need to be used in addition.
>> Those attributes are overkill. They would disable inlining between
>> callers and callees of the same mode. This is not what we want.
>>
>>> The attribute then purely reflects the output instruction state of
>>> the function if a copy of its body is laid out separately in the output.
>>>
>>> IMHO, the heuristics for inlining should be the best judge of when
>>> functions should be inlined into one another, and we shouldn't be
>>> second-guessing that in the backend.
>>>
>>> If there is a copy of the function to be put out by the compiler, only 
>>> then should we choose this based on the state of the "target" i.e. arm 
>>> or thumb.
>>>
>> Yes,
>>
>> So to summarize, we can:
>>
>>   1) Don't inline between different modes. Same behavior as other
>> targets. Solves the asm case.
>>   2) Always inline unless the function contains asm statements. (I
>> reject adding a new compilation switch.)
>>   3) Always inline, but recommend the use of attribute ((noinline)) to
>> handle asm, or attribute ((cold,hot)) in the absence of profile data.
>>
>> I obviously prefer 1), safe and homogeneous; 3) is the worst as it
>> requires additional user action (poor user); 2) is less bad.
>>
>> Thanks for supporting me ::)
>>
>> Christian
>>
>>
>
Richard Earnshaw Oct. 10, 2014, 4:12 p.m. UTC | #7
On 10/10/14 15:18, Christian Bruel wrote:
> 
> On 10/09/2014 04:11 PM, Richard Earnshaw wrote:
>> On 09/10/14 12:35, Christian Bruel wrote:
>>> On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
>>>> Hi Christian,
>>> << snipped agreed stuff >>
>>>> 3) about inlining
>>>>    I dislike inlining across different modes. From a conceptual point of view, a user
>>>> might want to switch mode only when changing a function's hotness.
>>>> Usually inlining a cold function into a hot one is not what the user
>>>> explicitly required when setting different mode attributes for them,
>>>> __attribute__((thumb)) should not imply coldness or hotness. Inlining 
>>>> between cold and hot functions should be done based on profile feedback. 
>>>> The choice of compiling in "Thumb1" state for coldness is a separate one 
>>>> because that's where the choice needs to be made.
>>> Ideally yes, but I think that a user's motivation for the target ("thumb")
>>> attribute is to reduce code size even in cases where PFO is not
>>> available (libraries, kernel or user application build specs packaged
>>> without profile data). And there are cases where static probabilities
>>> are not enough and the user wants their own control with gprof or oprofile.
>>> But in this case, could we point to __attribute__ (("cold")) on the
>>> function? That would probably be the best workaround to propose if we
>>> recommend this.
>>>
>> Hot vs cold is interesting, but arm/thumb shouldn't be used to imply
>> that.  The days when ARM=fast, thumb=small are in the past now, and
>> thumb2 code should be both fast and small.  Indeed, smaller thumb2 code
>> can be faster than larger ARM code simply because you can get more of it
>> in the cache.  The use of arm vs thumb is likely to be much more subtle now.
> 
> I'm also very interested in this. In my last benchmark session, ARM mode
> could bring a speedup (from noise to 5-6%) but with a very big size
> penalty. So I believe there is room for fine tuning at the application
> level, and I agree this is very subtle and difficult; it is another
> topic. (That was with GCC 4.8; maybe the gap has narrowed since.)

It will obviously depend on the precise micro-architecture of the chip
you run the code on; but things are improving all the time (at least,
that's what we aim for :-).  For example, we've recently throttled back
the amount of instruction shortening for hot loops to reduce the number
of instruction dependencies through the PSR flags.


> 
>>
>>> But here is another scenario: using attribute (("arm")) for exception
>>> entry points is indeed not related to hotness. But consider a typical
>>> thumb binary with an arm entry point written in C (e.g. a handler or a
>>> kernel). Today, due to the file boundary, the thumb part is not inlined
>>> into the arm entry point. (Using -flto is not possible because the whole
>>> gimple would be thumb.)
>>>
>> We have no-inline attributes for scenarios like that.  I don't think a
>> specific use case should dominate other cases.
> 
> That's severe: the noinline attribute would also disable inlining between functions of the same mode!
> 

Yes, but why would that be a problem?  What makes you think that
inlining across modes is inherently wrong, when inlining within the same
mode is not?

>>
>>> Now, using attribute (("target")) for the functions other than the
>>> entry point, with your approach they would all be inlined (assuming the
>>> cost allows this) and we would end up with an arm binary instead of a
>>> thumb binary...
>>>
>>> But there are still 3 points:
>>>
>>> - At least 2 other targets (i386, PowerPC) that support attribute target
>>> disable inlining between modes that are not subsets. I like to think
>>> about homogeneity between targets and I find it odd to have different
>>> inlining rules...
>>>
>> That's because use of specific instructions must not be allowed to leak
>> past a gating check that is in the caller.  It would be a disaster if a
>> function that used a neon register, for example, was allowed to leak
>> into code that might run on a target with no Neon register file.  The
>> ARM/thumb distinction shouldn't, by default, be limited in that manner.
>>
>> I believe inlining could happen from a subset of the architecture into a
>> function using a superset, just not vice-versa.
> 
> I'm afraid I misunderstand this. Do you want inlining from Thumb into a
> function using ARM because you consider Thumb to be a subset of ARM?
> You know better than I do, but I never thought that; or maybe it has
> something to do with the unified assembler?
> 

No, I think we can fundamentally inline either way, provided that
there's no inline assembly code.  Even then, we could probably get away
with it 90% of the time.
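The policy stated here (inline across modes in either direction unless the callee contains inline assembly) reduces to a small predicate. This is a sketch under assumed names (`struct fn_info`, `can_inline_mode`), not the `arm_can_inline_p` hook the patch adds.

```c
#include <stdbool.h>

/* Hypothetical summary of what the inliner knows about a function. */
struct fn_info {
    bool thumb;           /* compiled for Thumb state */
    bool has_inline_asm;  /* body contains an asm statement */
};

/* Same-mode inlining is always allowed.  Cross-mode inlining is allowed
   unless the callee has inline asm, whose text was written for one
   instruction set and cannot be inspected by the compiler. */
static bool can_inline_mode(const struct fn_info *caller,
                            const struct fn_info *callee)
{
    if (caller->thumb == callee->thumb)
        return true;
    return !callee->has_inline_asm;
}
```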

> In that case I see the problem in a new light. With unified
> assembly, we could indeed inline from any mode as long as there is no
> divided-syntax asm inside.
> 

Unfortunately, the two ISAs are not identical and neither is a subset of
the other, despite the use of unified assembly syntax.  For example, ARM
state has RSC, but Thumb doesn't, while Thumb has ORN, but ARM doesn't.
We can't parse the ASM body to find out what it contains, so we have to
play safe.

[US]DIV isn't an issue once you get to ARMv7ve, since it's in both ISAs.
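To see why neither ISA can be treated as a subset of the other, model each as a set of instructions. The toy masks below hold only the instructions named above; they are an illustration, not a complete description of either instruction set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy instruction sets containing only the instructions mentioned. */
enum {
    I_RSC  = 1u << 0,   /* ARM state only */
    I_ORN  = 1u << 1,   /* Thumb-2 only */
    I_SDIV = 1u << 2,   /* both states on ARMv7ve */
    I_UDIV = 1u << 3    /* both states on ARMv7ve */
};

static const uint32_t arm_isa    = I_RSC | I_SDIV | I_UDIV;
static const uint32_t thumb2_isa = I_ORN | I_SDIV | I_UDIV;

/* True if every instruction in A is also in B. */
static bool isa_subset(uint32_t a, uint32_t b)
{
    return (a & ~b) == 0;
}
```

Both `isa_subset(arm_isa, thumb2_isa)` and `isa_subset(thumb2_isa, arm_isa)` are false, so a subset rule of the kind used for feature flags would forbid cross-mode inlining in both directions; only the common part (here SDIV/UDIV) is safe in either state.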


>>
>>> - Scanning the function body to check for ASM_INPUT does not look very
>>> elegant (if this matters) because the asm could well be unrelated.
>>>
>>> The only case when it will always be a win to inline thumb into arm is
>>> when the cost of the inlined body is less than a BX instruction (but
>>> even then, with branch prediction, that cost is mitigated).
>>>
>> One of the problems with not inlining is that the C++ abstraction
>> penalty is likely to shoot up.  There will be many major lost
>> optimization opportunities if we start down that path.
> 
> I would not expect users to use this attribute extensively on
> inlined member functions. But I take your point.

It's not necessarily the member functions themselves, but the functions
that call the member functions.

> 
>> So I think inlining should only be disabled if there's some technical
>> reason why it should be disabled, not because of some 'it might not
>> always be ideal' feelings.  Furthermore, we should expect users to use
>> the other attributes consistently when they expect specific behaviours
>> to occur.
> 
> Sure, I too would have preferred objective benchmark results, but
> it's a little early to have experience with instrumentation of large
> code.
> 

I'd be careful of benchmark numbers for this sort of argument.  They
only give a snapshot for a particular point in time; it could well be
that compiler or hw evolution may invalidate your original assumptions.

R.

> Thanks for your input; it's great to see the problem from all directions
> before taking a decision.
> 
> Christian
> 
>> My 2p.
>>
>> R.
>>
>>>>> The compiler would take a decision that is not what the user wrote. And
>>>>> in addition if you consider the few instructions to modify R15 to switch
>>>>> state that would end up with more code executed in the critical path,
>>>>> voiding a possible size or speed gain.
>>>> I do not expect there to be any additional instructions needed to switch 
>>>> state. If function x is inlined into function y the state would be lost 
>>>> and the state would be in terms of the state of function x.
>>> Yes, indeed. I was in a LCM/mode-switching thinking mode when writing
>>> this. In this case the mode is inherited.
>>>
>>>> Obviously if the user doesn't want inlining - the user would add 
>>>> attributes to disable inlining. You do have extensions such as 
>>>> __attribute__((noinline)) and __attribute__((never_inline)) to give the 
>>>> user that control and those bits need to be used in addition.
>>> Those attributes are overkill. They would disable inlining between
>>> callers and callees of the same mode. This is not what we want.
>>>
>>>> The attribute then purely reflects the output instruction state of
>>>> the function if a copy of its body is laid out separately in the output.
>>>>
>>>> IMHO, the heuristics for inlining should be the best judge of when
>>>> functions should be inlined into one another, and we shouldn't be
>>>> second-guessing that in the backend.
>>>>
>>>> If there is a copy of the function to be put out by the compiler, only 
>>>> then should we choose this based on the state of the "target" i.e. arm 
>>>> or thumb.
>>>>
>>> Yes,
>>>
>>> So to summarize, we can:
>>>
>>>   1) Don't inline between different modes. Same behavior as other
>>> targets. Solves the asm case.
>>>   2) Always inline unless the function contains asm statements. (I
>>> reject adding a new compilation switch.)
>>>   3) Always inline, but recommend the use of attribute ((noinline)) to
>>> handle asm, or attribute ((cold,hot)) in the absence of profile data.
>>>
>>> I obviously prefer 1), safe and homogeneous; 3) is the worst as it
>>> requires additional user action (poor user); 2) is less bad.
>>>
>>> Thanks for supporting me ::)
>>>
>>> Christian
>>>
>>>
>>
> 
>

Patch

2014-09-23  Christian Bruel  <christian.bruel@st.com>

	PR target/52144
	* config/arm/arm.opt (THUMB): Save target option.
	* config/arm/arm-protos.h (arm_declare_function_name,
	arm_valid_target_attribute_tree, arm_register_target_pragmas,
	arm_reset_previous_fndecl): Declare.
	* config/arm/arm.c (arm_declare_function_name): Move here.
	Add attribute target support.
	(emit_thumb): New boolean.
	(arm_file_start): Set emit_thumb mode.
	(arm_pragma_target_parse): New function.
	(arm_valid_target_attribute_p, arm_valid_target_attribute_tree,
	arm_valid_target_attribute_rec): New functions.
	(arm_can_inline_p): New function.
	(arm_set_current_function, arm_reset_previous_fndecl): New functions.
	(arm_option_override): Split.
	(arm_option_override_internal): New function.
	(TARGET_CAN_INLINE_P, TARGET_SET_CURRENT_FUNCTION,
	TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
	* config/arm/arm-c.c (arm_pragma_target_parse, arm_target_modify_macros,
	arm_register_target_pragmas): New functions.
	* config/arm/arm.h (SWITCHABLE_TARGET): Define.
	(ARM_DECLARE_FUNCTION_NAME): Call arm_declare_function_name.
	(REGISTER_TARGET_PRAGMAS): Call arm_register_target_pragmas.
	(TREE_TARGET_THUMB): New macro.
	* doc/extend.texi (arm, thumb): Document target attributes.
	* doc/invoke.texi (arm, thumb): Mention target attributes.

2014-09-23  Christian Bruel  <christian.bruel@st.com>

	PR target/52144
	* gcc.target/arm/attr_thumb.c: New test.
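For reference, a user-level sketch of the pragma form described in the commit message, assuming a compiler carrying this patch (or a later GCC with the same feature) and a core with both ARM and Thumb-2 state. The ACLE macro check and the names `CAN_SWITCH_MODE` and `region_mode` are assumptions for illustration; on other targets the pragmas are skipped.

```c
/* Only cores with both ARM and Thumb-2 state can switch mode. */
#if defined(__ARM_ARCH_ISA_ARM) && defined(__ARM_ARCH_ISA_THUMB) \
    && __ARM_ARCH_ISA_THUMB >= 2
#  define CAN_SWITCH_MODE 1
#else
#  define CAN_SWITCH_MODE 0
#endif

#if CAN_SWITCH_MODE
#  pragma GCC push_options
#  pragma GCC target ("thumb")
#endif

/* Inside the region, arm_target_modify_macros defines __thumb__, so this
   preprocessor test reflects the region's mode rather than -marm/-mthumb. */
static const char *region_mode(void)
{
#ifdef __thumb__
    return "thumb";
#else
    return "not thumb";
#endif
}

#if CAN_SWITCH_MODE
#  pragma GCC pop_options
#endif
```

Per `arm_pragma_target_parse` in the patch, `pop_options` restores the previous target options and, if the mode changes, re-adjusts `__thumb__` and the related macros.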

Index: gcc/config/arm/arm-c.c
===================================================================
--- gcc/config/arm/arm-c.c	(revision 215680)
+++ gcc/config/arm/arm-c.c	(working copy)
@@ -20,9 +20,12 @@ 
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
-#include "tm_p.h"
 #include "tree.h"
+#include "tm_p.h"
 #include "c-family/c-common.h"
+#include "target.h"
+#include "target-def.h"
+#include "c-family/c-pragma.h"
 
 /* Output C specific EABI object attributes.  These can not be done in
    arm.c because they require information from the C frontend.  */
@@ -42,3 +45,109 @@ 
 {
   arm_lang_output_object_attributes_hook = arm_output_c_attributes;
 }
+
+
+/* Define or undefine macros based on the current target.  If the user does
+   #pragma GCC target, we need to adjust the macros dynamically.  */
+
+static void
+arm_target_modify_macros (bool thumb_p)
+{
+  if (thumb_p)
+    {
+      cpp_define (parse_in, "__thumb__");
+      if (arm_arch_thumb2)
+	cpp_define (parse_in, "__thumb2__");
+      if (TARGET_BIG_END)
+	cpp_define (parse_in, "__THUMBEB__");
+      else
+	cpp_define (parse_in, "__THUMBEL__");
+    }
+  else
+    {
+      cpp_undef (parse_in, "__thumb__");
+      if (arm_arch_thumb2)
+	cpp_undef (parse_in, "__thumb2__");
+      if (TARGET_BIG_END)
+	cpp_undef (parse_in, "__THUMBEB__");
+      else
+	cpp_undef (parse_in, "__THUMBEL__");
+    }
+
+}
+
+/* Hook to validate the current #pragma GCC target and set the arm/thumb
+   mode state.  If ARGS is NULL, then POP_TARGET is used to reset
+   the options.  */
+static bool
+arm_pragma_target_parse (tree args, tree pop_target)
+{
+  tree prev_tree = build_target_option_node (&global_options);
+  tree cur_tree;
+  struct cl_target_option *prev_opt;
+  struct cl_target_option *cur_opt;
+  bool cur_mode, prev_mode;
+
+  if (! args)
+    {
+      cur_tree = ((pop_target) ? pop_target : target_option_default_node);
+      cl_target_option_restore (&global_options,
+				TREE_TARGET_OPTION (cur_tree));
+    }
+  else
+    {
+      cur_tree = arm_valid_target_attribute_tree (args, &global_options);
+      if (cur_tree == NULL_TREE)
+	{
+	  cl_target_option_restore (&global_options,
+				    TREE_TARGET_OPTION (prev_tree));
+	  return false;
+	}
+    }
+
+  target_option_current_node = cur_tree;
+  arm_reset_previous_fndecl ();
+
+  /* Figure out the previous mode.  */
+  prev_opt  = TREE_TARGET_OPTION (prev_tree);
+  cur_opt   = TREE_TARGET_OPTION (cur_tree);
+
+  gcc_assert (prev_opt);
+  gcc_assert (cur_opt);
+
+  cur_mode = TREE_TARGET_THUMB (cur_opt);
+  prev_mode = TREE_TARGET_THUMB (prev_opt);
+
+  if (prev_mode != cur_mode)
+    {
+      /* For the definitions, ensure all newly defined macros are considered
+	 as used for -Wunused-macros.  There is no point warning about the
+	 compiler predefined macros.  */
+      cpp_options *cpp_opts = cpp_get_options (parse_in);
+      unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros;
+      cpp_opts->warn_unused_macros = 0;
+
+      /* Update macros.  */
+      arm_target_modify_macros (cur_mode);
+
+      cpp_opts->warn_unused_macros = saved_warn_unused_macros;
+    }
+
+  return true;
+}
+
+/* Register target pragmas.  We need to add the hook for parsing #pragma GCC
+   target here rather than in arm.c since it will pull in various preprocessor
+   functions, and those are not present in languages like fortran without a
+   preprocessor.  */
+
+void
+arm_register_target_pragmas (void)
+{
+  /* Update pragma hook to allow parsing #pragma GCC target.  */
+  targetm.target_option.pragma_parse = arm_pragma_target_parse;
+
+#ifdef REGISTER_SUBTARGET_PRAGMAS
+  REGISTER_SUBTARGET_PRAGMAS ();
+#endif
+}
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h	(revision 215680)
+++ gcc/config/arm/arm-protos.h	(working copy)
@@ -30,6 +30,7 @@ 
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
+extern void arm_declare_function_name (FILE *, const char *, tree);
 extern void thumb2_expand_return (bool);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
@@ -209,14 +210,13 @@ 
 extern int arm_dllimport_p (tree);
 extern void arm_mark_dllexport (tree);
 extern void arm_mark_dllimport (tree);
+extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *);
 #endif
 
 extern void arm_pr_long_calls (struct cpp_reader *);
 extern void arm_pr_no_long_calls (struct cpp_reader *);
 extern void arm_pr_long_calls_off (struct cpp_reader *);
 
-extern void arm_lang_object_attributes_init(void);
-
 extern const char *arm_mangle_type (const_tree);
 
 extern void arm_order_regs_for_local_alloc (void);
@@ -301,9 +301,15 @@ 
 
 extern void arm_emit_eabi_attribute (const char *, int, int);
 
+extern void arm_reset_previous_fndecl (void);
+
 /* Defined in gcc/common/config/arm-common.c.  */
 extern const char *arm_rewrite_selected_cpu (const char *name);
 
+/* Defined in gcc/config/arm/arm-c.c.  */
+extern void arm_lang_object_attributes_init(void);
+extern void arm_register_target_pragmas (void);
+
 extern bool arm_is_constant_pool_ref (rtx);
 
 #endif /* ! GCC_ARM_PROTOS_H */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 215680)
+++ gcc/config/arm/arm.c	(working copy)
@@ -61,6 +61,7 @@ 
 #include "opts.h"
 #include "dumpfile.h"
 #include "gimple-expr.h"
+#include "target-globals.h"
 #include "builtins.h"
 #include "tm-constrs.h"
 
@@ -238,6 +239,9 @@ 
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static void arm_set_current_function (tree);
+static bool arm_can_inline_p (tree, tree);
+static bool arm_valid_target_attribute_p (tree, tree, tree, int);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (enum machine_mode);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
@@ -378,6 +382,9 @@ 
 #undef  TARGET_ASM_FUNCTION_EPILOGUE
 #define TARGET_ASM_FUNCTION_EPILOGUE arm_output_function_epilogue
 
+#undef TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P arm_can_inline_p
+
 #undef  TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE arm_option_override
 
@@ -390,6 +397,12 @@ 
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST arm_adjust_cost
 
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION arm_set_current_function
+
+#undef TARGET_OPTION_VALID_ATTRIBUTE_P
+#define TARGET_OPTION_VALID_ATTRIBUTE_P arm_valid_target_attribute_p
+
 #undef TARGET_SCHED_REORDER
 #define TARGET_SCHED_REORDER arm_sched_reorder
 
@@ -2552,7 +2565,104 @@ 
 
 /* Fix up any incompatible options that the user has specified.  */
 static void
-arm_option_override (void)
+arm_option_override_internal (void)
+{
+  /* Make sure that the processor choice does not conflict with any of the
+     other command line choices.  */
+  if (TARGET_ARM && !(insn_flags & FL_NOTM))
+    error ("target CPU does not support ARM mode");
+
+  /* BPABI targets use linker tricks to allow interworking on cores
+     without thumb support.  */
+  if (TARGET_INTERWORK && !((insn_flags & FL_THUMB) || TARGET_BPABI))
+    {
+      warning (0, "target CPU does not support interworking" );
+      target_flags &= ~MASK_INTERWORK;
+    }
+
+  if (TARGET_THUMB && !(insn_flags & FL_THUMB))
+    {
+      warning (0, "target CPU does not support THUMB instructions");
+      target_flags &= ~MASK_THUMB;
+    }
+
+  if (TARGET_APCS_FRAME && TARGET_THUMB)
+    {
+      /* warning (0, "ignoring -mapcs-frame because -mthumb was used"); */
+      target_flags &= ~MASK_APCS_FRAME;
+    }
+
+  /* Callee super interworking implies thumb interworking.  Adding
+     this to the flags here simplifies the logic elsewhere.  */
+  if (TARGET_THUMB && TARGET_CALLEE_INTERWORKING)
+    target_flags |= MASK_INTERWORK;
+
+  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
+     from here where no function is being compiled currently.  */
+  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TARGET_ARM)
+    warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb");
+
+  if (TARGET_ARM && TARGET_CALLEE_INTERWORKING)
+    warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb");
+
+  /* If this target is normally configured to use APCS frames, warn if they
+     are turned off and debugging is turned on.  */
+  if (TARGET_ARM
+      && write_symbols != NO_DEBUG
+      && !TARGET_APCS_FRAME
+      && (TARGET_DEFAULT & MASK_APCS_FRAME))
+    warning (0, "-g with -mno-apcs-frame may not give sensible debugging");
+
+  thumb_code = TARGET_ARM == 0;
+
+  if (arm_restrict_it == 2)
+    arm_restrict_it = arm_arch8 && TARGET_THUMB2;
+
+  if (!TARGET_THUMB2)
+    arm_restrict_it = 0;
+
+  /* If we are not using the default (ARM mode) section anchor offset
+     ranges, then set the correct ranges now.  */
+  if (TARGET_THUMB2)
+    {
+      /* The minimum is set such that the total size of the block
+         for a particular anchor is 248 + 1 + 4095 bytes, which is
+         divisible by eight, ensuring natural spacing of anchors.  */
+      targetm.min_anchor_offset = -248;
+      targetm.max_anchor_offset = 4095;
+    }
+
+  /* iWMMXt unsupported under Thumb mode.  */
+  if (TARGET_THUMB && TARGET_IWMMXT)
+    error ("iWMMXt unsupported under Thumb mode");
+
+  if (!TARGET_ARM && TARGET_VXWORKS_RTP && flag_pic)
+    {
+      error ("RTP PIC is incompatible with Thumb");
+      flag_pic = 0;
+    }
+
+  if (optimize_size)
+    {
+      /* If optimizing for size, bump the number of instructions that we
+         are prepared to conditionally execute (even on a StrongARM).  */
+      max_insns_skipped = 6;
+
+      /* For THUMB2, we limit the conditional sequence to one IT block.  */
+      if (TARGET_THUMB2)
+	max_insns_skipped = MAX_INSN_PER_IT_BLOCK;
+    }
+  else
+    max_insns_skipped = current_tune->max_insns_skipped;
+
+  /* Disable shrink-wrap when optimizing function for size, since it tends to
+     generate additional returns.  */
+  if (optimize_function_for_size_p (cfun) && TARGET_THUMB2)
+    flag_shrink_wrap = false;
+}
+
+static void
+arm_option_override (void)
 {
   if (global_options_set.x_arm_arch_option)
     arm_selected_arch = &all_architectures[arm_arch_option];
@@ -2692,43 +2802,7 @@ 
   tune_flags = arm_selected_tune->flags;
   current_tune = arm_selected_tune->tune;
 
-  /* Make sure that the processor choice does not conflict with any of the
-     other command line choices.  */
-  if (TARGET_ARM && !(insn_flags & FL_NOTM))
-    error ("target CPU does not support ARM mode");
-
-  /* BPABI targets use linker tricks to allow interworking on cores
-     without thumb support.  */
-  if (TARGET_INTERWORK && !((insn_flags & FL_THUMB) || TARGET_BPABI))
-    {
-      warning (0, "target CPU does not support interworking" );
-      target_flags &= ~MASK_INTERWORK;
-    }
-
-  if (TARGET_THUMB && !(insn_flags & FL_THUMB))
-    {
-      warning (0, "target CPU does not support THUMB instructions");
-      target_flags &= ~MASK_THUMB;
-    }
-
-  if (TARGET_APCS_FRAME && TARGET_THUMB)
-    {
-      /* warning (0, "ignoring -mapcs-frame because -mthumb was used"); */
-      target_flags &= ~MASK_APCS_FRAME;
-    }
-
-  /* Callee super interworking implies thumb interworking.  Adding
-     this to the flags here simplifies the logic elsewhere.  */
-  if (TARGET_THUMB && TARGET_CALLEE_INTERWORKING)
-    target_flags |= MASK_INTERWORK;
-
-  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
-     from here where no function is being compiled currently.  */
-  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TARGET_ARM)
-    warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb");
-
-  if (TARGET_ARM && TARGET_CALLEE_INTERWORKING)
-    warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb");
+  arm_option_override_internal ();
 
   if (TARGET_APCS_STACK && !TARGET_APCS_FRAME)
     {
@@ -2745,14 +2819,6 @@ 
   if (TARGET_APCS_REENT)
     warning (0, "APCS reentrant code not supported.  Ignored");
 
-  /* If this target is normally configured to use APCS frames, warn if they
-     are turned off and debugging is turned on.  */
-  if (TARGET_ARM
-      && write_symbols != NO_DEBUG
-      && !TARGET_APCS_FRAME
-      && (TARGET_DEFAULT & MASK_APCS_FRAME))
-    warning (0, "-g with -mno-apcs-frame may not give sensible debugging");
-
   if (TARGET_APCS_FLOAT)
     warning (0, "passing floating point arguments in fp regs not yet supported");
 
@@ -2774,9 +2840,8 @@ 
 
   arm_ld_sched = (tune_flags & FL_LDSCHED) != 0;
   arm_tune_strongarm = (tune_flags & FL_STRONG) != 0;
-  thumb_code = TARGET_ARM == 0;
-  thumb1_code = TARGET_THUMB1 != 0;
   arm_tune_wbuf = (tune_flags & FL_WBUF) != 0;
+  thumb1_code = TARGET_THUMB1 != 0;
   arm_tune_xscale = (tune_flags & FL_XSCALE) != 0;
   arm_arch_iwmmxt = (insn_flags & FL_IWMMXT) != 0;
   arm_arch_iwmmxt2 = (insn_flags & FL_IWMMXT2) != 0;
@@ -2784,11 +2849,6 @@ 
   arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
   arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0;
   arm_arch_crc = (insn_flags & FL_CRC32) != 0;
-  if (arm_restrict_it == 2)
-    arm_restrict_it = arm_arch8 && TARGET_THUMB2;
-
-  if (!TARGET_THUMB2)
-    arm_restrict_it = 0;
 
   /* If we are not using the default (ARM mode) section anchor offset
      ranges, then set the correct ranges now.  */
@@ -2802,14 +2862,6 @@ 
       targetm.min_anchor_offset = 0;
       targetm.max_anchor_offset = 127;
     }
-  else if (TARGET_THUMB2)
-    {
-      /* The minimum is set such that the total size of the block
-         for a particular anchor is 248 + 1 + 4095 bytes, which is
-         divisible by eight, ensuring natural spacing of anchors.  */
-      targetm.min_anchor_offset = -248;
-      targetm.max_anchor_offset = 4095;
-    }
 
   /* V5 code we generate is completely interworking capable, so we turn off
      TARGET_INTERWORK here to avoid many tests later on.  */
@@ -2872,10 +2924,6 @@ 
   if (TARGET_IWMMXT && TARGET_NEON)
     error ("iWMMXt and NEON are incompatible");
 
-  /* iWMMXt unsupported under Thumb mode.  */
-  if (TARGET_THUMB && TARGET_IWMMXT)
-    error ("iWMMXt unsupported under Thumb mode");
-
   /* __fp16 support currently assumes the core has ldrh.  */
   if (!arm_arch4 && arm_fp16_format != ARM_FP16_FORMAT_NONE)
     sorry ("__fp16 and no ldrh");
@@ -2945,12 +2993,6 @@ 
 	}
     }
 
-  if (!TARGET_ARM && TARGET_VXWORKS_RTP && flag_pic)
-    {
-      error ("RTP PIC is incompatible with Thumb");
-      flag_pic = 0;
-    }
-
   /* If stack checking is disabled, we can use r10 as the PIC register,
      which keeps r9 available.  The EABI specifies r9 as the PIC register.  */
   if (flag_pic && TARGET_SINGLE_PIC_BASE)
@@ -3024,19 +3066,6 @@ 
       flag_schedule_insns = 0;
     }
 
-  if (optimize_size)
-    {
-      /* If optimizing for size, bump the number of instructions that we
-         are prepared to conditionally execute (even on a StrongARM).  */
-      max_insns_skipped = 6;
-
-      /* For THUMB2, we limit the conditional sequence to one IT block.  */
-      if (TARGET_THUMB2)
-	max_insns_skipped = MAX_INSN_PER_IT_BLOCK;
-    }
-  else
-    max_insns_skipped = current_tune->max_insns_skipped;
-
   /* Hot/Cold partitioning is not currently supported, since we can't
      handle literal pool placement in that case.  */
   if (flag_reorder_blocks_and_partition)
@@ -3098,10 +3127,6 @@ 
                          global_options.x_param_values,
                          global_options_set.x_param_values);
 
-  /* Disable shrink-wrap when optimizing function for size, since it tends to
-     generate additional returns.  */
-  if (optimize_function_for_size_p (cfun) && TARGET_THUMB2)
-    flag_shrink_wrap = false;
   /* TBD: Dwarf info for apcs frame is not handled yet.  */
   if (TARGET_APCS_FRAME)
     flag_shrink_wrap = false;
@@ -3118,6 +3143,11 @@ 
 
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
+
+  /* Save the initial options in case the user does function specific
+     options.  */
+  target_option_default_node = target_option_current_node
+    = build_target_option_node (&global_options);
 }
 
 static void
@@ -28356,13 +28386,19 @@ 
   asm_fprintf (asm_out_file, "\n");
 }
 
+/* Set while .syntax unified is active; allows interworking in a module.  */
+static bool emit_thumb = false;
+
 static void
 arm_file_start (void)
 {
   int val;
 
   if (TARGET_UNIFIED_ASM)
-    asm_fprintf (asm_out_file, "\t.syntax unified\n");
+    {
+      asm_fprintf (asm_out_file, "\t.syntax unified\n");
+      emit_thumb = true;
+    }
 
   if (TARGET_BPABI)
     {
@@ -32289,4 +32325,250 @@ 
 	  && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)));
 }
 
+/* Remember the last target of arm_set_current_function.  */
+static GTY(()) tree arm_previous_fndecl;
+
+/* Invalidate arm_previous_fndecl.  */
+void
+arm_reset_previous_fndecl (void)
+{
+  arm_previous_fndecl = NULL_TREE;
+}
+
+/* Establish appropriate back-end context for processing the function
+   FNDECL.  The argument might be NULL to indicate processing at top
+   level, outside of any function scope.  */
+static void
+arm_set_current_function (tree fndecl)
+{
+  if (! fndecl || fndecl == arm_previous_fndecl)
+    return;
+
+  tree old_tree = (arm_previous_fndecl
+		   ? DECL_FUNCTION_SPECIFIC_TARGET (arm_previous_fndecl)
+		   : NULL_TREE);
+
+  tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
+
+  arm_previous_fndecl = fndecl;
+  if (old_tree == new_tree)
+    ;
+
+  else if (new_tree)
+    {
+      cl_target_option_restore (&global_options,
+				TREE_TARGET_OPTION (new_tree));
+
+      if (TREE_TARGET_GLOBALS (new_tree))
+	restore_target_globals (TREE_TARGET_GLOBALS (new_tree));
+      else
+	TREE_TARGET_GLOBALS (new_tree)
+	  = save_target_globals_default_opts ();
+    }
+
+  else if (old_tree)
+    {
+      new_tree = target_option_current_node;
+      cl_target_option_restore (&global_options,
+				TREE_TARGET_OPTION (new_tree));
+      if (TREE_TARGET_GLOBALS (new_tree))
+	restore_target_globals (TREE_TARGET_GLOBALS (new_tree));
+      else if (new_tree == target_option_default_node)
+	restore_target_globals (&default_target_globals);
+      else
+	TREE_TARGET_GLOBALS (new_tree)
+	  = save_target_globals_default_opts ();
+    }
+
+  arm_option_override_internal ();
+}
+
+/* Hook to determine if one function can safely inline another.  */
+
+static bool
+arm_can_inline_p (tree caller, tree callee)
+{
+  tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
+  tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
+
+  if (!callee_tree && !caller_tree)
+    return true;
+
+  /* If only one of caller and callee has option attributes, use the
+     default options for the other one and inline if the modes match.  */
+  else if (!callee_tree)
+    callee_tree = target_option_default_node;
+
+  else if (!caller_tree)
+    caller_tree = target_option_default_node;
+
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+
+  /* Both functions now have an option node (possibly the default one).
+     Inlining is allowed only when they are compiled in the same
+     instruction set state, ARM or Thumb; other target options are not
+     compared here.  */
+  gcc_assert (callee_opts);
+  gcc_assert (caller_opts);
+  return TREE_TARGET_THUMB (callee_opts) == TREE_TARGET_THUMB (caller_opts);
+}
+
+/* Inner function to process the attribute((target(...))), take an argument and
+   set the current options from the argument.  If we have a list, recursively
+   go over the list.  */
+
+static bool
+arm_valid_target_attribute_rec (tree args,  struct gcc_options *opts)
+{
+  if (TREE_CODE (args) == TREE_LIST)
+    {
+      bool ret = true;
+      for (; args; args = TREE_CHAIN (args))
+	if (TREE_VALUE (args)
+	    && !arm_valid_target_attribute_rec (TREE_VALUE (args), opts))
+	  ret = false;
+      return ret;
+    }
+
+  else if (TREE_CODE (args) != STRING_CST)
+    {
+      error ("attribute %<target%> argument not a string");
+      return false;
+    }
+
+  char *argstr = ASTRDUP (TREE_STRING_POINTER (args));
+  while (argstr && *argstr != '\0')
+    {
+      while (ISSPACE (*argstr))
+	argstr++;
+
+      if (!strncmp (argstr, "thumb", 5))
+	{
+	  if (! arm_arch_thumb2)
+	    {
+	      warning (0, "target %<thumb%> requires a Thumb-2 capable architecture");
+	      return false;
+	    }
+
+	  opts->x_target_flags |= MASK_THUMB;
+	  return true;
+	}
+
+      if (!strncmp (argstr, "arm", 3))
+	{
+	  if (! arm_arch_thumb2)
+	    {
+	      warning (0, "target %<arm%> requires a Thumb-2 capable architecture");
+	      return false;
+	    }
+
+	  opts->x_target_flags &= ~MASK_THUMB;
+	  return true;
+	}
+
+      warning (0, "attribute(target(\"%s\")) is unknown", argstr);
+      return false;
+    }
+
+  return false;
+}
+
+/* Return a TARGET_OPTION_NODE tree of the target options listed or NULL.  */
+
+tree
+arm_valid_target_attribute_tree (tree args, struct gcc_options *opts)
+{
+  if (!arm_valid_target_attribute_rec (args, opts))
+    return NULL_TREE;
+
+  return build_target_option_node (opts);
+}
+
+/* Hook to validate attribute((target("string"))).  */
+
+static bool
+arm_valid_target_attribute_p (tree fndecl, tree ARG_UNUSED (name),
+			      tree args, int ARG_UNUSED (flags))
+{
+  tree cur_tree, new_optimize;
+  struct gcc_options func_options;
+  gcc_assert ((fndecl != NULL_TREE) && (args != NULL_TREE));
+
+  tree old_optimize = build_optimization_node (&global_options);
+
+  /* Get the optimization options of the current function.  */
+  tree func_optimize = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl);
+
+  if (!func_optimize)
+    func_optimize = old_optimize;
+
+  /* Init func_options.  */
+  memset (&func_options, 0, sizeof (func_options));
+  init_options_struct (&func_options, NULL);
+  lang_hooks.init_options_struct (&func_options);
+
+  /* Initialize func_options to the defaults.  */
+  cl_optimization_restore (&func_options,
+			   TREE_OPTIMIZATION (func_optimize));
+
+  cl_target_option_restore (&func_options,
+			    TREE_TARGET_OPTION (target_option_default_node));
+
+  /* Set the target flags in func_options to the requested mode.  */
+  cur_tree = arm_valid_target_attribute_tree (args, &func_options);
+
+  if (cur_tree == NULL_TREE)
+    return false;
+
+  new_optimize = build_optimization_node (&func_options);
+
+  if (fndecl)
+    {
+      DECL_FUNCTION_SPECIFIC_TARGET (fndecl) = cur_tree;
+
+      if (old_optimize != new_optimize)
+	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl) = new_optimize;
+    }
+
+  return true;
+}
+
+void
+arm_declare_function_name (FILE *stream, const char *name, tree decl)
+{
+  if (TARGET_THUMB)
+    {
+      if (! emit_thumb && TARGET_UNIFIED_ASM)
+	{
+	  emit_thumb = true;
+	  fprintf (stream, "\t.syntax unified\n");
+	}
+
+      if (is_called_in_ARM_mode (decl)
+	  || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY
+	      && cfun->is_thunk))
+	fprintf (stream, "\t.code 32\n");
+      else
+	{
+	  if (TARGET_THUMB1)
+	    fprintf (stream, "\t.code\t16\n\t.thumb_func\n");
+	  else
+	    fprintf (stream, "\t.thumb\n\t.thumb_func\n");
+	}
+    }
+  else if (emit_thumb)
+    {
+      emit_thumb = false;
+      fprintf (stream, "\t.syntax divided\n");
+      fprintf (stream, "\t.arm\n");
+    }
+
+  if (TARGET_POKE_FUNCTION_NAME)
+    arm_poke_function_name (stream, (const char *) name);
+}
+
+
+
 #include "gt-arm.h"
+
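To make the parsing rule in arm_valid_target_attribute_rec concrete, here is a minimal standalone C model of how one attribute string is handled. It is a sketch under stated assumptions, not GCC code: SKETCH_MASK_THUMB and sketch_parse_target are hypothetical names, target_flags is reduced to a plain unsigned, and the warnings are replaced by a false return. Like the patch, it matches by prefix with strncmp, so a string such as "armv7" is also accepted as "arm".

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for MASK_THUMB inside target_flags.  */
#define SKETCH_MASK_THUMB 0x1u

/* Hypothetical model of arm_valid_target_attribute_rec for a single
   STRING_CST: skip leading blanks, then accept a "thumb" or "arm" prefix.
   Returns false, leaving *flags untouched, for unknown strings or when
   the architecture has no Thumb-2.  */
static bool
sketch_parse_target (const char *arg, bool arch_thumb2, unsigned *flags)
{
  assert (arg != NULL && flags != NULL);

  while (isspace ((unsigned char) *arg))
    arg++;

  if (strncmp (arg, "thumb", 5) == 0)
    {
      if (!arch_thumb2)
        return false;           /* the patch warns here */
      *flags |= SKETCH_MASK_THUMB;
      return true;
    }

  if (strncmp (arg, "arm", 3) == 0)
    {
      if (!arch_thumb2)
        return false;           /* the patch warns here */
      *flags &= ~SKETCH_MASK_THUMB;
      return true;
    }

  return false;                 /* unknown option string */
}
```

Prefix matching keeps the parser short, but it also means strings like "thumbx" are accepted silently; an exact comparison would be the stricter alternative.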
Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h	(revision 215680)
+++ gcc/config/arm/arm.h	(working copy)
@@ -247,6 +247,9 @@ 
 #define SUBTARGET_CPP_SPEC      ""
 #endif
 
+/* Tree Target Specification.  */
+#define TREE_TARGET_THUMB(opt) (opt->x_target_flags & MASK_THUMB)
+
 /* Run-time Target Specification.  */
 #define TARGET_SOFT_FLOAT		(arm_float_abi == ARM_FLOAT_ABI_SOFT)
 /* Use hardware floating point instructions. */
@@ -2118,7 +2121,8 @@ 
   c_register_pragma (0, "long_calls", arm_pr_long_calls);		\
   c_register_pragma (0, "no_long_calls", arm_pr_no_long_calls);		\
   c_register_pragma (0, "long_calls_off", arm_pr_long_calls_off);	\
-  arm_lang_object_attributes_init(); \
+  arm_lang_object_attributes_init();                                    \
+  arm_register_target_pragmas ();                                       \
 } while (0)
 
 /* Condition code information.  */
@@ -2199,23 +2203,7 @@ 
    ? 1 : 0)
 
 #define ARM_DECLARE_FUNCTION_NAME(STREAM, NAME, DECL) 	\
-  do							\
-    {							\
-      if (TARGET_THUMB) 				\
-        {						\
-          if (is_called_in_ARM_mode (DECL)		\
-	      || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY	\
-		  && cfun->is_thunk))	\
-            fprintf (STREAM, "\t.code 32\n") ;		\
-          else if (TARGET_THUMB1)			\
-           fprintf (STREAM, "\t.code\t16\n\t.thumb_func\n") ;	\
-          else						\
-           fprintf (STREAM, "\t.thumb\n\t.thumb_func\n") ;	\
-        }						\
-      if (TARGET_POKE_FUNCTION_NAME)			\
-        arm_poke_function_name (STREAM, (const char *) NAME);	\
-    }							\
-  while (0)
+  arm_declare_function_name ((STREAM), (NAME), (DECL))
 
 /* For aliases of functions we use .thumb_set instead.  */
 #define ASM_OUTPUT_DEF_FROM_DECLS(FILE, DECL1, DECL2)		\
@@ -2390,4 +2378,8 @@ 
 
 #define DRIVER_SELF_SPECS MCPU_MTUNE_NATIVE_SPECS
 #define TARGET_SUPPORTS_WIDE_INT 1
+
+/* For switching between functions with different target attributes.  */
+#define SWITCHABLE_TARGET 1
+
 #endif /* ! GCC_ARM_H */
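The inlining rule in arm_can_inline_p reduces to comparing the Thumb bit that TREE_TARGET_THUMB extracts, after substituting the default options for a function without a target attribute. The following C sketch models only that decision; struct sketch_opts, SKETCH_TARGET_THUMB and sketch_can_inline are hypothetical names, and a NULL pointer stands for "no target attribute".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MASK_THUMB 0x1u

/* Hypothetical stand-in for struct cl_target_option: only the saved
   target_flags matter for this decision.  */
struct sketch_opts
{
  unsigned target_flags;
};

/* Same test as the TREE_TARGET_THUMB macro added to arm.h.  */
#define SKETCH_TARGET_THUMB(o) ((o)->target_flags & SKETCH_MASK_THUMB)

/* Model of arm_can_inline_p.  DEFLT plays the role of
   target_option_default_node (the command-line options).  */
static bool
sketch_can_inline (const struct sketch_opts *caller,
                   const struct sketch_opts *callee,
                   const struct sketch_opts *deflt)
{
  assert (deflt != NULL);

  if (caller == NULL && callee == NULL)
    return true;                /* neither function overrides the mode */
  if (callee == NULL)
    callee = deflt;
  else if (caller == NULL)
    caller = deflt;

  return SKETCH_TARGET_THUMB (caller) == SKETCH_TARGET_THUMB (callee);
}
```

For a file compiled with -mthumb, an unattributed caller cannot inline a callee marked target("arm"), while two target("arm") functions can inline each other.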
Index: gcc/config/arm/arm.opt
===================================================================
--- gcc/config/arm/arm.opt	(revision 215680)
+++ gcc/config/arm/arm.opt	(working copy)
@@ -186,7 +186,7 @@ 
 Specify the minimum bit alignment of structures
 
 mthumb
-Target Report RejectNegative Mask(THUMB)
+Target Report RejectNegative Mask(THUMB) Save
 Generate code for Thumb state
 
 mthumb-interwork
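Marking mthumb with Save makes MASK_THUMB part of the state that cl_target_option_save and cl_target_option_restore copy, which is what lets arm_set_current_function switch modes per function. The C sketch below models the caching in that hook: options are restored only when the function changes and its option node differs from the previous function's. All names are hypothetical, a NULL node means "no target attribute", and the patch's unconditional call to arm_option_override_internal afterwards is left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MASK_THUMB 0x1u

/* Hypothetical saved option node (stand-in for TREE_TARGET_OPTION).  */
struct sketch_node
{
  unsigned target_flags;
};

/* Hypothetical function decl: NULL target means no target attribute.  */
struct sketch_fn
{
  const struct sketch_node *target;
};

static unsigned global_target_flags;        /* models global_options */
static const struct sketch_fn *previous_fn; /* models arm_previous_fndecl */
static int restores;                        /* number of option switches */

static void
sketch_restore (const struct sketch_node *node)
{
  assert (node != NULL);
  global_target_flags = node->target_flags;
  restores++;
}

/* Model of arm_set_current_function.  CURRENT_NODE plays the role of
   target_option_current_node.  */
static void
sketch_set_current_function (const struct sketch_fn *fn,
                             const struct sketch_node *current_node)
{
  if (fn == NULL || fn == previous_fn)
    return;

  const struct sketch_node *old_node = previous_fn ? previous_fn->target : NULL;
  const struct sketch_node *new_node = fn->target;
  previous_fn = fn;

  if (old_node == new_node)
    return;                     /* same options: nothing to restore */

  sketch_restore (new_node ? new_node : current_node);
}
```

Because build_target_option_node hashes option sets, two functions with the same attribute share one node, so the pointer comparison is enough to skip redundant restores.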
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 215680)
+++ gcc/doc/extend.texi	(working copy)
@@ -3938,9 +3938,25 @@ 
 with a comma (@samp{,}).
 
 The @code{target} attribute is presently implemented for
-i386/x86_64, PowerPC, and Nios II targets only.
+ARM, i386/x86_64, PowerPC, and Nios II targets only.
 The options supported are specific to each target.
 
+For ARM, the following options are allowed:
+
+@table @samp
+@item thumb
+@cindex @code{target("thumb")} attribute
+Force Thumb code generation.  Only available on architectures that support Thumb-2.
+
+@item arm
+@cindex @code{target("arm")} attribute
+Force ARM code generation.  Only available on architectures that support Thumb-2.
+@end table
+
+The inliner does not inline a function whose mode differs from the caller's.
+For example, a function declared with @code{target("arm")} is not inlined into
+Thumb callers, such as unattributed functions in a file compiled with @option{-mthumb}.
+
 On the 386, the following options are allowed:
 
 @table @samp
@@ -17403,7 +17419,7 @@ 
 @code{target} attribute and the attribute syntax.
 
 The @code{#pragma GCC target} pragma is presently implemented for
-i386/x86_64, PowerPC, and Nios II targets only.
+ARM, i386/x86_64, PowerPC, and Nios II targets only.
 @end table
 
 @table @code
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 215680)
+++ gcc/doc/invoke.texi	(working copy)
@@ -12809,6 +12809,10 @@ 
 configuring GCC with the @option{--with-mode=}@var{state}
 configure option.
 
+You can also override the ARM and Thumb mode for each function
+by using the @code{target("thumb")} and @code{target("arm")} function attributes
+(@pxref{Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}).
+
 @item -mtpcs-frame
 @opindex mtpcs-frame
 Generate a stack frame that is compliant with the Thumb Procedure Call
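Switching modes per function means the assembler must be told when the instruction set changes. The C sketch below models the emit_thumb state machine in the patch's arm_declare_function_name for a Thumb-2 target with TARGET_UNIFIED_ASM set, assuming the file itself was compiled in ARM mode so emit_thumb starts false. The Thumb-1 and ARM-entry cases are omitted, and sketch_declare_function and sketch_append are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Models the patch's emit_thumb flag: true while ".syntax unified"
   is in effect.  */
static bool emit_thumb;

/* Append TEXT to the NUL-terminated buffer OUT of size OUT_SIZE.  */
static void
sketch_append (char *out, size_t out_size, const char *text)
{
  size_t len = strlen (out);
  size_t add = strlen (text);
  assert (len + add < out_size);
  memcpy (out + len, text, add + 1);
}

/* Hypothetical model of arm_declare_function_name: append the
   directives emitted before one function's label.  */
static void
sketch_declare_function (bool is_thumb, char *out, size_t out_size)
{
  if (is_thumb)
    {
      if (!emit_thumb)
        {
          emit_thumb = true;
          sketch_append (out, out_size, "\t.syntax unified\n");
        }
      sketch_append (out, out_size, "\t.thumb\n\t.thumb_func\n");
    }
  else if (emit_thumb)
    {
      emit_thumb = false;
      sketch_append (out, out_size, "\t.syntax divided\n\t.arm\n");
    }
}
```

Directives are emitted only on a mode change, so consecutive functions in the same mode add nothing beyond the per-function .thumb_func marker.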
Index: gcc/testsuite/gcc.target/arm/attr_thumb.c
===================================================================
--- gcc/testsuite/gcc.target/arm/attr_thumb.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/attr_thumb.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler ".thumb" } } */
+
+void __attribute__((target("thumb")))
+foo(int argc)
+{
+}
+
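Several settings moved into arm_option_override_internal because they depend on the current mode and must be recomputed for each function. The C sketch below models two of them with values copied from the patch: the -mrestrict-it default, assuming 2 is the option's "not specified" sentinel as the code suggests, and the Thumb-2 section-anchor range, whose block size 248 + 1 + 4095 = 4344 bytes is a multiple of eight. The function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the arm_restrict_it logic.  REQUESTED is the option value,
   where 2 is assumed to mean "not given on the command line".  */
static int
sketch_restrict_it (int requested, bool arch8, bool thumb2)
{
  if (requested == 2)
    requested = arch8 && thumb2;  /* default: on for ARMv8 Thumb-2 only */
  if (!thumb2)
    requested = 0;                /* IT blocks exist only in Thumb-2 */
  return requested;
}

/* Thumb-2 section anchor offsets taken from the patch.  */
enum { SKETCH_T2_MIN_ANCHOR = -248, SKETCH_T2_MAX_ANCHOR = 4095 };

/* Bytes covered by one anchor: MIN_OFF below it, the anchor byte
   itself, and MAX_OFF above it.  */
static int
sketch_anchor_block_size (int min_off, int max_off)
{
  assert (min_off <= 0 && max_off >= 0);
  return -min_off + 1 + max_off;
}
```

A function switched to ARM mode gets restrict_it forced to 0 by the second test, which is why this logic cannot be computed once per file.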