Patchwork [ARM] Scheduling VFP/NEON to core transfers (ping, RFC)

login
register
mail settings
Submitter Julian Brown
Date July 17, 2012, 2:14 p.m.
Message ID <20120717151438.29a2443e@octopus>
Download mbox | patch
Permalink /patch/171437/
State New
Headers show

Comments

Julian Brown - July 17, 2012, 2:14 p.m.
Hi,

Whilst looking at upstreaming some of CodeSourcery/Mentor's old
patches, I came across the following again, originally by Mark Shinwell
but last sent upstream by Andrew Stubbs:

http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html

I've attached a new version, re-tested but not re-benchmarked (I'm not
set up to do the latter just at the moment).

I spent a while being quite unconvinced that this patch was a
reasonable solution for the problem (of scheduling transfers between the
NEON or VFP and the core unit) -- either on Cortex-A8, which the patch
was originally supposed to target, or any other core version. So, this
is what I came up with to try to convince myself.

Handling of these transfers is unfortunately dealt with rather
haphazardly on various cores. I only looked in any detail at Cortex-A8
and Cortex-A9. FAOD, I think we're principally concerned here with
instructions like "movdi_vfp", which may be used to transfer either
"VFP" or "NEON" register values to and from core registers, but also
with movsf and friends for the "runfast"-mode case.

* On Cortex-A8, instructions with "f_2_r" (nor "r_2_f") type do not
  participate in scheduling at all. This was probably the root of the
  problem the patch was trying to fix originally: this means that
  there is nothing to stop instructions which use the vfp/neon-to-core
  result from being placed directly after the transfer. (A scheduler
  dump shows "nothing" for the unit utilisation for these types of
  instruction at present.)

* On Cortex-A9, the "f_2_r" and "r_2_f" types both use the (core) insn
  reservation "cortex_a9_fps", which produces its result after 2
  cycles. The A9 TRM on infocentre.arm.com says that VFP-to-core
  transfers take 3 cycles, and NEON-to-core transfers take 11 cycles.

* The cortex-r4f.md and cortex-m4-fpu.md scheduler descriptions seem to
  handle f_2_r and r_2_f sanely.

So, there are generally multiple DFA descriptions in play at any given
time, relating to different core units with different sets of registers,
and transferring values between those cores isn't really being
described properly to GCC's scheduler in a consistent way at present.

Fixing this "properly", e.g. fully describing the way the NEON
unit is decoupled from the core unit on the Cortex-A8, is probably
impractical. But, the only cases we're concerned with at present are
transfers of values from one unit to another. I think setting both
"type" and "neon_type" for transfer instructions is therefore the
right thing to do -- although given the above, and other restrictions
(i.e. the mutual exclusion between VFP and NEON units on Cortex-A9),
it's probably not a complete solution to the problem.

To expand:

* "type" should be set to f_2_r for MRC-type instructions because of
  the dependency on an earlier instruction setting the input register.

* "neon_type" should be set appropriately for MRC-type instructions
  because following NEON instructions will have dependencies on the
  transferred value.

* ...and vice-versa for MCR-type instructions.

Incidentally other patches, probably merged from our source base, have
introduced "neon_type" alongside "type" for a couple of instructions in
vfp.md already: namely, *movdi_vfp and *movhf_vfp_neon: whatever we
choose to do, we should be consistent.

So: OK to apply?

Thanks,

Julian

2011-02-22  Andrew Stubbs  <ams@codesourcery.com>
	    Mark Shinwell  <shinwell@codesourcery.com>
            Julian Brown  <julian@codesourcery.com>

    gcc/
    * config/arm/vfp.md (*arm_movsi_vfp, *thumb2_movsi_vfp)
    (*movdi_vfp_cortexa8, *movsf_vfp, *thumb2_movsf_vfp)
    (*movdf_vfp, *thumb2_movdf_vfp, *movsfcc_vfp)
    (*thumb2_movsfcc_vfp, *movdfcc_vfp, *thumb2_movdfcc_vfp): Add
    neon_type.
    * config/arm/arm.md (neon_type): Update comment.
Richard Earnshaw - July 17, 2012, 4:25 p.m.
On 17/07/12 15:14, Julian Brown wrote:
> Hi,
> 
> Whilst looking at upstreaming some of CodeSourcery/Mentor's old
> patches, I came across the following again, originally by Mark Shinwell
> but last sent upstream by Andrew Stubbs:
> 
> http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
> 
> I've attached a new version, re-tested but not re-benchmarked (I'm not
> set up to do the latter just at the moment).
> 
> I spent a while being quite unconvinced that this patch was a
> reasonable solution for the problem (of scheduling transfers between the
> NEON or VFP and the core unit) -- either on Cortex-A8, which the patch
> was originally supposed to target, or any other core version. So, this
> is what I came up with to try to convince myself.
> 
> Handling of these transfers is unfortunately dealt with rather
> haphazardly on various cores. I only looked in any detail at Cortex-A8
> and Cortex-A9. FAOD, I think we're principally concerned here with
> instructions like "movdi_vfp", which may be used to transfer either
> "VFP" or "NEON" register values to and from core registers, but also
> with movsf and friends for the "runfast"-mode case.
> 
> * On Cortex-A8, instructions with "f_2_r" (nor "r_2_f") type do not
>   participate in scheduling at all. This was probably the root of the
>   problem the patch was trying to fix originally: this means that
>   there is nothing to stop instructions which use the vfp/neon-to-core
>   result from being placed directly after the transfer. (A scheduler
>   dump shows "nothing" for the unit utilisation for these types of
>   instruction at present.)
> 
> * On Cortex-A9, the "f_2_r" and "r_2_f" types both use the (core) insn
>   reservation "cortex_a9_fps", which produces its result after 2
>   cycles. The A9 TRM on infocentre.arm.com says that VFP-to-core
>   transfers take 3 cycles, and NEON-to-core transfers take 11 cycles.
> 
> * The cortex-r4f.md and cortex-m4-fpu.md scheduler descriptions seem to
>   handle f_2_r and r_2_f sanely.
> 
> So, there are generally multiple DFA descriptions in play at any given
> time, relating to different core units with different sets of registers,
> and transferring values between those cores isn't really being
> described properly to GCC's scheduler in a consistent way at present.
> 
> Fixing this "properly", e.g. fully describing the way the NEON
> unit is decoupled from the core unit on the Cortex-A8, is probably
> impractical. But, the only cases we're concerned with at present are
> transfers of values from one unit to another. I think setting both
> "type" and "neon_type" for transfer instructions is therefore the
> right thing to do -- although given the above, and other restrictions
> (i.e. the mutual exclusion between VFP and NEON units on Cortex-A9),
> it's probably not a complete solution to the problem.
> 
> To expand:
> 
> * "type" should be set to f_2_r for MRC-type instructions because of
>   the dependency on an earlier instruction setting the input register.
> 
> * "neon_type" should be set appropriately for MRC-type instructions
>   because following NEON instructions will have dependencies on the
>   transferred value.
> 
> * ...and vice-versa for MCR-type instructions.
> 
> Incidentally other patches, probably merged from our source base, have
> introduced "neon_type" alongside "type" for a couple of instructions in
> vfp.md already: namely, *movdi_vfp and *movhf_vfp_neon: whatever we
> choose to do, we should be consistent.
> 
> So: OK to apply?
> 

This is OK.

I'll note for the record here, that I think having type, insn and
neon_type attributes all trying to do the same thing (that is, classify
the insn for scheduling purposes) is broken.  The correct fix, long
term, is to simplify these into one 'type' attribute that is then rich
enough to handle all the different classifications.  Subdividing a
classification should be done by splitting an existing class into two
entries in the same attribute, not adding some parallel alternative
attribute for the sub-division -- that way lies chaos.

However, that's rather a lot of work and this patch won't make that
problem any harder.

R.

> Thanks,
> 
> Julian
> 
> 2011-02-22  Andrew Stubbs  <ams@codesourcery.com>
> 	    Mark Shinwell  <shinwell@codesourcery.com>
>             Julian Brown  <julian@codesourcery.com>
> 
>     gcc/
>     * config/arm/vfp.md (*arm_movsi_vfp, *thumb2_movsi_vfp)
>     (*movdi_vfp_cortexa8, *movsf_vfp, *thumb2_movsf_vfp)
>     (*movdf_vfp, *thumb2_movdf_vfp, *movsfcc_vfp)
>     (*thumb2_movsfcc_vfp, *movdfcc_vfp, *thumb2_movdfcc_vfp): Add
>     neon_type.
>     * config/arm/arm.md (neon_type): Update comment.
> 
> 
> neon-type-vfplite-to-core-1.diff
> 
> 
> commit 44d6297b8e73eae42b3d6a0da7ca857f549049c1
> Author: Julian Brown <jbrown@build6-lucid-cs.sje.mentorg.com>
> Date:   Fri Jul 13 10:32:46 2012 -0700
> 
>     Set neon_type for vfp-to-core transfers.
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 8888382..e9da56d 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -397,8 +397,6 @@
>  (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
>  
>  ;; Classification of NEON instructions for scheduling purposes.
> -;; Do not set this attribute and the "type" attribute together in
> -;; any one instruction pattern.
>  (define_attr "neon_type"
>     "neon_int_1,\
>     neon_int_2,\
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 8369949..4a56d57 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -78,6 +78,7 @@
>    "
>    [(set_attr "predicable" "yes")
>     (set_attr "type" "*,*,*,*,load1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
> +   (set_attr "neon_type" "*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*")
>     (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*")
>     (set_attr "pool_range"     "*,*,*,*,4096,*,*,*,*,1020,*")
>     (set_attr "neg_pool_range" "*,*,*,*,4084,*,*,*,*,1008,*")]
> @@ -120,6 +121,7 @@
>    "
>    [(set_attr "predicable" "yes")
>     (set_attr "type" "*,*,*,*,load1,load1,store1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
> +   (set_attr "neon_type" "*,*,*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*")
>     (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*,*,*")
>     (set_attr "pool_range"     "*,*,*,*,1020,4096,*,*,*,*,*,1020,*")
>     (set_attr "neg_pool_range" "*,*,*,*,   0,   0,*,*,*,*,*,1008,*")]
> @@ -212,6 +214,7 @@
>      }
>    "
>    [(set_attr "type" "*,*,*,*,load2,load2,store2,r_2_f,f_2_r,ffarithd,f_loadd,f_stored")
> +   (set_attr "neon_type" "*,*,*,*,*,*,*,neon_mcr_2_mcrr,neon_mrrc,neon_vmov,*,*")
>     (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
>                                 (eq_attr "alternative" "2") (const_int 12)
>                                 (eq_attr "alternative" "3") (const_int 16)
> @@ -370,6 +373,7 @@
>    [(set_attr "predicable" "yes")
>     (set_attr "type"
>       "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*")
> +   (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*")
>     (set_attr "insn" "*,*,*,*,*,*,*,*,mov")
>     (set_attr "pool_range" "*,*,*,1020,*,4096,*,*,*")
>     (set_attr "neg_pool_range" "*,*,*,1008,*,4080,*,*,*")]
> @@ -407,6 +411,7 @@
>    [(set_attr "predicable" "yes")
>     (set_attr "type"
>       "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*")
> +   (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*")
>     (set_attr "insn" "*,*,*,*,*,*,*,*,mov")
>     (set_attr "pool_range" "*,*,*,1020,*,4092,*,*,*")
>     (set_attr "neg_pool_range" "*,*,*,1008,*,0,*,*,*")]
> @@ -450,6 +455,7 @@
>    "
>    [(set_attr "type"
>       "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*")
> +   (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*")
>     (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
>  			       (eq_attr "alternative" "7")
>  				(if_then_else
> @@ -493,6 +499,7 @@
>    "
>    [(set_attr "type"
>       "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*")
> +   (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*")
>     (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
>  			       (eq_attr "alternative" "7")
>  				(if_then_else
> @@ -527,7 +534,8 @@
>     fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1"
>     [(set_attr "conds" "use")
>      (set_attr "length" "4,4,8,4,4,8,4,4,8")
> -    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
> +    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
> +    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")]
>  )
>  
>  (define_insn "*thumb2_movsfcc_vfp"
> @@ -550,7 +558,8 @@
>     ite\\t%D3\;fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1"
>     [(set_attr "conds" "use")
>      (set_attr "length" "6,6,10,6,6,10,6,6,10")
> -    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
> +    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
> +    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")]
>  )
>  
>  (define_insn "*movdfcc_vfp"
> @@ -573,7 +582,8 @@
>     fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1"
>     [(set_attr "conds" "use")
>      (set_attr "length" "4,4,8,4,4,8,4,4,8")
> -    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
> +    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
> +    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")]
>  )
>  
>  (define_insn "*thumb2_movdfcc_vfp"
> @@ -596,7 +606,8 @@
>     ite\\t%D3\;fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1"
>     [(set_attr "conds" "use")
>      (set_attr "length" "6,6,10,6,6,10,6,6,10")
> -    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
> +    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
> +    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")]
>  )
>  
>  
>

Patch

commit 44d6297b8e73eae42b3d6a0da7ca857f549049c1
Author: Julian Brown <jbrown@build6-lucid-cs.sje.mentorg.com>
Date:   Fri Jul 13 10:32:46 2012 -0700

    Set neon_type for vfp-to-core transfers.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8888382..e9da56d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -397,8 +397,6 @@ 
 (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
 
 ;; Classification of NEON instructions for scheduling purposes.
-;; Do not set this attribute and the "type" attribute together in
-;; any one instruction pattern.
 (define_attr "neon_type"
    "neon_int_1,\
    neon_int_2,\
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 8369949..4a56d57 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -78,6 +78,7 @@ 
   "
   [(set_attr "predicable" "yes")
    (set_attr "type" "*,*,*,*,load1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
+   (set_attr "neon_type" "*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*")
    (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*")
    (set_attr "pool_range"     "*,*,*,*,4096,*,*,*,*,1020,*")
    (set_attr "neg_pool_range" "*,*,*,*,4084,*,*,*,*,1008,*")]
@@ -120,6 +121,7 @@ 
   "
   [(set_attr "predicable" "yes")
    (set_attr "type" "*,*,*,*,load1,load1,store1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
+   (set_attr "neon_type" "*,*,*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*")
    (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*,*,*")
    (set_attr "pool_range"     "*,*,*,*,1020,4096,*,*,*,*,*,1020,*")
    (set_attr "neg_pool_range" "*,*,*,*,   0,   0,*,*,*,*,*,1008,*")]
@@ -212,6 +214,7 @@ 
     }
   "
   [(set_attr "type" "*,*,*,*,load2,load2,store2,r_2_f,f_2_r,ffarithd,f_loadd,f_stored")
+   (set_attr "neon_type" "*,*,*,*,*,*,*,neon_mcr_2_mcrr,neon_mrrc,neon_vmov,*,*")
    (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
                                (eq_attr "alternative" "2") (const_int 12)
                                (eq_attr "alternative" "3") (const_int 16)
@@ -370,6 +373,7 @@ 
   [(set_attr "predicable" "yes")
    (set_attr "type"
      "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*")
+   (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*")
    (set_attr "insn" "*,*,*,*,*,*,*,*,mov")
    (set_attr "pool_range" "*,*,*,1020,*,4096,*,*,*")
    (set_attr "neg_pool_range" "*,*,*,1008,*,4080,*,*,*")]
@@ -407,6 +411,7 @@ 
   [(set_attr "predicable" "yes")
    (set_attr "type"
      "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*")
+   (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*")
    (set_attr "insn" "*,*,*,*,*,*,*,*,mov")
    (set_attr "pool_range" "*,*,*,1020,*,4092,*,*,*")
    (set_attr "neg_pool_range" "*,*,*,1008,*,0,*,*,*")]
@@ -450,6 +455,7 @@ 
   "
   [(set_attr "type"
      "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*")
+   (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*")
    (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
 			       (eq_attr "alternative" "7")
 				(if_then_else
@@ -493,6 +499,7 @@ 
   "
   [(set_attr "type"
      "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*")
+   (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*")
    (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
 			       (eq_attr "alternative" "7")
 				(if_then_else
@@ -527,7 +534,8 @@ 
    fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1"
    [(set_attr "conds" "use")
     (set_attr "length" "4,4,8,4,4,8,4,4,8")
-    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
+    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
+    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")]
 )
 
 (define_insn "*thumb2_movsfcc_vfp"
@@ -550,7 +558,8 @@ 
    ite\\t%D3\;fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1"
    [(set_attr "conds" "use")
     (set_attr "length" "6,6,10,6,6,10,6,6,10")
-    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
+    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
+    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")]
 )
 
 (define_insn "*movdfcc_vfp"
@@ -573,7 +582,8 @@ 
    fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1"
    [(set_attr "conds" "use")
     (set_attr "length" "4,4,8,4,4,8,4,4,8")
-    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
+    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
+    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")]
 )
 
 (define_insn "*thumb2_movdfcc_vfp"
@@ -596,7 +606,8 @@ 
    ite\\t%D3\;fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1"
    [(set_attr "conds" "use")
     (set_attr "length" "6,6,10,6,6,10,6,6,10")
-    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
+    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
+    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")]
 )