diff mbox

[01/10] target-arm: Make Neon helper routines use correct FP status

Message ID 1301668243-29886-2-git-send-email-peter.maydell@linaro.org
State New
Headers show

Commit Message

Peter Maydell April 1, 2011, 2:30 p.m. UTC
Make the Neon helper routines use the correct FP status from
the CPUEnv rather than using a dummy static one. This means
they will correctly handle denormals and NaNs and will set
FPSCR exception bits properly.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helpers.h     |   22 +++++++++++-----------
 target-arm/neon_helper.c |   21 ++++++++++-----------
 target-arm/translate.c   |   42 ++++++++++++++++++++++--------------------
 3 files changed, 43 insertions(+), 42 deletions(-)

Comments

Blue Swirl April 1, 2011, 6:29 p.m. UTC | #1
On Fri, Apr 1, 2011 at 5:30 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> Make the Neon helper routines use the correct FP status from
> the CPUEnv rather than using a dummy static one. This means
> they will correctly handle denormals and NaNs and will set
> FPSCR exception bits properly.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/helpers.h     |   22 +++++++++++-----------
>  target-arm/neon_helper.c |   21 ++++++++++-----------
>  target-arm/translate.c   |   42 ++++++++++++++++++++++--------------------
>  3 files changed, 43 insertions(+), 42 deletions(-)
>
> diff --git a/target-arm/helpers.h b/target-arm/helpers.h
> index bd6977c..e2260b6 100644
> --- a/target-arm/helpers.h
> +++ b/target-arm/helpers.h
> @@ -350,17 +350,17 @@ DEF_HELPER_2(neon_qneg_s8, i32, env, i32)
>  DEF_HELPER_2(neon_qneg_s16, i32, env, i32)
>  DEF_HELPER_2(neon_qneg_s32, i32, env, i32)
>
> -DEF_HELPER_2(neon_min_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_max_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_abd_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_add_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_sub_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_mul_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_ceq_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_cge_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_cgt_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_acge_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_acgt_f32, i32, i32, i32)
> +DEF_HELPER_3(neon_min_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_max_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_abd_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_add_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_sub_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_mul_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_ceq_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_cge_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_cgt_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_acge_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_acgt_f32, i32, env, i32, i32)
>
>  /* iwmmxt_helper.c */
>  DEF_HELPER_2(iwmmxt_maddsq, i64, i64, i64)
> diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
> index 002a9c1..97bc1e6 100644
> --- a/target-arm/neon_helper.c
> +++ b/target-arm/neon_helper.c
> @@ -18,8 +18,7 @@
>
>  #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] = CPSR_Q
>
> -static float_status neon_float_status;
> -#define NFS &neon_float_status
> +#define NFS (&env->vfp.standard_fp_status)
>
>  /* Helper routines to perform bitwise copies between float and int.  */
>  static inline float32 vfp_itos(uint32_t i)
> @@ -1794,21 +1793,21 @@ uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x)
>  }
>
>  /* NEON Float helpers.  */
> -uint32_t HELPER(neon_min_f32)(uint32_t a, uint32_t b)
> +uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b)

I didn't check, but if neon_helper.c is compiled like op_helper.c,
passing env should not be needed. If that is not the case, the
functions could be moved to op_helper.c.
Peter Maydell April 1, 2011, 10:33 p.m. UTC | #2
On 1 April 2011 19:29, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Fri, Apr 1, 2011 at 5:30 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> Make the Neon helper routines use the correct FP status from
>> the CPUEnv rather than using a dummy static one. This means
>> they will correctly handle denormals and NaNs and will set
>> FPSCR exception bits properly.

> I didn't check, but if neon_helper.c is compiled like op_helper.c,
> passing env should not be needed.

It isn't; see the comments when this patch was first posted.

> If that is not the case, the
> functions could be moved to op_helper.c.

Nobody seemed to have a coherent rule about when functions
should be in op_helper.c and use the global 'env' variable
and when they should be in some other file and have 'env'
passed as a parameter (or indeed why only op_helper.c
should be special in this way). Currently the ARM target has
both kinds of helper function. So I took the straightforward
approach of not moving code around wholesale, and just
passed in the env pointer, consistent with the way we already
handle most functions that talk to softfloat.

-- PMM
Blue Swirl April 3, 2011, 9:41 a.m. UTC | #3
On Sat, Apr 2, 2011 at 1:33 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 1 April 2011 19:29, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Fri, Apr 1, 2011 at 5:30 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> Make the Neon helper routines use the correct FP status from
>>> the CPUEnv rather than using a dummy static one. This means
>>> they will correctly handle denormals and NaNs and will set
>>> FPSCR exception bits properly.
>
>> I didn't check, but if neon_helper.c is compiled like op_helper.c,
>> passing env should not be needed.
>
> It isn't; see the comments when this patch was first posted.
>
>> If that is not the case, the
>> functions could be moved to op_helper.c.
>
> Nobody seemed to have a coherent rule about when functions
> should be in op_helper.c and use the global 'env' variable
> and when they should be in some other file and have 'env'
> passed as a parameter (or indeed why only op_helper.c
> should be special in this way). Currently the ARM target has
> both kinds of helper function. So I took the straightforward
> approach of not moving code around wholesale, and just
> passed in the env pointer, consistent with the way we already
> handle most functions that talk to softfloat.

In general, helpers for the translated code belong to op_helper.c.
They can and should access global env directly for speed. If a helper
is used for both translated code and outside of it, a wrapper should
be added to do global env shuffling (or for example a copy without
shuffling added).
Peter Maydell April 3, 2011, 10:51 a.m. UTC | #4
On 3 April 2011 10:41, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Sat, Apr 2, 2011 at 1:33 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> Nobody seemed to have a coherent rule about when functions
>> should be in op_helper.c and use the global 'env' variable
>> and when they should be in some other file and have 'env'
>> passed as a parameter

> In general, helpers for the translated code belong to op_helper.c.
> They can and should access global env directly for speed. If a helper
> is used for both translated code and outside of it, a wrapper should
> be added to do global env shuffling (or for example a copy without
> shuffling added).

OK, we can do that, but at the moment "helper function not in
op_helper.c" is hugely in the majority so there's a lot of
code we'd be moving around:

$ grep -c HELPER target-arm/*.c
target-arm/helper.c:68
target-arm/iwmmxt_helper.c:59
target-arm/machine.c:0
target-arm/neon_helper.c:103
target-arm/op_helper.c:28
target-arm/translate.c:2

(ie just 10% or so of ARM helper functions are in op_helper.c)

...and this cleanup would basically amount to folding
neon_helper.c, iwmmxt_helper.c and bits of helper.c into
op_helper.c (and then removing the 'env' parameters, so
a big patch to translate.c as well, which I don't suppose
anybody maintaining an out-of-tree target-arm patchset will
thank us for :-)).

But I can submit a patch to do that if it's the right thing.

-- PMM
Blue Swirl April 3, 2011, 11:10 a.m. UTC | #5
On Sun, Apr 3, 2011 at 1:51 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 3 April 2011 10:41, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Sat, Apr 2, 2011 at 1:33 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> Nobody seemed to have a coherent rule about when functions
>>> should be in op_helper.c and use the global 'env' variable
>>> and when they should be in some other file and have 'env'
>>> passed as a parameter
>
>> In general, helpers for the translated code belong to op_helper.c.
>> They can and should access global env directly for speed. If a helper
>> is used for both translated code and outside of it, a wrapper should
>> be added to do global env shuffling (or for example a copy without
>> shuffling added).
>
> OK, we can do that, but at the moment "helper function not in
> op_helper.c" is hugely in the majority so there's a lot of
> code we'd be moving around:
>
> $ grep -c HELPER target-arm/*.c
> target-arm/helper.c:68
> target-arm/iwmmxt_helper.c:59
> target-arm/machine.c:0
> target-arm/neon_helper.c:103
> target-arm/op_helper.c:28
> target-arm/translate.c:2
>
> (ie just 10% or so of ARM helper functions are in op_helper.c)
>
> ...and this cleanup would basically amount to folding
> neon_helper.c, iwmmxt_helper.c and bits of helper.c into
> op_helper.c (and then removing the 'env' parameters, so
> a big patch to translate.c as well, which I don't suppose
> anybody maintaining an out-of-tree target-arm patchset will
> thank us for :-)).

Alternatively those files could be compiled with HELPER_CFLAGS. In
either case, the code would have to be checked for 'env' usage and
adjusted.

> But I can submit a patch to do that if it's the right thing.

It's not so much about correctness, but performance. All generated
code already has access to global env, so passing it via helper
arguments requires extra instructions which can be avoided.
Peter Maydell April 3, 2011, 11:21 a.m. UTC | #6
On 3 April 2011 12:10, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Sun, Apr 3, 2011 at 1:51 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> But I can submit a patch to do that if it's the right thing.
>
> It's not so much about correctness, but performance. All generated
> code already has access to global env, so passing it via helper
> arguments requires extra instructions which can be avoided.

Yes; I meant "right thing" in the more general senses of "is best
practice for qemu code" and "causes my patches to be accepted" :-)

Anyway, how about I do a version of this patch which moves
the affected neon helpers to op_helper.c rather than adding
an env parameter (which would avoid changes that a cleanup
would just have to revert); but leave patch 10 as-is (that's
the one which is touching vfp helpers, but it is just cleanup
which isn't changing how they handle env) ?

Then I can do a separate patchset to move other helpers,
rather than tangling a code-cleanup patchset with Neon
correctness fixes.

-- PMM
Blue Swirl April 3, 2011, 11:52 a.m. UTC | #7
On Sun, Apr 3, 2011 at 2:21 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 3 April 2011 12:10, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Sun, Apr 3, 2011 at 1:51 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> But I can submit a patch to do that if it's the right thing.
>>
>> It's not so much about correctness, but performance. All generated
>> code already has access to global env, so passing it via helper
>> arguments requires extra instructions which can be avoided.
>
> Yes; I meant "right thing" in the more general senses of "is best
> practice for qemu code" and "causes my patches to be accepted" :-)
>
> Anyway, how about I do a version of this patch which moves
> the affected neon helpers to op_helper.c rather than adding
> an env parameter (which would avoid changes that a cleanup
> would just have to revert); but leave patch 10 as-is (that's
> the one which is touching vfp helpers, but it is just cleanup
> which isn't changing how they handle env) ?
>
> Then I can do a separate patchset to move other helpers,
> rather than tangling a code-cleanup patchset with Neon
> correctness fixes.

Sounds OK to me, but this is entirely up to you and ARM maintainers.
Aurelien Jarno April 3, 2011, 3:12 p.m. UTC | #8
On Sun, Apr 03, 2011 at 12:21:49PM +0100, Peter Maydell wrote:
> On 3 April 2011 12:10, Blue Swirl <blauwirbel@gmail.com> wrote:
> > On Sun, Apr 3, 2011 at 1:51 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> >> But I can submit a patch to do that if it's the right thing.
> >
> > It's not so much about correctness, but performance. All generated
> > code already has access to global env, so passing it via helper
> > arguments requires extra instructions which can be avoided.
> 
> Yes; I meant "right thing" in the more general senses of "is best
> practice for qemu code" and "causes my patches to be accepted" :-)
> 
> Anyway, how about I do a version of this patch which moves
> the affected neon helpers to op_helper.c rather than adding
> an env parameter (which would avoid changes that a cleanup
> would just have to revert); but leave patch 10 as-is (that's
> the one which is touching vfp helpers, but it is just cleanup
> which isn't changing how they handle env) ?
> 
> Then I can do a separate patchset to move other helpers,
> rather than tangling a code-cleanup patchset with Neon
> correctness fixes.
> 
 
This solution looks fine for me. That said, I am not sure moving
everything to op_helper.c is the best solution. I would rather go for
compiling *_helper.c with HELPER_CFLAGS, which avoids having one big
file which is messy to edit, and long to compile.

I leave you choose what you prefer.
Peter Maydell April 3, 2011, 3:32 p.m. UTC | #9
On 3 April 2011 16:12, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Sun, Apr 03, 2011 at 12:21:49PM +0100, Peter Maydell wrote:
>> Anyway, how about I do a version of this patch which moves
>> the affected neon helpers to op_helper.c rather than adding
>> an env parameter

>> Then I can do a separate patchset to move other helpers,
>> rather than tangling a code-cleanup patchset with Neon
>> correctness fixes.

> This solution looks fine for me. That said, I am not sure moving
> everything to op_helper.c is the best solution. I would rather go for
> compiling *_helper.c with HELPER_CFLAGS, which avoids having one big
> file which is messy to edit, and long to compile.

That sounds better, actually, and avoids moving too much
code around. Still leaves the choice of:
 * move VFP helpers from target-arm/helper.c to another file
 * compile all target-*/helper.c with HELPER_CFLAGS
 * arm-specific exception in Makefile.target

of which I'll go for option 1.

-- PMM
Aurelien Jarno April 3, 2011, 4:01 p.m. UTC | #10
On Sun, Apr 03, 2011 at 04:32:51PM +0100, Peter Maydell wrote:
> On 3 April 2011 16:12, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On Sun, Apr 03, 2011 at 12:21:49PM +0100, Peter Maydell wrote:
> >> Anyway, how about I do a version of this patch which moves
> >> the affected neon helpers to op_helper.c rather than adding
> >> an env parameter
> 
> >> Then I can do a separate patchset to move other helpers,
> >> rather than tangling a code-cleanup patchset with Neon
> >> correctness fixes.
> 
> > This solution looks fine for me. That said, I am not sure moving
> > everything to op_helper.c is the best solution. I would rather go for
> > compiling *_helper.c with HELPER_CFLAGS, which avoids having one big
> > file which is messy to edit, and long to compile.
> 
> That sounds better, actually, and avoids moving too much
> code around. Still leaves the choice of:
>  * move VFP helpers from target-arm/helper.c to another file
>  * compile all target-*/helper.c with HELPER_CFLAGS

I am not sure you can do this one, as some functions there are not
called from the TCG code nor from code where env is in a fixed register.

>  * arm-specific exception in Makefile.target
> 
> of which I'll go for option 1.
> 
That's also my preferred option.
diff mbox

Patch

diff --git a/target-arm/helpers.h b/target-arm/helpers.h
index bd6977c..e2260b6 100644
--- a/target-arm/helpers.h
+++ b/target-arm/helpers.h
@@ -350,17 +350,17 @@  DEF_HELPER_2(neon_qneg_s8, i32, env, i32)
 DEF_HELPER_2(neon_qneg_s16, i32, env, i32)
 DEF_HELPER_2(neon_qneg_s32, i32, env, i32)
 
-DEF_HELPER_2(neon_min_f32, i32, i32, i32)
-DEF_HELPER_2(neon_max_f32, i32, i32, i32)
-DEF_HELPER_2(neon_abd_f32, i32, i32, i32)
-DEF_HELPER_2(neon_add_f32, i32, i32, i32)
-DEF_HELPER_2(neon_sub_f32, i32, i32, i32)
-DEF_HELPER_2(neon_mul_f32, i32, i32, i32)
-DEF_HELPER_2(neon_ceq_f32, i32, i32, i32)
-DEF_HELPER_2(neon_cge_f32, i32, i32, i32)
-DEF_HELPER_2(neon_cgt_f32, i32, i32, i32)
-DEF_HELPER_2(neon_acge_f32, i32, i32, i32)
-DEF_HELPER_2(neon_acgt_f32, i32, i32, i32)
+DEF_HELPER_3(neon_min_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_max_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_abd_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_add_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_sub_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_mul_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_ceq_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_cge_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_cgt_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_acge_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_acgt_f32, i32, env, i32, i32)
 
 /* iwmmxt_helper.c */
 DEF_HELPER_2(iwmmxt_maddsq, i64, i64, i64)
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index 002a9c1..97bc1e6 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -18,8 +18,7 @@ 
 
 #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] = CPSR_Q
 
-static float_status neon_float_status;
-#define NFS &neon_float_status
+#define NFS (&env->vfp.standard_fp_status)
 
 /* Helper routines to perform bitwise copies between float and int.  */
 static inline float32 vfp_itos(uint32_t i)
@@ -1794,21 +1793,21 @@  uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x)
 }
 
 /* NEON Float helpers.  */
-uint32_t HELPER(neon_min_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = vfp_itos(a);
     float32 f1 = vfp_itos(b);
     return (float32_compare_quiet(f0, f1, NFS) == -1) ? a : b;
 }
 
-uint32_t HELPER(neon_max_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_max_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = vfp_itos(a);
     float32 f1 = vfp_itos(b);
     return (float32_compare_quiet(f0, f1, NFS) == 1) ? a : b;
 }
 
-uint32_t HELPER(neon_abd_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = vfp_itos(a);
     float32 f1 = vfp_itos(b);
@@ -1817,24 +1816,24 @@  uint32_t HELPER(neon_abd_f32)(uint32_t a, uint32_t b)
                     : float32_sub(f1, f0, NFS));
 }
 
-uint32_t HELPER(neon_add_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_add_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     return vfp_stoi(float32_add(vfp_itos(a), vfp_itos(b), NFS));
 }
 
-uint32_t HELPER(neon_sub_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_sub_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     return vfp_stoi(float32_sub(vfp_itos(a), vfp_itos(b), NFS));
 }
 
-uint32_t HELPER(neon_mul_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_mul_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     return vfp_stoi(float32_mul(vfp_itos(a), vfp_itos(b), NFS));
 }
 
 /* Floating point comparisons produce an integer result.  */
 #define NEON_VOP_FCMP(name, cmp) \
-uint32_t HELPER(neon_##name)(uint32_t a, uint32_t b) \
+uint32_t HELPER(neon_##name)(CPUState *env, uint32_t a, uint32_t b) \
 { \
     if (float32_compare_quiet(vfp_itos(a), vfp_itos(b), NFS) cmp 0) \
         return ~0; \
@@ -1846,14 +1845,14 @@  NEON_VOP_FCMP(ceq_f32, ==)
 NEON_VOP_FCMP(cge_f32, >=)
 NEON_VOP_FCMP(cgt_f32, >)
 
-uint32_t HELPER(neon_acge_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_acge_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = float32_abs(vfp_itos(a));
     float32 f1 = float32_abs(vfp_itos(b));
     return (float32_compare_quiet(f0, f1,NFS) >= 0) ? ~0 : 0;
 }
 
-uint32_t HELPER(neon_acgt_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_acgt_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = float32_abs(vfp_itos(a));
     float32 f1 = float32_abs(vfp_itos(b));
diff --git a/target-arm/translate.c b/target-arm/translate.c
index f69912f..cf2440e 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4519,56 +4519,56 @@  static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
         case 26: /* Floating point arithnetic.  */
             switch ((u << 2) | size) {
             case 0: /* VADD */
-                gen_helper_neon_add_f32(tmp, tmp, tmp2);
+                gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2);
                 break;
             case 2: /* VSUB */
-                gen_helper_neon_sub_f32(tmp, tmp, tmp2);
+                gen_helper_neon_sub_f32(tmp, cpu_env, tmp, tmp2);
                 break;
             case 4: /* VPADD */
-                gen_helper_neon_add_f32(tmp, tmp, tmp2);
+                gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2);
                 break;
             case 6: /* VABD */
-                gen_helper_neon_abd_f32(tmp, tmp, tmp2);
+                gen_helper_neon_abd_f32(tmp, cpu_env, tmp, tmp2);
                 break;
             default:
                 return 1;
             }
             break;
         case 27: /* Float multiply.  */
-            gen_helper_neon_mul_f32(tmp, tmp, tmp2);
+            gen_helper_neon_mul_f32(tmp, cpu_env, tmp, tmp2);
             if (!u) {
                 tcg_temp_free_i32(tmp2);
                 tmp2 = neon_load_reg(rd, pass);
                 if (size == 0) {
-                    gen_helper_neon_add_f32(tmp, tmp, tmp2);
+                    gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2);
                 } else {
-                    gen_helper_neon_sub_f32(tmp, tmp2, tmp);
+                    gen_helper_neon_sub_f32(tmp, cpu_env, tmp2, tmp);
                 }
             }
             break;
         case 28: /* Float compare.  */
             if (!u) {
-                gen_helper_neon_ceq_f32(tmp, tmp, tmp2);
+                gen_helper_neon_ceq_f32(tmp, cpu_env, tmp, tmp2);
             } else {
                 if (size == 0)
-                    gen_helper_neon_cge_f32(tmp, tmp, tmp2);
+                    gen_helper_neon_cge_f32(tmp, cpu_env, tmp, tmp2);
                 else
-                    gen_helper_neon_cgt_f32(tmp, tmp, tmp2);
+                    gen_helper_neon_cgt_f32(tmp, cpu_env, tmp, tmp2);
             }
             break;
         case 29: /* Float compare absolute.  */
             if (!u)
                 return 1;
             if (size == 0)
-                gen_helper_neon_acge_f32(tmp, tmp, tmp2);
+                gen_helper_neon_acge_f32(tmp, cpu_env, tmp, tmp2);
             else
-                gen_helper_neon_acgt_f32(tmp, tmp, tmp2);
+                gen_helper_neon_acgt_f32(tmp, cpu_env, tmp, tmp2);
             break;
         case 30: /* Float min/max.  */
             if (size == 0)
-                gen_helper_neon_max_f32(tmp, tmp, tmp2);
+                gen_helper_neon_max_f32(tmp, cpu_env, tmp, tmp2);
             else
-                gen_helper_neon_min_f32(tmp, tmp, tmp2);
+                gen_helper_neon_min_f32(tmp, cpu_env, tmp, tmp2);
             break;
         case 31:
             if (size == 0)
@@ -5232,7 +5232,7 @@  static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                                 gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                             }
                         } else if (op & 1) {
-                            gen_helper_neon_mul_f32(tmp, tmp, tmp2);
+                            gen_helper_neon_mul_f32(tmp,  cpu_env, tmp, tmp2);
                         } else {
                             switch (size) {
                             case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
@@ -5250,13 +5250,15 @@  static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                                 gen_neon_add(size, tmp, tmp2);
                                 break;
                             case 1:
-                                gen_helper_neon_add_f32(tmp, tmp, tmp2);
+                                gen_helper_neon_add_f32(tmp, cpu_env,
+                                                        tmp, tmp2);
                                 break;
                             case 4:
                                 gen_neon_rsb(size, tmp, tmp2);
                                 break;
                             case 5:
-                                gen_helper_neon_sub_f32(tmp, tmp2, tmp);
+                                gen_helper_neon_sub_f32(tmp,  cpu_env,
+                                                        tmp2, tmp);
                                 break;
                             default:
                                 abort();
@@ -5641,21 +5643,21 @@  static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                             break;
                         case 24: case 27: /* Float VCGT #0, Float VCLE #0 */
                             tmp2 = tcg_const_i32(0);
-                            gen_helper_neon_cgt_f32(tmp, tmp, tmp2);
+                            gen_helper_neon_cgt_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
                             if (op == 27)
                                 tcg_gen_not_i32(tmp, tmp);
                             break;
                         case 25: case 28: /* Float VCGE #0, Float VCLT #0 */
                             tmp2 = tcg_const_i32(0);
-                            gen_helper_neon_cge_f32(tmp, tmp, tmp2);
+                            gen_helper_neon_cge_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
                             if (op == 28)
                                 tcg_gen_not_i32(tmp, tmp);
                             break;
                         case 26: /* Float VCEQ #0 */
                             tmp2 = tcg_const_i32(0);
-                            gen_helper_neon_ceq_f32(tmp, tmp, tmp2);
+                            gen_helper_neon_ceq_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
                             break;
                         case 30: /* Float VABS */