
[13/19] Add VSX ISA 2.06 Multiply-Add Instructions

Message ID 526949E1.3010405@gmail.com
State New

Commit Message

Tom Musta Oct. 24, 2013, 4:25 p.m. UTC
This patch adds the VSX floating point multiply/add instructions
defined by V2.06 of the PowerPC ISA:

   - xsmaddadp,  xvmaddadp,  xvmaddasp
   - xsmaddmdp,  xvmaddmdp,  xvmaddmsp
   - xsmsubadp,  xvmsubadp,  xvmsubasp
   - xsmsubmdp,  xvmsubmdp,  xvmsubmsp
   - xsnmaddadp, xvnmaddadp, xvnmaddasp
   - xsnmaddmdp, xvnmaddmdp, xvnmaddmsp
   - xsnmsubadp, xvnmsubadp, xvnmsubasp
   - xsnmsubmdp, xvnmsubmdp, xvnmsubmsp

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
  target-ppc/fpu_helper.c |  106 +++++++++++++++++++++++++++++++++++++++++++++++
  target-ppc/helper.h     |   24 +++++++++++
  target-ppc/translate.c  |   48 +++++++++++++++++++++
  3 files changed, 178 insertions(+), 0 deletions(-)

Comments

Richard Henderson Oct. 24, 2013, 8:38 p.m. UTC | #1
On 10/24/2013 09:25 AM, Tom Musta wrote:
>                                                \
> +            ft0 = tp##_to_##btp(xa.fld[i], &env->fp_status);                  \
> +            ft1 = tp##_to_##btp(m->fld[i], &env->fp_status);                  \
> +            ft0 = btp##_mul(ft0, ft1, &env->fp_status);                       \
> +            if (unlikely(btp##_is_infinity(ft0) &&                            \
> +                         tp##_is_infinity(s->fld[i]) &&                       \
> +                         btp##_is_neg(ft0) cmp tp##_is_neg(s->fld[i]))) {     \
> +                xt.fld[i] = float64_to_##tp(                                  \
> +                              fload_invalid_op_excp(env,                      \
> +                                                     POWERPC_EXCP_FP_VXISI,   \
> +                                                     sfprf),                  \
> +                              &env->fp_status);                               \
> +            } else {                                                          \
> +                ft1 = tp##_to_##btp(s->fld[i], &env->fp_status);              \
> +                ft0 = btp##_##sum(ft0, ft1, &env->fp_status);                 \
> +                xt.fld[i] = btp##_to_##tp(ft0, &env->fp_status);              \
> +            }                                                                 \
> +            if (neg && likely(!tp##_is_any_nan(xt.fld[i]))) {                 \
> +                xt.fld[i] = tp##_chs(xt.fld[i]);                              \
> +            }                  

You want to be using tp##_muladd instead of widening to 128 bits.
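
(For reference, the softfloat interface being suggested, plus a sketch of
how the macro body could collapse; maddflgs is a hypothetical flags
parameter, not something in this patch:)

    /* Fused multiply-add: a single rounding of (a * b) + c.  Flags
     * such as float_muladd_negate_c and float_muladd_negate_result
     * select the msub/nmadd/nmsub variants. */
    float64 float64_muladd(float64 a, float64 b, float64 c,
                           int flags, float_status *status);

    xt.f64[i] = float64_muladd(xa.f64[i], m->f64[i], s->f64[i],
                               maddflgs, &env->fp_status);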

> +        s = &xt;                                                              \
> +    }                                                                         \
> +    else {                                                                    \
> +        m = &xt;                                                              \ 

Also be careful of the coding style.


r~
Tom Musta Oct. 25, 2013, 1:49 p.m. UTC | #2
On 10/24/2013 3:38 PM, Richard Henderson wrote:
> On 10/24/2013 09:25 AM, Tom Musta wrote:
>>                                                 \
<snip>

>> +                ft1 = tp##_to_##btp(s->fld[i], &env->fp_status);              \
>> +                ft0 = btp##_##sum(ft0, ft1, &env->fp_status);                 \
>> +                xt.fld[i] = btp##_to_##tp(ft0, &env->fp_status);              \
<snip>
> You want to be using tp##_muladd instead of widening to 128 bits.

Thanks for the suggestion, Richard.  I will try it.

>
>> +        s = &xt;                                                              \
>> +    }                                                                         \
>> +    else {                                                                    \
>> +        m = &xt;                                                              \
>
> Also be careful of the coding style.

To be fixed in V2 (checkpatch.pl missed this one).
Tom Musta Oct. 25, 2013, 4:25 p.m. UTC | #3
On 10/24/2013 3:38 PM, Richard Henderson wrote:
> On 10/24/2013 09:25 AM, Tom Musta wrote:
>>                                                 \
>> +            ft0 = tp##_to_##btp(xa.fld[i], &env->fp_status);                  \
>> +            ft1 = tp##_to_##btp(m->fld[i], &env->fp_status);                  \
>> +            ft0 = btp##_mul(ft0, ft1, &env->fp_status);                       \
>> +            if (unlikely(btp##_is_infinity(ft0) &&                            \
>> +                         tp##_is_infinity(s->fld[i]) &&                       \
>> +                         btp##_is_neg(ft0) cmp tp##_is_neg(s->fld[i]))) {     \
>> +                xt.fld[i] = float64_to_##tp(                                  \
>> +                              fload_invalid_op_excp(env,                      \
>> +                                                     POWERPC_EXCP_FP_VXISI,   \
>> +                                                     sfprf),                  \
>> +                              &env->fp_status);                               \
>> +            } else {                                                          \
>> +                ft1 = tp##_to_##btp(s->fld[i], &env->fp_status);              \
>> +                ft0 = btp##_##sum(ft0, ft1, &env->fp_status);                 \
>> +                xt.fld[i] = btp##_to_##tp(ft0, &env->fp_status);              \
>> +            }                                                                 \
>> +            if (neg && likely(!tp##_is_any_nan(xt.fld[i]))) {                 \
>> +                xt.fld[i] = tp##_chs(xt.fld[i]);                              \
>> +            }
>
> You want to be using tp##_muladd instead of widening to 128 bits.

I tried recoding xsmaddadp using float64_muladd.  The problem that I hit is the
boundary case where the intermediate product and the summand are infinities of
the opposite sign.  This is the case handled by the first "if" in the code
snippet above.  PowerPC has a dedicated FPSCR bit for this type of condition
(VXISI) as well as a general invalid operation bit (VX).  As far as I can tell,
the softfloat code only has the equivalent of the VX bit.  Thus the implementation
that I proposed is a more accurate representation of the Power ISA.

The VSX code was modeled after the existing fmadd FPU instruction.  I suspect
the author of that code wrote it this way for similar reasons.

I am inclined to keep my proposed implementation, which is consistent with
the existing PowerPC code.

Thoughts?
Richard Henderson Oct. 25, 2013, 4:42 p.m. UTC | #4
On 10/25/2013 09:25 AM, Tom Musta wrote:
> 
> I tried recoding xsmaddadp using float64_muladd.  The problem that I hit is the
> boundary case where the intermediate product and the summand are infinities of
> the opposite sign.  This is the case handled by the first "if" in the code
> snippet above.  PowerPC has a dedicated FPSCR bit for this type of condition
> (VXISI) as well as a general invalid operation bit (VX).  As far as I can tell,
> the softfloat code only has the equivalent of the VX bit.  Thus the
> implementation that I proposed is a more accurate representation of
> the Power ISA.
> 
> The VSX code was modeled after the existing fmadd FPU instruction.  I suspect
> the author of that code wrote it this way for similar reasons.
> 
> I am inclined to keep my proposed implementation, which is consistent with
> the existing PowerPC code.
> 
> Thoughts?

Hmm.  I won't object to your current implementation, since it does produce
correct results.

I believe that a better implementation could use float*_muladd, and check the
result for float_flag_invalid.  If set, compute the intermediate product so you
can figure out the VXISI setting.  But we'd expect that to be an unlikely path.
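
A sketch of that shape for the float64 case -- not the eventual patch;
maddflgs stands in for the float_muladd_* negation flags, and only the
add-form sign comparison is shown (the msub forms flip it, like the
macro's cmp parameter):

    /* Fast path: one fused operation.  Slow path, only when softfloat
     * reported an invalid operation: recompute the intermediate
     * product to decide whether VXISI applies. */
    xt.f64[i] = float64_muladd(xa.f64[i], m->f64[i], s->f64[i],
                               maddflgs, &env->fp_status);
    if (unlikely(get_float_exception_flags(&env->fp_status) &
                 float_flag_invalid)) {
        float64 prod = float64_mul(xa.f64[i], m->f64[i],
                                   &env->fp_status);
        if (float64_is_infinity(prod) &&
            float64_is_infinity(s->f64[i]) &&
            float64_is_neg(prod) != float64_is_neg(s->f64[i])) {
            fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf);
        }
    }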


r~
Tom Musta Oct. 25, 2013, 5:13 p.m. UTC | #5
On 10/25/2013 11:42 AM, Richard Henderson wrote:
> I believe that a better implementation could use float*_muladd, and check the
> result for float_flag_invalid.  If set, compute the intermediate product so you
> can figure out the VXISI setting.  But we'd expect that to be an unlikely path.

Interesting thought.  I think I see a way to re-arrange the code.  Thanks, Richard.
Peter Maydell Oct. 25, 2013, 5:20 p.m. UTC | #6
On 25 October 2013 17:25, Tom Musta <tommusta@gmail.com> wrote:
> On 10/24/2013 3:38 PM, Richard Henderson wrote:
>> You want to be using tp##_muladd instead of widening to 128 bits.
>
>
> I tried recoding xsmaddadp using float64_muladd.  The problem that I
> hit is the boundary case where the intermediate product and the
> summand are infinities of the opposite sign.  This is the case handled
> by the first "if" in the code snippet above.  PowerPC has a dedicated
> FPSCR bit for this type of condition (VXISI) as well as a general
> invalid operation bit (VX).  As far as I can tell, the softfloat code
> only has the equivalent of the VX bit.  Thus the implementation that I
> proposed is a more accurate representation of the Power ISA.

You could add the flag to the softfloat code -- this is what I did
for the somewhat ARM specific float_flag_output_denormal.
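
(A rough sketch of that option; float_flag_invalid_isi is an invented
name, not an existing softfloat flag, and storing a ninth flag would
likely also require widening float_status's 8-bit float_exception_flags
field:)

    /* Hypothetical softfloat flag, modeled on
     * float_flag_output_denormal; the target would map it to VXISI. */
    enum { float_flag_invalid_isi = 256 };

    if (get_float_exception_flags(&env->fp_status) &
        float_flag_invalid_isi) {
        fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf);
    }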

> The VSX code was modeled after the existing fmadd FPU instruction.  I
> suspect the author of that code wrote it this way for similar reasons.

I suspect it just predates the provision of fused multiply-add at
the softfloat level. It should ideally be rewritten to use the
softfloat functions.

Are you sure that doing the arithmetic with the softfloat 128 bit
float operations doesn't set the inexact flag anywhere it
shouldn't? (ie where the intermediate product is not exact in
128 bit format but the final result is exact in 64 or 32 bits).

-- PMM
Richard Henderson Oct. 25, 2013, 5:29 p.m. UTC | #7
On 10/25/2013 10:13 AM, Tom Musta wrote:
> On 10/25/2013 11:42 AM, Richard Henderson wrote:
>> I believe that a better implementation could use float*_muladd, and check the
>> result for float_flag_invalid.  If set, compute the intermediate product so you
>> can figure out the VXISI setting.  But we'd expect that to be an unlikely path.
> 
> Interesting thought.  I think I see a way to re-arrange the code.  Thanks,
> Richard.

Actually, you don't even have to compute the intermediate product.

The only way you can have VXISI for a*b+c is for

  isinf(c) && (isinf(a) || isinf(b))

since the intermediate product a*b is infinite precision, and thus cannot
overflow to inf unless one of the multiplicands is already inf.
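
In code, under the same assumptions as the earlier sketch, the slow path
reduces to a check on the original operands (run only on the
float_flag_invalid path, after the inf * 0 case has been handled):

    /* No intermediate product needed: a*b can only be infinite if an
     * operand already is, so inf - inf is detectable from a, b, c. */
    if (float64_is_infinity(s->f64[i]) &&
        (float64_is_infinity(xa.f64[i]) ||
         float64_is_infinity(m->f64[i]))) {
        fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf);
    }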


r~
Richard Henderson Oct. 25, 2013, 5:34 p.m. UTC | #8
On 10/25/2013 10:20 AM, Peter Maydell wrote:
> Are you sure that doing the arithmetic with the softfloat 128 bit
> float operations doesn't set the inexact flag anywhere it
> shouldn't? (ie where the intermediate product is not exact in
> 128 bit format but the final result is exact in 64 or 32 bits).

The 128 bit multiply cannot give an inexact result (the product of two
53-bit significands needs at most 106 bits, which fits in float128's
113-bit significand), and I believe that if the 128 bit addition gives
inexact then the 64-bit fma result would also have inexact.


r~

Patch

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 4e484a3..12e7abc 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2178,3 +2178,109 @@  void helper_##op(CPUPPCState *env, uint32_t opcode)                     \
  VSX_TSQRT(xstsqrtdp, 1, float64, f64, -1022, 52)
  VSX_TSQRT(xvtsqrtdp, 2, float64, f64, -1022, 52)
  VSX_TSQRT(xvtsqrtsp, 4, float32, f32, -126, 23)
+
+/* VSX_MADD - VSX floating point multiply/add variations
+ *   op    - instruction mnemonic
+ *   nels  - number of elements (1, 2 or 4)
+ *   tp    - type (float32 or float64)
+ *   btp   - big (intermediate) type (float64 or float128)
+ *   fld   - vsr_t field (f32 or f64)
+ *   cmp   - comparison operation for testing INF - INF
+ *   sum   - sum operation (add or sub)
+ *   neg   - negate result (0 or 1)
+ *   afrm  - A form (1=A, 0=M)
+ *   sfprf - set FPRF
+ */
+#define VSX_MADD(op, nels, tp, btp, fld, cmp, sum, neg, afrm, sfprf)          \
+void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
+{                                                                             \
+    ppc_vsr_t xt, xa, xb;                                                     \
+    ppc_vsr_t *m, *s;                                                         \
+    int i;                                                                    \
+                                                                              \
+    if (afrm) {                                                               \
+        m = &xb;                                                              \
+        s = &xt;                                                              \
+    }                                                                         \
+    else {                                                                    \
+        m = &xt;                                                              \
+        s = &xb;                                                              \
+    }                                                                         \
+    getVSR(xA(opcode), &xa, env);                                             \
+    getVSR(xB(opcode), &xb, env);                                             \
+    getVSR(xT(opcode), &xt, env);                                             \
+                                                                              \
+    helper_reset_fpstatus(env);                                               \
+                                                                              \
+    for (i = 0; i < nels; i++) {                                              \
+        if (unlikely((tp##_is_infinity(xa.fld[i]) &&                          \
+                      tp##_is_zero(m->fld[i])) ||                             \
+                     (tp##_is_zero(xa.fld[i]) &&                              \
+                      tp##_is_infinity(m->fld[i])))) {                        \
+            xt.fld[i] = float64_to_##tp(                                      \
+                          fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXIMZ,   \
+                                              sfprf),                         \
+                          &env->fp_status);                                   \
+        } else {                                                              \
+            if (unlikely(tp##_is_signaling_nan(xa.fld[i]) ||                  \
+                         tp##_is_signaling_nan(m->fld[i]) ||                  \
+                         tp##_is_signaling_nan(s->fld[i]))) {                 \
+                fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf);    \
+            }                                                                 \
+            btp ft0, ft1;                                                     \
+                                                                              \
+            ft0 = tp##_to_##btp(xa.fld[i], &env->fp_status);                  \
+            ft1 = tp##_to_##btp(m->fld[i], &env->fp_status);                  \
+            ft0 = btp##_mul(ft0, ft1, &env->fp_status);                       \
+            if (unlikely(btp##_is_infinity(ft0) &&                            \
+                         tp##_is_infinity(s->fld[i]) &&                       \
+                         btp##_is_neg(ft0) cmp tp##_is_neg(s->fld[i]))) {     \
+                xt.fld[i] = float64_to_##tp(                                  \
+                              fload_invalid_op_excp(env,                      \
+                                                     POWERPC_EXCP_FP_VXISI,   \
+                                                     sfprf),                  \
+                              &env->fp_status);                               \
+            } else {                                                          \
+                ft1 = tp##_to_##btp(s->fld[i], &env->fp_status);              \
+                ft0 = btp##_##sum(ft0, ft1, &env->fp_status);                 \
+                xt.fld[i] = btp##_to_##tp(ft0, &env->fp_status);              \
+            }                                                                 \
+            if (neg && likely(!tp##_is_any_nan(xt.fld[i]))) {                 \
+                xt.fld[i] = tp##_chs(xt.fld[i]);                              \
+            }                                                                 \
+        }                                                                     \
+    }                                                                         \
+                                                                              \
+    putVSR(xT(opcode), &xt, env);                                             \
+    if (sfprf) {                                                              \
+        helper_compute_fprf(env, xt.fld[0], sfprf);                           \
+    }                                                                         \
+    helper_float_check_status(env);                                           \
+}
+
+VSX_MADD(xsmaddadp, 1, float64, float128, f64, !=, add, 0, 1, 1)
+VSX_MADD(xsmaddmdp, 1, float64, float128, f64, !=, add, 0, 0, 1)
+VSX_MADD(xsmsubadp, 1, float64, float128, f64, ==, sub, 0, 1, 1)
+VSX_MADD(xsmsubmdp, 1, float64, float128, f64, ==, sub, 0, 0, 1)
+VSX_MADD(xsnmaddadp, 1, float64, float128, f64, !=, add, 1, 1, 1)
+VSX_MADD(xsnmaddmdp, 1, float64, float128, f64, !=, add, 1, 0, 1)
+VSX_MADD(xsnmsubadp, 1, float64, float128, f64, ==, sub, 1, 1, 1)
+VSX_MADD(xsnmsubmdp, 1, float64, float128, f64, ==, sub, 1, 0, 1)
+
+VSX_MADD(xvmaddadp, 2, float64, float128, f64, !=, add, 0, 1, 0)
+VSX_MADD(xvmaddmdp, 2, float64, float128, f64, !=, add, 0, 0, 0)
+VSX_MADD(xvmsubadp, 2, float64, float128, f64, ==, sub, 0, 1, 0)
+VSX_MADD(xvmsubmdp, 2, float64, float128, f64, ==, sub, 0, 0, 0)
+VSX_MADD(xvnmaddadp, 2, float64, float128, f64, !=, add, 1, 1, 0)
+VSX_MADD(xvnmaddmdp, 2, float64, float128, f64, !=, add, 1, 0, 0)
+VSX_MADD(xvnmsubadp, 2, float64, float128, f64, ==, sub, 1, 1, 0)
+VSX_MADD(xvnmsubmdp, 2, float64, float128, f64, ==, sub, 1, 0, 0)
+
+VSX_MADD(xvmaddasp, 4, float32, float64, f32, !=, add, 0, 1, 0)
+VSX_MADD(xvmaddmsp, 4, float32, float64, f32, !=, add, 0, 0, 0)
+VSX_MADD(xvmsubasp, 4, float32, float64, f32, ==, sub, 0, 1, 0)
+VSX_MADD(xvmsubmsp, 4, float32, float64, f32, ==, sub, 0, 0, 0)
+VSX_MADD(xvnmaddasp, 4, float32, float64, f32, !=, add, 1, 1, 0)
+VSX_MADD(xvnmaddmsp, 4, float32, float64, f32, !=, add, 1, 0, 0)
+VSX_MADD(xvnmsubasp, 4, float32, float64, f32, ==, sub, 1, 1, 0)
+VSX_MADD(xvnmsubmsp, 4, float32, float64, f32, ==, sub, 1, 0, 0)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index e1abada..15f1b95 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -260,6 +260,14 @@  DEF_HELPER_2(xssqrtdp, void, env, i32)
  DEF_HELPER_2(xsrsqrtedp, void, env, i32)
  DEF_HELPER_2(xstdivdp, void, env, i32)
  DEF_HELPER_2(xstsqrtdp, void, env, i32)
+DEF_HELPER_2(xsmaddadp, void, env, i32)
+DEF_HELPER_2(xsmaddmdp, void, env, i32)
+DEF_HELPER_2(xsmsubadp, void, env, i32)
+DEF_HELPER_2(xsmsubmdp, void, env, i32)
+DEF_HELPER_2(xsnmaddadp, void, env, i32)
+DEF_HELPER_2(xsnmaddmdp, void, env, i32)
+DEF_HELPER_2(xsnmsubadp, void, env, i32)
+DEF_HELPER_2(xsnmsubmdp, void, env, i32)

  DEF_HELPER_2(xvadddp, void, env, i32)
  DEF_HELPER_2(xvsubdp, void, env, i32)
@@ -270,6 +278,14 @@  DEF_HELPER_2(xvsqrtdp, void, env, i32)
  DEF_HELPER_2(xvrsqrtedp, void, env, i32)
  DEF_HELPER_2(xvtdivdp, void, env, i32)
  DEF_HELPER_2(xvtsqrtdp, void, env, i32)
+DEF_HELPER_2(xvmaddadp, void, env, i32)
+DEF_HELPER_2(xvmaddmdp, void, env, i32)
+DEF_HELPER_2(xvmsubadp, void, env, i32)
+DEF_HELPER_2(xvmsubmdp, void, env, i32)
+DEF_HELPER_2(xvnmaddadp, void, env, i32)
+DEF_HELPER_2(xvnmaddmdp, void, env, i32)
+DEF_HELPER_2(xvnmsubadp, void, env, i32)
+DEF_HELPER_2(xvnmsubmdp, void, env, i32)

  DEF_HELPER_2(xvaddsp, void, env, i32)
  DEF_HELPER_2(xvsubsp, void, env, i32)
@@ -280,6 +296,14 @@  DEF_HELPER_2(xvsqrtsp, void, env, i32)
  DEF_HELPER_2(xvrsqrtesp, void, env, i32)
  DEF_HELPER_2(xvtdivsp, void, env, i32)
  DEF_HELPER_2(xvtsqrtsp, void, env, i32)
+DEF_HELPER_2(xvmaddasp, void, env, i32)
+DEF_HELPER_2(xvmaddmsp, void, env, i32)
+DEF_HELPER_2(xvmsubasp, void, env, i32)
+DEF_HELPER_2(xvmsubmsp, void, env, i32)
+DEF_HELPER_2(xvnmaddasp, void, env, i32)
+DEF_HELPER_2(xvnmaddmsp, void, env, i32)
+DEF_HELPER_2(xvnmsubasp, void, env, i32)
+DEF_HELPER_2(xvnmsubmsp, void, env, i32)

  DEF_HELPER_2(efscfsi, i32, env, i32)
  DEF_HELPER_2(efscfui, i32, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 6978fe0..3783e94 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7302,6 +7302,14 @@  GEN_VSX_HELPER_2(xssqrtdp, 0x16, 0x04, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xsrsqrtedp, 0x14, 0x04, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xstdivdp, 0x14, 0x07, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xstsqrtdp, 0x14, 0x06, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xsmaddadp, 0x04, 0x04, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xsmaddmdp, 0x04, 0x05, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xsmsubadp, 0x04, 0x06, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xsmsubmdp, 0x04, 0x07, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xsnmaddadp, 0x04, 0x14, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xsnmaddmdp, 0x04, 0x15, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xsnmsubadp, 0x04, 0x16, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xsnmsubmdp, 0x04, 0x17, 0, PPC2_VSX)

  GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -7312,6 +7320,14 @@  GEN_VSX_HELPER_2(xvsqrtdp, 0x16, 0x0C, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xvrsqrtedp, 0x14, 0x0C, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xvtdivdp, 0x14, 0x0F, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xvtsqrtdp, 0x14, 0x0E, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvmaddadp, 0x04, 0x0C, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvmaddmdp, 0x04, 0x0D, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvmsubadp, 0x04, 0x0E, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvmsubmdp, 0x04, 0x0F, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvnmaddadp, 0x04, 0x1C, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvnmaddmdp, 0x04, 0x1D, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvnmsubadp, 0x04, 0x1E, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvnmsubmdp, 0x04, 0x1F, 0, PPC2_VSX)

  GEN_VSX_HELPER_2(xvaddsp, 0x00, 0x08, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xvsubsp, 0x00, 0x09, 0, PPC2_VSX)
@@ -7322,6 +7338,14 @@  GEN_VSX_HELPER_2(xvsqrtsp, 0x16, 0x08, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xvrsqrtesp, 0x14, 0x08, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xvtdivsp, 0x14, 0x0B, 0, PPC2_VSX)
  GEN_VSX_HELPER_2(xvtsqrtsp, 0x14, 0x0A, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvmaddasp, 0x04, 0x08, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvmaddmsp, 0x04, 0x09, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvmsubasp, 0x04, 0x0A, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvmsubmsp, 0x04, 0x0B, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvnmaddasp, 0x04, 0x18, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvnmaddmsp, 0x04, 0x19, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvnmsubasp, 0x04, 0x1A, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xvnmsubmsp, 0x04, 0x1B, 0, PPC2_VSX)

  #define VSX_LOGICAL(name, tcg_op)                                    \
  static void glue(gen_, name)(DisasContext * ctx)                     \
@@ -10014,6 +10038,14 @@  GEN_XX2FORM(xssqrtdp,  0x16, 0x04, PPC2_VSX),
  GEN_XX2FORM(xsrsqrtedp,  0x14, 0x04, PPC2_VSX),
  GEN_XX3FORM(xstdivdp,  0x14, 0x07, PPC2_VSX),
  GEN_XX2FORM(xstsqrtdp,  0x14, 0x06, PPC2_VSX),
+GEN_XX3FORM(xsmaddadp, 0x04, 0x04, PPC2_VSX),
+GEN_XX3FORM(xsmaddmdp, 0x04, 0x05, PPC2_VSX),
+GEN_XX3FORM(xsmsubadp, 0x04, 0x06, PPC2_VSX),
+GEN_XX3FORM(xsmsubmdp, 0x04, 0x07, PPC2_VSX),
+GEN_XX3FORM(xsnmaddadp, 0x04, 0x14, PPC2_VSX),
+GEN_XX3FORM(xsnmaddmdp, 0x04, 0x15, PPC2_VSX),
+GEN_XX3FORM(xsnmsubadp, 0x04, 0x16, PPC2_VSX),
+GEN_XX3FORM(xsnmsubmdp, 0x04, 0x17, PPC2_VSX),

  GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
  GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
@@ -10024,6 +10056,14 @@  GEN_XX2FORM(xvsqrtdp,  0x16, 0x0C, PPC2_VSX),
  GEN_XX2FORM(xvrsqrtedp,  0x14, 0x0C, PPC2_VSX),
  GEN_XX3FORM(xvtdivdp, 0x14, 0x0F, PPC2_VSX),
  GEN_XX2FORM(xvtsqrtdp, 0x14, 0x0E, PPC2_VSX),
+GEN_XX3FORM(xvmaddadp, 0x04, 0x0C, PPC2_VSX),
+GEN_XX3FORM(xvmaddmdp, 0x04, 0x0D, PPC2_VSX),
+GEN_XX3FORM(xvmsubadp, 0x04, 0x0E, PPC2_VSX),
+GEN_XX3FORM(xvmsubmdp, 0x04, 0x0F, PPC2_VSX),
+GEN_XX3FORM(xvnmaddadp, 0x04, 0x1C, PPC2_VSX),
+GEN_XX3FORM(xvnmaddmdp, 0x04, 0x1D, PPC2_VSX),
+GEN_XX3FORM(xvnmsubadp, 0x04, 0x1E, PPC2_VSX),
+GEN_XX3FORM(xvnmsubmdp, 0x04, 0x1F, PPC2_VSX),

  GEN_XX3FORM(xvaddsp, 0x00, 0x08, PPC2_VSX),
  GEN_XX3FORM(xvsubsp, 0x00, 0x09, PPC2_VSX),
@@ -10034,6 +10074,14 @@  GEN_XX2FORM(xvsqrtsp, 0x16, 0x08, PPC2_VSX),
  GEN_XX2FORM(xvrsqrtesp, 0x14, 0x08, PPC2_VSX),
  GEN_XX3FORM(xvtdivsp, 0x14, 0x0B, PPC2_VSX),
  GEN_XX2FORM(xvtsqrtsp, 0x14, 0x0A, PPC2_VSX),
+GEN_XX3FORM(xvmaddasp, 0x04, 0x08, PPC2_VSX),
+GEN_XX3FORM(xvmaddmsp, 0x04, 0x09, PPC2_VSX),
+GEN_XX3FORM(xvmsubasp, 0x04, 0x0A, PPC2_VSX),
+GEN_XX3FORM(xvmsubmsp, 0x04, 0x0B, PPC2_VSX),
+GEN_XX3FORM(xvnmaddasp, 0x04, 0x18, PPC2_VSX),
+GEN_XX3FORM(xvnmaddmsp, 0x04, 0x19, PPC2_VSX),
+GEN_XX3FORM(xvnmsubasp, 0x04, 0x1A, PPC2_VSX),
+GEN_XX3FORM(xvnmsubmsp, 0x04, 0x1B, PPC2_VSX),

  #undef VSX_LOGICAL
  #define VSX_LOGICAL(name, opc2, opc3, fl2) \