Patchwork target-arm: fix SMMLA/SMMLS instructions

Submitter Aurelien Jarno
Date Jan. 1, 2011, 6:25 p.m.
Message ID <1293906328-8984-1-git-send-email-aurelien@aurel32.net>
Permalink /patch/77150/
State New

Comments

Aurelien Jarno - Jan. 1, 2011, 6:25 p.m.
SMMLA and SMMLS are broken in both normal and thumb mode, that is
both (different) implementations are wrong. They try to avoid a 64-bit
add for the rounding, which is not trivial if you want to support both
SMMLA and SMMLS with the same code.

The code below uses the same implementation for both modes, using the
code from the ARM manual. It also fixes the thumb decoding that was a
mix between normal and thumb mode.

This fixes the issues reported in
https://bugs.launchpad.net/qemu/+bug/629298

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-arm/translate.c |   96 +++++++++++++++++++++++++----------------------
 1 files changed, 51 insertions(+), 45 deletions(-)
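
As an illustration of the behaviour described above, here is a minimal plain-C sketch of the 64-bit computation, written from a reading of the ARM manual pseudocode the message refers to. The function names smml_ref and smmls_old_arm are hypothetical and do not appear in QEMU; the second function is only an interpretation of the ARM-mode lines this patch removes, for the subtract case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference model (hypothetical name): bits [63:32] of
   (ra << 32) +/- (rn * rm), with optional rounding. */
static uint32_t smml_ref(int32_t rn, int32_t rm, uint32_t ra,
                         bool subtract, bool round)
{
    /* Signed 32x32->64 product, held as unsigned so that
       wraparound is well defined in C. */
    uint64_t prod = (uint64_t)((int64_t)rn * (int64_t)rm);
    uint64_t acc = (uint64_t)ra << 32;

    acc = subtract ? acc - prod : acc + prod;
    if (round) {
        acc += 0x80000000u;
    }
    return (uint32_t)(acc >> 32);
}

/* Interpretation of the removed ARM-mode sequence for the subtract
   case: round and truncate the product first, then subtract ra
   from the truncated value in 32 bits. */
static uint32_t smmls_old_arm(int32_t rn, int32_t rm, uint32_t ra,
                              bool round)
{
    uint64_t prod = (uint64_t)((int64_t)rn * (int64_t)rm);

    if (round) {
        prod += 0x80000000u;
    }
    return (uint32_t)(prod >> 32) - ra;
}
```

With rn = rm = 1, ra = 0 and no rounding, smml_ref() in subtract mode returns 0xFFFFFFFF, while smmls_old_arm() returns 0, so the two disagree already on a trivial input.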
Peter Maydell - Jan. 5, 2011, 11:15 a.m.
On 1 January 2011 18:25, Aurelien Jarno <aurelien@aurel32.net> wrote:
> SMMLA and SMMLS are broken on both in normal and thumb mode, that is
> both (different) implementations are wrong. They try to avoid a 64-bit
> add for the rounding, which is not trivial if you want to support both
> SMMLA and SMMLS with the same code.
>
> The code below uses the same implementation for both modes, using the
> code from the ARM manual. It also fixes the thumb decoding that was a
> mix between normal and thumb mode.
>
> This fixes the issues reported in
> https://bugs.launchpad.net/qemu/+bug/629298

I've tested this patch with my random-sequence-generator for
SMMLA/SMMLS/SMMUL for ARM and Thumb, and it does fix
the bug. I have a few minor nitpicks about some comments, though.

> -/* Round the top 32 bits of a 64-bit value.  */
> -static void gen_roundqd(TCGv a, TCGv b)
> +/* Add a to the msw of b. Mark inputs as dead */
> +static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv b)
>  {
> -    tcg_gen_shri_i32(a, a, 31);
> -    tcg_gen_add_i32(a, a, b);
> +    TCGv_i64 tmp64 = tcg_temp_new_i64();
> +
> +    tcg_gen_extu_i32_i64(tmp64, b);
> +    dead_tmp(b);
> +    tcg_gen_shli_i64(tmp64, tmp64, 32);
> +    tcg_gen_add_i64(a, tmp64, a);
> +
> +    tcg_temp_free_i64(tmp64);
> +    return a;
> +}

Isn't this adding b to the msw of a, rather than the other
way round as the comment claims?

> +/* Subtract a from the msw of b. Mark inputs as dead. */

Ditto.

> @@ -6953,23 +6958,25 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
>                     tmp = load_reg(s, rm);
>                     tmp2 = load_reg(s, rs);
>                     if (insn & (1 << 20)) {
> -                        /* Signed multiply most significant [accumulate].  */
> +                        /* Signed multiply most significant [accumulate].
> +                           (SMMUL, SMLA, SMMLS) */

SMMLA, not SMLA.

-- PMM
Aurelien Jarno - Jan. 6, 2011, 3:50 p.m.
On Wed, Jan 05, 2011 at 11:15:15AM +0000, Peter Maydell wrote:
> On 1 January 2011 18:25, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > SMMLA and SMMLS are broken on both in normal and thumb mode, that is
> > both (different) implementations are wrong. They try to avoid a 64-bit
> > add for the rounding, which is not trivial if you want to support both
> > SMMLA and SMMLS with the same code.
> >
> > The code below uses the same implementation for both modes, using the
> > code from the ARM manual. It also fixes the thumb decoding that was a
> > mix between normal and thumb mode.
> >
> > This fixes the issues reported in
> > https://bugs.launchpad.net/qemu/+bug/629298
> 
> I've tested this patch with my random-sequence-generator for
> SMMLA/SMMLS/SMMUL for ARM and Thumb, and it does fix
> the bug. I have a few minor nitpicks about some comments, though.
> 
> > -/* Round the top 32 bits of a 64-bit value.  */
> > -static void gen_roundqd(TCGv a, TCGv b)
> > +/* Add a to the msw of b. Mark inputs as dead */
> > +static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv b)
> >  {
> > -    tcg_gen_shri_i32(a, a, 31);
> > -    tcg_gen_add_i32(a, a, b);
> > +    TCGv_i64 tmp64 = tcg_temp_new_i64();
> > +
> > +    tcg_gen_extu_i32_i64(tmp64, b);
> > +    dead_tmp(b);
> > +    tcg_gen_shli_i64(tmp64, tmp64, 32);
> > +    tcg_gen_add_i64(a, tmp64, a);
> > +
> > +    tcg_temp_free_i64(tmp64);
> > +    return a;
> > +}
> 
> Isn't this adding b to the msw of a, rather than the other
> way round as the comment claims?

I think the comment is actually wrong both ways: a shift is
applied, and thus the lsw of b is used as the msw in the addition.
What about "Add a to (b << 32). Mark inputs as dead."?

> > +/* Subtract a from the msw of b. Mark inputs as dead. */
> 
> Ditto.

What about "Subtract a from (b << 32). Mark inputs as dead."?

> > @@ -6953,23 +6958,25 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
> >                     tmp = load_reg(s, rm);
> >                     tmp2 = load_reg(s, rs);
> >                     if (insn & (1 << 20)) {
> > -                        /* Signed multiply most significant [accumulate].  */
> > +                        /* Signed multiply most significant [accumulate].
> > +                           (SMMUL, SMLA, SMMLS) */
> 
> SMMLA, not SMLA.
> 

I'll fix that in the next version.

Thanks for the review.
Peter Maydell - Jan. 6, 2011, 3:54 p.m.
On 6 January 2011 15:50, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Wed, Jan 05, 2011 at 11:15:15AM +0000, Peter Maydell wrote:

>> Isn't this adding b to the msw of a, rather than the other
>> way round as the comment claims?
>
> I think the comment is actually wrong in both way, as a shift is
> applied, and thus lsw of b is used as the msw in the addition.

We add the whole of b, not the lsw of b, because it's only
32 bits to start with (ie "lsw of b" is a longwinded way of
saying "b").

> What about "Add a to (b << 32). Mark inputs as dead."?

To me "Add x to y" means "y = y + x". In this case that would
mean  "(b << 32) = (b << 32) + a", which is nonsensical.
"Add (b << 32) to a" or equivalently "add b to the msw of a"
makes more sense to me.

-- PMM
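
To make the wording discussed above concrete, here is a small value-level sketch of what the TCG ops emitted by gen_addq_msw compute, assuming ordinary wraparound 64-bit arithmetic. The name addq_msw_model is hypothetical; it models the arithmetic only, not the temporaries or their lifetimes.

```c
#include <stdint.h>

/* Value-level model (hypothetical name) of gen_addq_msw:
   zero-extend the 32-bit b, shift it into the most significant
   word, and add it to the 64-bit a. */
static uint64_t addq_msw_model(uint64_t a, uint32_t b)
{
    uint64_t tmp64 = (uint64_t)b;   /* tcg_gen_extu_i32_i64 */

    tmp64 <<= 32;                   /* tcg_gen_shli_i64 */
    return a + tmp64;               /* tcg_gen_add_i64 */
}
```

For example, addq_msw_model(1, 2) yields 0x200000001: all of b lands in the upper word of the result and the lower word of a is untouched, which matches the "add b to the msw of a" reading.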
Aurelien Jarno - Jan. 6, 2011, 5:24 p.m.
On Thu, Jan 06, 2011 at 03:54:46PM +0000, Peter Maydell wrote:
> On 6 January 2011 15:50, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On Wed, Jan 05, 2011 at 11:15:15AM +0000, Peter Maydell wrote:
> 
> >> Isn't this adding b to the msw of a, rather than the other
> >> way round as the comment claims?
> >
> > I think the comment is actually wrong in both way, as a shift is
> > applied, and thus lsw of b is used as the msw in the addition.
> 
> We add the whole of b, not the lsw of b, because it's only
> 32 bits to start with (ie "lsw of b" is a longwinded way of
> saying "b").
> 
> > What about "Add a to (b << 32). Mark inputs as dead."?
> 
> To me "Add x to y" means "y = y + x". In this case that would
> mean  "(b << 32) = (b << 32) + a", which is nonsensical.
> "Add (b << 32) to a" or equivalently "add b to the msw of a"
> makes more sense to me.
> 

Ok, will use that in the next version.

For the subtraction, how would you say a = (b << 32) - a ?
Peter Maydell - Jan. 6, 2011, 6:09 p.m.
On 6 January 2011 17:24, Aurelien Jarno <aurelien@aurel32.net> wrote:
> For the subtraction, how would you say a = (b << 32) - a ?

I think we should just say "Return (b << 32) - a" for that :-)
I can't think of a clean way of putting it in English.

-- PMM
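
The subtraction helper can be sketched the same way. This is a value-level model under the same assumptions (wraparound 64-bit arithmetic, temporaries ignored), and subq_msw_model is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Value-level model (hypothetical name) of gen_subq_msw:
   returns (b << 32) - a, with b zero-extended to 64 bits first. */
static uint64_t subq_msw_model(uint64_t a, uint32_t b)
{
    uint64_t tmp64 = (uint64_t)b << 32;  /* extu + shli */

    return tmp64 - a;                    /* tcg_gen_sub_i64(a, tmp64, a) */
}
```

Note that the operand order is not symmetric with the add case: subq_msw_model(1, 2) is 0x1FFFFFFFF, and subq_msw_model(1, 0) wraps to 0xFFFFFFFFFFFFFFFF.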

Patch

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 2598268..3b30b66 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -287,11 +287,32 @@  static void gen_bfi(TCGv dest, TCGv base, TCGv val, int shift, uint32_t mask)
     tcg_gen_or_i32(dest, base, val);
 }
 
-/* Round the top 32 bits of a 64-bit value.  */
-static void gen_roundqd(TCGv a, TCGv b)
+/* Add a to the msw of b. Mark inputs as dead */
+static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv b)
 {
-    tcg_gen_shri_i32(a, a, 31);
-    tcg_gen_add_i32(a, a, b);
+    TCGv_i64 tmp64 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(tmp64, b);
+    dead_tmp(b);
+    tcg_gen_shli_i64(tmp64, tmp64, 32);
+    tcg_gen_add_i64(a, tmp64, a);
+
+    tcg_temp_free_i64(tmp64);
+    return a;
+}
+
+/* Subtract a from the msw of b. Mark inputs as dead. */
+static TCGv_i64 gen_subq_msw(TCGv_i64 a, TCGv b)
+{
+    TCGv_i64 tmp64 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(tmp64, b);
+    dead_tmp(b);
+    tcg_gen_shli_i64(tmp64, tmp64, 32);
+    tcg_gen_sub_i64(a, tmp64, a);
+
+    tcg_temp_free_i64(tmp64);
+    return a;
 }
 
 /* FIXME: Most targets have native widening multiplication.
@@ -325,22 +346,6 @@  static TCGv_i64 gen_muls_i64_i32(TCGv a, TCGv b)
     return tmp1;
 }
 
-/* Signed 32x32->64 multiply.  */
-static void gen_imull(TCGv a, TCGv b)
-{
-    TCGv_i64 tmp1 = tcg_temp_new_i64();
-    TCGv_i64 tmp2 = tcg_temp_new_i64();
-
-    tcg_gen_ext_i32_i64(tmp1, a);
-    tcg_gen_ext_i32_i64(tmp2, b);
-    tcg_gen_mul_i64(tmp1, tmp1, tmp2);
-    tcg_temp_free_i64(tmp2);
-    tcg_gen_trunc_i64_i32(a, tmp1);
-    tcg_gen_shri_i64(tmp1, tmp1, 32);
-    tcg_gen_trunc_i64_i32(b, tmp1);
-    tcg_temp_free_i64(tmp1);
-}
-
 /* Swap low and high halfwords.  */
 static void gen_swap_half(TCGv var)
 {
@@ -6953,23 +6958,25 @@  static void disas_arm_insn(CPUState * env, DisasContext *s)
                     tmp = load_reg(s, rm);
                     tmp2 = load_reg(s, rs);
                     if (insn & (1 << 20)) {
-                        /* Signed multiply most significant [accumulate].  */
+                        /* Signed multiply most significant [accumulate].
+                           (SMMUL, SMLA, SMMLS) */
                         tmp64 = gen_muls_i64_i32(tmp, tmp2);
-                        if (insn & (1 << 5))
-                            tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
-                        tcg_gen_shri_i64(tmp64, tmp64, 32);
-                        tmp = new_tmp();
-                        tcg_gen_trunc_i64_i32(tmp, tmp64);
-                        tcg_temp_free_i64(tmp64);
+
                         if (rd != 15) {
-                            tmp2 = load_reg(s, rd);
+                            tmp = load_reg(s, rd);
                             if (insn & (1 << 6)) {
-                                tcg_gen_sub_i32(tmp, tmp, tmp2);
+                                tmp64 = gen_subq_msw(tmp64, tmp);
                             } else {
-                                tcg_gen_add_i32(tmp, tmp, tmp2);
+                                tmp64 = gen_addq_msw(tmp64, tmp);
                             }
-                            dead_tmp(tmp2);
                         }
+                        if (insn & (1 << 5)) {
+                            tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
+                        }
+                        tcg_gen_shri_i64(tmp64, tmp64, 32);
+                        tmp = new_tmp();
+                        tcg_gen_trunc_i64_i32(tmp, tmp64);
+                        tcg_temp_free_i64(tmp64);
                         store_reg(s, rn, tmp);
                     } else {
                         if (insn & (1 << 5))
@@ -7840,24 +7847,23 @@  static int disas_thumb2_insn(CPUState *env, DisasContext *s, uint16_t insn_hw1)
                     dead_tmp(tmp2);
                   }
                 break;
-            case 5: case 6: /* 32 * 32 -> 32msb */
-                gen_imull(tmp, tmp2);
-                if (insn & (1 << 5)) {
-                    gen_roundqd(tmp, tmp2);
-                    dead_tmp(tmp2);
-                } else {
-                    dead_tmp(tmp);
-                    tmp = tmp2;
-                }
+            case 5: case 6: /* 32 * 32 -> 32msb (SMMUL, SMMLA, SMMLS) */
+                tmp64 = gen_muls_i64_i32(tmp, tmp2);
                 if (rs != 15) {
-                    tmp2 = load_reg(s, rs);
-                    if (insn & (1 << 21)) {
-                        tcg_gen_add_i32(tmp, tmp, tmp2);
+                    tmp = load_reg(s, rs);
+                    if (insn & (1 << 20)) {
+                        tmp64 = gen_addq_msw(tmp64, tmp);
                     } else {
-                        tcg_gen_sub_i32(tmp, tmp2, tmp);
+                        tmp64 = gen_subq_msw(tmp64, tmp);
                     }
-                    dead_tmp(tmp2);
                 }
+                if (insn & (1 << 4)) {
+                    tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
+                }
+                tcg_gen_shri_i64(tmp64, tmp64, 32);
+                tmp = new_tmp();
+                tcg_gen_trunc_i64_i32(tmp, tmp64);
+                tcg_temp_free_i64(tmp64);
                 break;
             case 7: /* Unsigned sum of absolute differences.  */
                 gen_helper_usad8(tmp, tmp, tmp2);