diff mbox series

[v2,1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host

Message ID 1553525566-14913-2-git-send-email-mateja.marjanovic@rt-rk.com
State New
Headers show
Series target/mips: Add support for MSA instructions on a big endian host | expand

Commit Message

Mateja Marjanovic March 25, 2019, 2:52 p.m. UTC
From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Load and store MSA instructions when executed on a MIPS
big endian CPU, didn't change the endianness, and were
behaving like on little endian.

Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/op_helper.c | 79 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 55 insertions(+), 24 deletions(-)

Comments

Aleksandar Markovic March 25, 2019, 9:21 p.m. UTC | #1
> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host

Please split this patch into one for load instructions and another for
store instructions.

I don't think the variable "big_endian" is needed, you should better resolve
all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)",
like you did in other patches. It is important to be consistent.

Thanks,
Aleksandar
Mateja Marjanovic March 26, 2019, 6:33 a.m. UTC | #2
On 25.3.19. 22:21, Aleksandar Markovic wrote:
>> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
>> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host
> Please split this patch into one for load instructions and another for
> store instructions.
I will do that.
>
> I don't think the variable "big_endian" is needed, you should better resolve
> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)",
> like you did in other patches. It is important to be consistent.

I think you can't have a preprocessing statement inside a macro,
so I did the thing that seemed most similar to that.

>
> Thanks,
> Aleksandar
Regards,
Mateja
Aleksandar Markovic March 26, 2019, 11:06 a.m. UTC | #3
> From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
> Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host
> On 25.3.19. 22:21, Aleksandar Markovic wrote:
>> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
>> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host
>> Please split this patch into one for load instructions and another for
>> store instructions.
>
> I will do that.
>
>> I don't think the variable "big_endian" is needed, you should better resolve
>> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)",
>> like you did in other patches. It is important to be consistent.
>
> I think you can't have a preprocessing statement inside a macro,
> so I did the thing that seemed most similar to that.
>

There should be some equivalent way of doing that involving an argument
to the macro, however, in the light of your recent work on MSA optimization,
let's split/demacro and unroll loops in load and store helpers.

After splitting helper for LD.B should look something like this:
(the code is just for demonstration purposes, doublecheck it)

void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr)
{
    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
    wr_t wx;
    int i;

    MEMOP_IDX(DF_BYTE)
    for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) {
#if !defined(CONFIG_USER_ONLY)
        wx.b[i] = helper_ret_ldub_mmu(env, addr + (i << DF_BYTE), oi, GETPC());
#else
        wx.b[i] = cpu_ldub_data(env, addr + (i << DF_BYTE));
#endif
    }
    memcpy(pwd, &wx, sizeof(wr_t));
}

After unrolling:

void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr)
{
    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
    wr_t wx;
    int i;

    MEMOP_IDX(DF_BYTE)

#if !defined(CONFIG_USER_ONLY)
    wx.b[0]  = helper_ret_ldub_mmu(env, addr, oi, GETPC());
    wx.b[1]  = helper_ret_ldub_mmu(env, addr + 1, oi, GETPC());
    wx.b[2]  = helper_ret_ldub_mmu(env, addr + 2, oi, GETPC());
    wx.b[3]  = helper_ret_ldub_mmu(env, addr + 3, oi, GETPC());
    wx.b[4]  = helper_ret_ldub_mmu(env, addr + 4, oi, GETPC());
    wx.b[5]  = helper_ret_ldub_mmu(env, addr + 5, oi, GETPC());
    wx.b[6]  = helper_ret_ldub_mmu(env, addr + 6, oi, GETPC());
    wx.b[7]  = helper_ret_ldub_mmu(env, addr + 7, oi, GETPC());
    wx.b[8]  = helper_ret_ldub_mmu(env, addr + 8, oi, GETPC());
    wx.b[9]  = helper_ret_ldub_mmu(env, addr + 9, oi, GETPC());
    wx.b[10] = helper_ret_ldub_mmu(env, addr + 10, oi, GETPC());
    wx.b[11] = helper_ret_ldub_mmu(env, addr + 11, oi, GETPC());
    wx.b[12] = helper_ret_ldub_mmu(env, addr + 12, oi, GETPC());
    wx.b[13] = helper_ret_ldub_mmu(env, addr + 13, oi, GETPC());
    wx.b[14] = helper_ret_ldub_mmu(env, addr + 14, oi, GETPC());
    wx.b[15] = helper_ret_ldub_mmu(env, addr + 15, oi, GETPC());
#else
    wx.b[0]  = cpu_ldub_data(env, addr);
    wx.b[1]  = cpu_ldub_data(env, addr + 1);
    wx.b[2]  = cpu_ldub_data(env, addr + 2);
    wx.b[3]  = cpu_ldub_data(env, addr + 3);
    wx.b[4]  = cpu_ldub_data(env, addr + 4);
    wx.b[5]  = cpu_ldub_data(env, addr + 5);
    wx.b[6]  = cpu_ldub_data(env, addr + 6);
    wx.b[7]  = cpu_ldub_data(env, addr + 7);
    wx.b[8]  = cpu_ldub_data(env, addr + 8);
    wx.b[9]  = cpu_ldub_data(env, addr + 9);
    wx.b[10] = cpu_ldub_data(env, addr + 10);
    wx.b[11] = cpu_ldub_data(env, addr + 11);
    wx.b[12] = cpu_ldub_data(env, addr + 12);
    wx.b[13] = cpu_ldub_data(env, addr + 13);
    wx.b[14] = cpu_ldub_data(env, addr + 14);
    wx.b[15] = cpu_ldub_data(env, addr + 15);
#endif
    }
    memcpy(pwd, &wx, sizeof(wr_t));
}


And then apply big-endian-related changes.

I think that would be more in the spirit of other MSA work.

Thanks,
Aleksandar
Aleksandar Markovic March 26, 2019, 11:26 a.m. UTC | #4
> From: Aleksandar Markovic
> Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host
> 
> > From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
> > Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host
> > On 25.3.19. 22:21, Aleksandar Markovic wrote:
> >> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
> >> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host
> >> Please split this patch into one for load instructions and another for
> >> store instructions.
> >
> > I will do that.
> >
> >> I don't think the variable "big_endian" is needed, you should better resolve
> >> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)",
> >> like you did in other patches. It is important to be consistent.
> >
> > I think you can't have a preprocessing statement inside a macro,
> > so I did the thing that seemed most similar to that.
> >
> 
> There should be some equivalent way of doing that involving an argument
> to the macro, however, in the light of your recent work on MSA optimization,
> let's split/demacro and unroll loops in load and store helpers.
> 
> After splitting helper for LD.B should look something like this:
> (the code is just for demonstration purposes, doublecheck it)
> 
> void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr)
> {
>     wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
>     wr_t wx;
>     int i;
> 
>     MEMOP_IDX(DF_BYTE)
>     for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) {
> #if !defined(CONFIG_USER_ONLY)
>         wx.b[i] = helper_ret_ldub_mmu(env, addr + (i << DF_BYTE), oi, GETPC());
> #else
>         wx.b[i] = cpu_ldub_data(env, addr + (i << DF_BYTE));
> #endif
>     }
>     memcpy(pwd, &wx, sizeof(wr_t));
> }
> 
> After unrolling:
> 
> void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr)
> {
>     wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
>     wr_t wx;
>     int i;
> 
>     MEMOP_IDX(DF_BYTE)
> 
> #if !defined(CONFIG_USER_ONLY)
>     wx.b[0]  = helper_ret_ldub_mmu(env, addr, oi, GETPC());
>     wx.b[1]  = helper_ret_ldub_mmu(env, addr + 1, oi, GETPC());
>     wx.b[2]  = helper_ret_ldub_mmu(env, addr + 2, oi, GETPC());
>     wx.b[3]  = helper_ret_ldub_mmu(env, addr + 3, oi, GETPC());
>     wx.b[4]  = helper_ret_ldub_mmu(env, addr + 4, oi, GETPC());
>     wx.b[5]  = helper_ret_ldub_mmu(env, addr + 5, oi, GETPC());
>     wx.b[6]  = helper_ret_ldub_mmu(env, addr + 6, oi, GETPC());
>     wx.b[7]  = helper_ret_ldub_mmu(env, addr + 7, oi, GETPC());
>     wx.b[8]  = helper_ret_ldub_mmu(env, addr + 8, oi, GETPC());
>     wx.b[9]  = helper_ret_ldub_mmu(env, addr + 9, oi, GETPC());
>     wx.b[10] = helper_ret_ldub_mmu(env, addr + 10, oi, GETPC());
>     wx.b[11] = helper_ret_ldub_mmu(env, addr + 11, oi, GETPC());
>     wx.b[12] = helper_ret_ldub_mmu(env, addr + 12, oi, GETPC());
>     wx.b[13] = helper_ret_ldub_mmu(env, addr + 13, oi, GETPC());
>     wx.b[14] = helper_ret_ldub_mmu(env, addr + 14, oi, GETPC());
>     wx.b[15] = helper_ret_ldub_mmu(env, addr + 15, oi, GETPC());
> #else
>     wx.b[0]  = cpu_ldub_data(env, addr);
>     wx.b[1]  = cpu_ldub_data(env, addr + 1);
>     wx.b[2]  = cpu_ldub_data(env, addr + 2);
>     wx.b[3]  = cpu_ldub_data(env, addr + 3);
>     wx.b[4]  = cpu_ldub_data(env, addr + 4);
>     wx.b[5]  = cpu_ldub_data(env, addr + 5);
>     wx.b[6]  = cpu_ldub_data(env, addr + 6);
>     wx.b[7]  = cpu_ldub_data(env, addr + 7);
>     wx.b[8]  = cpu_ldub_data(env, addr + 8);
>     wx.b[9]  = cpu_ldub_data(env, addr + 9);
>     wx.b[10] = cpu_ldub_data(env, addr + 10);
>     wx.b[11] = cpu_ldub_data(env, addr + 11);
>     wx.b[12] = cpu_ldub_data(env, addr + 12);
>     wx.b[13] = cpu_ldub_data(env, addr + 13);
>     wx.b[14] = cpu_ldub_data(env, addr + 14);
>     wx.b[15] = cpu_ldub_data(env, addr + 15);
> #endif
>     }
>     memcpy(pwd, &wx, sizeof(wr_t));
> }
> 
> 
> And then apply big-endian-related changes.
> 
> I think that would be more in the spirit of other MSA work.

The same approach (splitting helpers/unrolling loops) should be
good for COPY_S, COPY_U, INSERT as well.

Thanks,
Aleksandar
diff mbox series

Patch

diff --git a/target/mips/op_helper.c b/target/mips/op_helper.c
index 0f272a5..5441ab2 100644
--- a/target/mips/op_helper.c
+++ b/target/mips/op_helper.c
@@ -4371,18 +4371,37 @@  FOP_CONDN_S(sne,  (float32_lt(fst1, fst0, &env->active_fpu.fp_status)
 #define MEMOP_IDX(DF)
 #endif
 
-#define MSA_LD_DF(DF, TYPE, LD_INSN, ...)                               \
-void helper_msa_ld_ ## TYPE(CPUMIPSState *env, uint32_t wd,             \
-                            target_ulong addr)                          \
-{                                                                       \
-    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);                          \
-    wr_t wx;                                                            \
-    int i;                                                              \
-    MEMOP_IDX(DF)                                                       \
-    for (i = 0; i < DF_ELEMENTS(DF); i++) {                             \
-        wx.TYPE[i] = LD_INSN(env, addr + (i << DF), ##__VA_ARGS__);     \
-    }                                                                   \
-    memcpy(pwd, &wx, sizeof(wr_t));                                     \
+#if defined(HOST_WORDS_BIGENDIAN)
+    bool big_endian = 1;
+#else
+    bool big_endian = 0;
+#endif
+
+#define MSA_LD_DF(DF, TYPE, LD_INSN, ...)                                   \
+void helper_msa_ld_ ## TYPE(CPUMIPSState *env, uint32_t wd,                 \
+                            target_ulong addr)                              \
+{                                                                           \
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);                              \
+    wr_t wx;                                                                \
+    int i, k;                                                               \
+    MEMOP_IDX(DF)                                                           \
+    if (!big_endian) {                                                      \
+        for (i = 0; i < DF_ELEMENTS(DF); i++) {                             \
+            wx.TYPE[i] = LD_INSN(env, addr + (i << DF), ##__VA_ARGS__);     \
+        }                                                                   \
+    } else {                                                                \
+        for (i = 0; i < DF_ELEMENTS(DF); i++) {                             \
+            if (i < DF_ELEMENTS(DF) / 2) {                                  \
+                k = DF_ELEMENTS(DF) / 2 - i - 1;                            \
+                wx.TYPE[i] = LD_INSN(env, addr + (k << DF), ##__VA_ARGS__); \
+            } else {                                                        \
+                k = 3 * DF_ELEMENTS(DF) / 2 - i - 1;                        \
+                wx.TYPE[i] = LD_INSN(env, addr + (k << DF), ##__VA_ARGS__); \
+            }                                                               \
+        }                                                                   \
+    }                                                                       \
+                                                                            \
+    memcpy(pwd, &wx, sizeof(wr_t));                                         \
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -4417,18 +4436,30 @@  static inline void ensure_writable_pages(CPUMIPSState *env,
 #endif
 }
 
-#define MSA_ST_DF(DF, TYPE, ST_INSN, ...)                               \
-void helper_msa_st_ ## TYPE(CPUMIPSState *env, uint32_t wd,             \
-                            target_ulong addr)                          \
-{                                                                       \
-    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);                          \
-    int mmu_idx = cpu_mmu_index(env, false);				\
-    int i;                                                              \
-    MEMOP_IDX(DF)                                                       \
-    ensure_writable_pages(env, addr, mmu_idx, GETPC());                 \
-    for (i = 0; i < DF_ELEMENTS(DF); i++) {                             \
-        ST_INSN(env, addr + (i << DF), pwd->TYPE[i], ##__VA_ARGS__);    \
-    }                                                                   \
+#define MSA_ST_DF(DF, TYPE, ST_INSN, ...)                                    \
+void helper_msa_st_ ## TYPE(CPUMIPSState *env, uint32_t wd,                  \
+                            target_ulong addr)                               \
+{                                                                            \
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);                               \
+    int mmu_idx = cpu_mmu_index(env, false);                                 \
+    int i, k;                                                                \
+    MEMOP_IDX(DF)                                                            \
+    ensure_writable_pages(env, addr, mmu_idx, GETPC());                      \
+    if (!big_endian) {                                                       \
+        for (i = 0; i < DF_ELEMENTS(DF); i++) {                              \
+            ST_INSN(env, addr + (i << DF), pwd->TYPE[i], ##__VA_ARGS__);     \
+        }                                                                    \
+    } else {                                                                 \
+        for (i = 0; i < DF_ELEMENTS(DF); i++) {                              \
+            if (i < DF_ELEMENTS(DF) / 2) {                                   \
+                k = DF_ELEMENTS(DF) / 2 - i - 1;                             \
+                ST_INSN(env, addr + (k << DF), pwd->TYPE[i], ##__VA_ARGS__); \
+            } else {                                                         \
+                k = 3 * DF_ELEMENTS(DF) / 2 - i - 1;                         \
+                ST_INSN(env, addr + (k << DF), pwd->TYPE[i], ##__VA_ARGS__); \
+            }                                                                \
+        }                                                                    \
+    }                                                                        \
 }
 
 #if !defined(CONFIG_USER_ONLY)