Patchwork [v2,04/10] target-arm: optimize vfp load/store multiple ops

Submitter Juha.Riihimaki@nokia.com
Date Oct. 24, 2009, 12:19 p.m.
Message ID <1256386749-85299-5-git-send-email-juha.riihimaki@nokia.com>
Permalink /patch/36839/
State New

Comments

Juha.Riihimaki@nokia.com - Oct. 24, 2009, 12:19 p.m.
From: Juha Riihimäki <juha.riihimaki@nokia.com>

VFP load/store multiple instructions can be slightly optimized by
loading the register offset constant into a variable outside the
register loop and using the preloaded variable inside the loop instead
of reloading the offset value to a temporary variable on each loop
iteration. This generates fewer TCG ops for a VFP load/store multiple
instruction when more than one register is accessed; otherwise the
number of generated TCG ops is the same.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
Laurent Desnogues - Oct. 24, 2009, 5:36 p.m.
On Sat, Oct 24, 2009 at 2:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> VFP load/store multiple instructions can be slightly optimized by
> loading the register offset constant into a variable outside the
> register loop and using the preloaded variable inside the loop instead
> of reloading the offset value to a temporary variable on each loop
> iteration. This generates fewer TCG ops for a VFP load/store multiple
> instruction when more than one register is accessed; otherwise the
> number of generated TCG ops is the same.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>


Laurent

> ---
>  target-arm/translate.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 8cb1c0f..38fb833 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -3222,6 +3222,7 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                     offset = 8;
>                 else
>                     offset = 4;
> +                tmp = tcg_const_i32(offset);
>                 for (i = 0; i < n; i++) {
>                     if (insn & ARM_CP_RW_BIT) {
>                         /* load */
> @@ -3232,8 +3233,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                         gen_mov_F0_vreg(dp, rd + i);
>                         gen_vfp_st(s, dp, addr);
>                     }
> -                    tcg_gen_addi_i32(addr, addr, offset);
> +                    tcg_gen_add_i32(addr, addr, tmp);
>                 }
> +                tcg_temp_free_i32(tmp);
>                 if (insn & (1 << 21)) {
>                     /* writeback */
>                     if (insn & (1 << 24))
> --
> 1.6.5
Aurelien Jarno - Oct. 27, 2009, 8:42 a.m.
On Sat, Oct 24, 2009 at 03:19:03PM +0300, juha.riihimaki@nokia.com wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
> 
> VFP load/store multiple instructions can be slightly optimized by
> loading the register offset constant into a variable outside the
> register loop and using the preloaded variable inside the loop instead
> of reloading the offset value to a temporary variable on each loop
> iteration. This generates fewer TCG ops for a VFP load/store multiple
> instruction when more than one register is accessed; otherwise the
> number of generated TCG ops is the same.

Same for this patch: it should not change the generated host code, so I
am not sure it is really worth it.

> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
> ---
>  target-arm/translate.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 8cb1c0f..38fb833 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -3222,6 +3222,7 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                      offset = 8;
>                  else
>                      offset = 4;
> +                tmp = tcg_const_i32(offset);
>                  for (i = 0; i < n; i++) {
>                      if (insn & ARM_CP_RW_BIT) {
>                          /* load */
> @@ -3232,8 +3233,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                          gen_mov_F0_vreg(dp, rd + i);
>                          gen_vfp_st(s, dp, addr);
>                      }
> -                    tcg_gen_addi_i32(addr, addr, offset);
> +                    tcg_gen_add_i32(addr, addr, tmp);
>                  }
> +                tcg_temp_free_i32(tmp);
>                  if (insn & (1 << 21)) {
>                      /* writeback */
>                      if (insn & (1 << 24))
> -- 
> 1.6.5

Patch

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 8cb1c0f..38fb833 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3222,6 +3222,7 @@  static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     offset = 8;
                 else
                     offset = 4;
+                tmp = tcg_const_i32(offset);
                 for (i = 0; i < n; i++) {
                     if (insn & ARM_CP_RW_BIT) {
                         /* load */
@@ -3232,8 +3233,9 @@  static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
                         gen_mov_F0_vreg(dp, rd + i);
                         gen_vfp_st(s, dp, addr);
                     }
-                    tcg_gen_addi_i32(addr, addr, offset);
+                    tcg_gen_add_i32(addr, addr, tmp);
                 }
+                tcg_temp_free_i32(tmp);
                 if (insn & (1 << 21)) {
                     /* writeback */
                     if (insn & (1 << 24))
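
The op-count claim in the commit message can be illustrated with a small stand-alone model. This is only a sketch and does not use the real TCG API: the `OpCount` type and both function names are hypothetical, and it assumes, as the commit message describes, that each add-immediate in the loop loads its constant into a temporary before adding. The load/store ops themselves are identical in both variants and are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter standing in for the emitted op stream. */
typedef struct {
    size_t const_loads; /* ops that load the offset constant into a temp */
    size_t adds;        /* ops that advance the address */
} OpCount;

/*
 * Model of the loop before the patch: the offset constant is
 * (by assumption) reloaded into a temporary on every iteration.
 */
static OpCount count_ops_reload_each_iteration(int n)
{
    OpCount c = {0, 0};
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        c.const_loads++; /* constant reloaded for this iteration */
        c.adds++;        /* addr = addr + offset */
    }
    return c;
}

/*
 * Model of the loop after the patch: the constant is loaded once
 * before the loop and the preloaded temp is reused inside it.
 */
static OpCount count_ops_preloaded(int n)
{
    OpCount c = {0, 0};
    assert(n >= 0);
    c.const_loads++;     /* single load outside the loop */
    for (int i = 0; i < n; i++) {
        c.adds++;        /* addr = addr + tmp */
    }
    return c;
}
```

Under this model a four-register transfer costs 8 address-update ops before the patch and 5 after it, while a single-register transfer costs 2 in both cases, which matches the commit message. It says nothing about the generated host code, which is the point raised in the review.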