diff mbox

[1/5] tcg/arm: fix TLB access in qemu-ld/st ops

Message ID 1349814652-22325-2-git-send-email-aurelien@aurel32.net
State New

Commit Message

Aurelien Jarno Oct. 9, 2012, 8:30 p.m. UTC
The TCG ARM backend assumes that the offset to the TLB entries is
unlikely to exceed 12 bits for mem_index = 0. In practice this is
not true for at least the MIPS target.

The current patch fixes that by adding bits 23-12 of the offset with a
separate instruction, and by using loads with address writeback,
independently of the value of mem_idx. In total this allows a 24-bit
offset, which is a lot more than needed.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/arm/tcg-target.c |   73 +++++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 36 deletions(-)

Comments

Peter Maydell Oct. 9, 2012, 9:13 p.m. UTC | #1
On 9 October 2012 21:30, Aurelien Jarno <aurelien@aurel32.net> wrote:
> The TCG ARM backend assumes that the offset to the TLB entries is
> unlikely to exceed 12 bits for mem_index = 0. In practice this is
> not true for at least the MIPS target.
>
> The current patch fixes that by adding bits 23-12 of the offset with a
> separate instruction, and by using loads with address writeback,
> independently of the value of mem_idx. In total this allows a 24-bit
> offset, which is a lot more than needed.

Would probably be good to assert() that the bits above 24 are zero,
though.

> Cc: Andrzej Zaborowski <balrogg@gmail.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/arm/tcg-target.c |   73 +++++++++++++++++++++++++-------------------------
>  1 file changed, 37 insertions(+), 36 deletions(-)
>
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 737200e..6cde512 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -624,6 +624,19 @@ static inline void tcg_out_ld32_12(TCGContext *s, int cond,
>                          (rn << 16) | (rd << 12) | ((-im) & 0xfff));
>  }
>
> +/* Offset pre-increment with base writeback.  */
> +static inline void tcg_out_ld32_12wb(TCGContext *s, int cond,
> +                                     int rd, int rn, tcg_target_long im)
> +{

Worth asserting that rd != rn ? (that's an UNPREDICTABLE case
for ldr with writeback)

> +    if (im >= 0) {
> +        tcg_out32(s, (cond << 28) | 0x05b00000 |
> +                       (rn << 16) | (rd << 12) | (im & 0xfff));
> +    } else {
> +        tcg_out32(s, (cond << 28) | 0x05300000 |
> +                       (rn << 16) | (rd << 12) | ((-im) & 0xfff));
> +    }

You could avoid the if() by just setting the U bit using "(im >= 0) << 23".
Clearer? Dunno.

Looks OK otherwise (though I haven't tested it.)

Random aside: we emit a pretty long string of COND_EQ predicated
code in these functions, especially in the 64-bit address and
byteswapped cases. It might be interesting to see if performance
on an A9, say, was any better if we just did a conditional branch
over it instead... Probably borderline to no-effect, though.

-- PMM
Laurent Desnogues Oct. 10, 2012, 10 a.m. UTC | #2
On Tue, Oct 9, 2012 at 11:13 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 9 October 2012 21:30, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> The TCG ARM backend assumes that the offset to the TLB entries is
>> unlikely to exceed 12 bits for mem_index = 0. In practice this is
>> not true for at least the MIPS target.
>>
>> The current patch fixes that by adding bits 23-12 of the offset with a
>> separate instruction, and by using loads with address writeback,
>> independently of the value of mem_idx. In total this allows a 24-bit
>> offset, which is a lot more than needed.
>
> Would probably be good to assert() that the bits above 24 are zero,
> though.
>
>> Cc: Andrzej Zaborowski <balrogg@gmail.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
>> ---
>>  tcg/arm/tcg-target.c |   73 +++++++++++++++++++++++++-------------------------
>>  1 file changed, 37 insertions(+), 36 deletions(-)
>>
>> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
>> index 737200e..6cde512 100644
>> --- a/tcg/arm/tcg-target.c
>> +++ b/tcg/arm/tcg-target.c
>> @@ -624,6 +624,19 @@ static inline void tcg_out_ld32_12(TCGContext *s, int cond,
>>                          (rn << 16) | (rd << 12) | ((-im) & 0xfff));
>>  }
>>
>> +/* Offset pre-increment with base writeback.  */
>> +static inline void tcg_out_ld32_12wb(TCGContext *s, int cond,
>> +                                     int rd, int rn, tcg_target_long im)
>> +{
>
> Worth asserting that rd != rn ? (that's an UNPREDICTABLE case
> for ldr with writeback)
>
>> +    if (im >= 0) {
>> +        tcg_out32(s, (cond << 28) | 0x05b00000 |
>> +                       (rn << 16) | (rd << 12) | (im & 0xfff));
>> +    } else {
>> +        tcg_out32(s, (cond << 28) | 0x05300000 |
>> +                       (rn << 16) | (rd << 12) | ((-im) & 0xfff));
>> +    }
>
> you could avoid the if() by just setting the U bit using "(im >= 0) << 23"
> Clearer? Dunno.

You also have to negate the im value, so it's not enough to just
change the opcode.


Laurent

> Looks OK otherwise (though I haven't tested it.)
>
> Random aside: we emit a pretty long string of COND_EQ predicated
> code in these functions, especially in the 64 bit address and
> byteswapped cases. It might be interesting to see if performance
> on an A9, say, was any better if we just did a conditional branch
> over it instead... Probably borderline to no-effect, though.
>
> -- PMM
>
Peter Maydell Oct. 10, 2012, 10:08 a.m. UTC | #3
On 10 October 2012 11:00, Laurent Desnogues <laurent.desnogues@gmail.com> wrote:
> On Tue, Oct 9, 2012 at 11:13 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 9 October 2012 21:30, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>> +    if (im >= 0) {
>>> +        tcg_out32(s, (cond << 28) | 0x05b00000 |
>>> +                       (rn << 16) | (rd << 12) | (im & 0xfff));
>>> +    } else {
>>> +        tcg_out32(s, (cond << 28) | 0x05300000 |
>>> +                       (rn << 16) | (rd << 12) | ((-im) & 0xfff));
>>> +    }
>>
>> you could avoid the if() by just setting the U bit using "(im >= 0) << 23"
>> Clearer? Dunno.
>
> You also have to negate the im value so it's not enough to just
> change the opcode.

Doh, of course. Forget what I wrote, then :-)

-- PMM

Patch

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 737200e..6cde512 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -624,6 +624,19 @@  static inline void tcg_out_ld32_12(TCGContext *s, int cond,
                         (rn << 16) | (rd << 12) | ((-im) & 0xfff));
 }
 
+/* Offset pre-increment with base writeback.  */
+static inline void tcg_out_ld32_12wb(TCGContext *s, int cond,
+                                     int rd, int rn, tcg_target_long im)
+{
+    if (im >= 0) {
+        tcg_out32(s, (cond << 28) | 0x05b00000 |
+                       (rn << 16) | (rd << 12) | (im & 0xfff));
+    } else {
+        tcg_out32(s, (cond << 28) | 0x05300000 |
+                       (rn << 16) | (rd << 12) | ((-im) & 0xfff));
+    }
+}
+
 static inline void tcg_out_st32_12(TCGContext *s, int cond,
                 int rd, int rn, tcg_target_long im)
 {
@@ -1056,7 +1069,7 @@  static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 {
     int addr_reg, data_reg, data_reg2, bswap;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index, s_bits, tlb_offset;
     TCGReg argreg;
 # if TARGET_LONG_BITS == 64
     int addr_reg2;
@@ -1096,19 +1109,14 @@  static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
                     TCG_REG_R0, TCG_REG_R8, CPU_TLB_SIZE - 1);
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_AREG0,
                     TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
-    /* In the
-     *  ldr r1 [r0, #(offsetof(CPUArchState, tlb_table[mem_index][0].addr_read))]
-     * below, the offset is likely to exceed 12 bits if mem_index != 0 and
-     * not exceed otherwise, so use an
-     *  add r0, r0, #(mem_index * sizeof *CPUArchState.tlb_table)
-     * before.
-     */
-    if (mem_index)
+    /* We assume that the offset is contained within 24 bits.  */
+    tlb_offset = offsetof(CPUArchState, tlb_table[mem_index][0].addr_read);
+    if (tlb_offset > 0xfff) {
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_REG_R0,
-                        (mem_index << (TLB_SHIFT & 1)) |
-                        ((16 - (TLB_SHIFT >> 1)) << 8));
-    tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addr_read));
+                        0xa00 | (tlb_offset >> 12));
+        tlb_offset &= 0xfff;
+    }
+    tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
     tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
                     TCG_REG_R8, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     /* Check alignment.  */
@@ -1116,15 +1124,14 @@  static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
         tcg_out_dat_imm(s, COND_EQ, ARITH_TST,
                         0, addr_reg, (1 << s_bits) - 1);
 #  if TARGET_LONG_BITS == 64
-    /* XXX: possibly we could use a block data load or writeback in
-     * the first access.  */
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addr_read) + 4);
+    /* XXX: possibly we could use a block data load in the first access.  */
+    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0, 4);
     tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
                     TCG_REG_R1, addr_reg2, SHIFT_IMM_LSL(0));
 #  endif
     tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addend));
+                    offsetof(CPUTLBEntry, addend)
+                    - offsetof(CPUTLBEntry, addr_read));
 
     switch (opc) {
     case 0:
@@ -1273,7 +1280,7 @@  static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 {
     int addr_reg, data_reg, data_reg2, bswap;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index, s_bits, tlb_offset;
     TCGReg argreg;
 # if TARGET_LONG_BITS == 64
     int addr_reg2;
@@ -1310,19 +1317,14 @@  static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                     TCG_REG_R0, TCG_REG_R8, CPU_TLB_SIZE - 1);
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0,
                     TCG_AREG0, TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
-    /* In the
-     *  ldr r1 [r0, #(offsetof(CPUArchState, tlb_table[mem_index][0].addr_write))]
-     * below, the offset is likely to exceed 12 bits if mem_index != 0 and
-     * not exceed otherwise, so use an
-     *  add r0, r0, #(mem_index * sizeof *CPUArchState.tlb_table)
-     * before.
-     */
-    if (mem_index)
+    /* We assume that the offset is contained within 24 bits.  */
+    tlb_offset = offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
+    if (tlb_offset > 0xfff) {
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_REG_R0,
-                        (mem_index << (TLB_SHIFT & 1)) |
-                        ((16 - (TLB_SHIFT >> 1)) << 8));
-    tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addr_write));
+                        0xa00 | (tlb_offset >> 12));
+        tlb_offset &= 0xfff;
+    }
+    tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
     tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
                     TCG_REG_R8, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     /* Check alignment.  */
@@ -1330,15 +1332,14 @@  static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
         tcg_out_dat_imm(s, COND_EQ, ARITH_TST,
                         0, addr_reg, (1 << s_bits) - 1);
 #  if TARGET_LONG_BITS == 64
-    /* XXX: possibly we could use a block data load or writeback in
-     * the first access.  */
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addr_write) + 4);
+    /* XXX: possibly we could use a block data load in the first access.  */
+    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0, 4);
     tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
                     TCG_REG_R1, addr_reg2, SHIFT_IMM_LSL(0));
 #  endif
     tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addend));
+                    offsetof(CPUTLBEntry, addend)
+                    - offsetof(CPUTLBEntry, addr_write));
 
     switch (opc) {
     case 0: