
[v2,23/23] cpu-defs.h: Reduce generated code size by inverting MMU_INDEX()

Message ID 20230806121732.91853-24-deller@gmx.de
State New
Series Introduce MMU_INDEX()

Commit Message

Helge Deller Aug. 6, 2023, 12:17 p.m. UTC
The MMU is placed within CPUNegativeOffsetState, which means the
smallest negative offsets are at the end of the struct (see comment for
struct CPUTLB).

But target/cpu.h usually uses MMU indexes in the range 0-8, which
means that the negative offsets are bigger than they would be if MMU
indexes 9-15 were used.

This patch inverts the given MMU index, so that the MMU indices now
count down from (NB_MMU_MODES-1) to 0, and thus TCG sees smaller
negative offsets.
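
As a rough illustration of the inversion, the following sketch computes the
offset of a per-mode entry relative to the env pointer before and after the
change. The three layout constants and the guest MMU index 2 are assumptions
chosen only so that the results match the offsets visible in the generated
code in this message; they are not taken from the QEMU headers, and
fast_entry_offset() is a hypothetical helper.

```c
/*
 * Assumed layout values (not taken from the QEMU headers); they are
 * picked so the results match the offsets in the generated code.
 */
#define NB_MMU_MODES     16   /* assumed number of MMU modes */
#define FAST_ENTRY_SIZE  16   /* assumed bytes per per-mode entry */
#define TAIL_BYTES       16   /* assumed bytes between array end and env */

/* The macro before and after this patch. */
#define MMU_INDEX_OLD(n)  (n)
#define MMU_INDEX_NEW(n)  (NB_MMU_MODES - 1 - (n))

/*
 * Hypothetical helper: offset of array slot `slot` relative to env.
 * The array ends TAIL_BYTES below env, so higher slots lie closer to
 * env and have smaller negative offsets.
 */
static inline long fast_entry_offset(int slot)
{
    return -(long)TAIL_BYTES - (long)(NB_MMU_MODES - slot) * FAST_ENTRY_SIZE;
}
```

Under these assumptions, fast_entry_offset(MMU_INDEX_OLD(2)) is -0xf0 and
fast_entry_offset(MMU_INDEX_NEW(2)) is -0x40.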

Looking at the generated code, for every memory access in the guest
the x86-64 TCG backend has so far generated:

IN:
0x000ebdf5:  8b 04 24               movl     (%esp), %eax

OUT:
...
0x003619:  48 23 bd 10 ff ff ff     andq     -0xf0(%rbp), %rdi
0x003620:  48 03 bd 18 ff ff ff     addq     -0xe8(%rbp), %rdi
...

With the smaller negative offset it will now instead generate:

OUT:
...
0x003499:  48 23 7d c0              andq     -0x40(%rbp), %rdi
0x00349d:  48 03 7d c8              addq     -0x38(%rbp), %rdi

So, every memory access in the guest now saves 6 bytes (= 2 * 3)
of instruction code in the fast path.
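
The 3 bytes saved per instruction follow from the x86-64 encoding: a
displacement that fits in a signed 8-bit value is encoded in 1 byte,
otherwise in 4 bytes. A minimal sketch of that arithmetic, assuming the
REX + opcode + ModRM form seen in the listings above (the helper names are
hypothetical):

```c
#include <stdint.h>

/* Bytes needed for the displacement: disp8 if it fits, else disp32. */
static int disp_bytes(int32_t disp)
{
    return (disp >= INT8_MIN && disp <= INT8_MAX) ? 1 : 4;
}

/*
 * Length of an instruction of the form "op disp(%rbp), %reg64":
 * 1 REX prefix + 1 opcode + 1 ModRM byte + the displacement.
 * With %rbp as base, a displacement byte is always present.
 */
static int rbp_mem_insn_len(int32_t disp)
{
    return 3 + disp_bytes(disp);
}
```

Here rbp_mem_insn_len(-0xf0) gives 7 and rbp_mem_insn_len(-0x40) gives 4,
matching the byte counts of the two andq encodings shown above.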

Overall, this patch reduces the generated instruction size by ~3%
and may improve overall performance.

Signed-off-by: Helge Deller <deller@gmx.de>
---
 include/exec/cpu-defs.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--
2.41.0

Patch

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 07bcdd38b2..7ba0481bc4 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -62,8 +62,13 @@ 

 /*
  * MMU_INDEX() helper to specify MMU index.
+ *
+ * Invert the number here to count downwards from NB_MMU_MODES-1 to 0.  Since
+ * the MMU is placed within CPUNegativeOffsetState, this makes the negative
+ * offsets smaller, so that the tcg backend will generate shorter instruction
+ * sequences to access the MMU.
  */
-#define MMU_INDEX(n)    (n)
+#define MMU_INDEX(n)    (NB_MMU_MODES - 1 - (n))

 #if defined(CONFIG_SOFTMMU) && defined(CONFIG_TCG)
 #include "exec/tlb-common.h"