Message ID: 1377550812-908-6-git-send-email-rth@twiddle.net
State: New
On 26/08/2013 23:00, Richard Henderson wrote:
> Allow the code that tcg generates to be less obtuse, passing in
> the return address directly instead of computing it in the helper.
>
> Maintain the old entrance point unchanged as an alternate entry point.
>
> Delete the helper_st*_cmmu prototypes; the implementations did not exist.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Something that can be done on top of this patch: what about moving the
"-1" to helper_ret_*? It is common to pretty much all the targets
(except ARM, which has -2), and it would allow some simplifications.

For example, I played with return address helpers on 32-bit PPC, and you
could use a

    li rN, retaddr
    mtlr rN
    b st_trampoline[i]

sequence instead of one of

    li rN, retaddr
    mtlr rN
    bl st_trampoline[i]
    b retaddr

or

    li rN, retaddr
    mtlr rN
    addi rN, rN, -1
    b st_trampoline[i]

Paolo
On 08/26/2013 03:26 PM, Paolo Bonzini wrote:
> Something that can be done on top of this patch: what about moving the
> "-1" to helper_ret_*? It is common to pretty much all the targets
> (except ARM has -2), and it would allow some simplifications.

I suppose so, yes.

> li rN, retaddr
> mtlr rN
> b st_trampoline[i]
>
> sequence instead of one of
>
> li rN, retaddr
> mtlr rN
> bl st_trampoline[i]
> b retaddr

This sort of thing is very difficult to evaluate, because of the CPU's
return address prediction stack. I have so far avoided it.

The only CPUs that I believe can make good use of tail calls into the
memory helpers are those with predicated stores and calls, i.e. ARM and
ia64.

r~
On 26 August 2013 22:00, Richard Henderson <rth@twiddle.net> wrote:
> Allow the code that tcg generates to be less obtuse, passing in
> the return address directly instead of computing it in the helper.
>
> +uint8_t helper_ret_ldb_mmu(CPUArchState *env, target_ulong addr,
> +                           int mmu_idx, uintptr_t retaddr);
>
> uint8_t helper_ldb_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);

I thought the reason we did it this way round was to avoid having so
many arguments to helpers that we overflowed the registers and spilled
onto the stack on some calling conventions? Or does this not make much
difference in practice?

thanks
-- PMM
On Mon, Aug 26, 2013 at 03:34:15PM -0700, Richard Henderson wrote:
> On 08/26/2013 03:26 PM, Paolo Bonzini wrote:
> > Something that can be done on top of this patch: what about moving the
> > "-1" to helper_ret_*? It is common to pretty much all the targets
> > (except ARM has -2), and it would allow some simplifications.
>
> I suppose so, yes.
>
> > li rN, retaddr
> > mtlr rN
> > b st_trampoline[i]
> >
> > sequence instead of one of
> >
> > li rN, retaddr
> > mtlr rN
> > bl st_trampoline[i]
> > b retaddr
>
> This sort of thing is very difficult to evaluate, because of the
> cpu's return address prediction stack. I have so far avoided it.
>
> The only cpus that I believe can make good use of tail calls into
> the memory helpers are those with predicated stores and calls, i.e.
> arm and ia64.

On the other hand, calling the helper is the exception more than the
rule (that's why the calls have been moved to the end of the TB), so we
should not look too much at generating fast code, but rather small code,
in order to use the caches (both TB and CPU caches) more efficiently.

Therefore, even on x86, if we move the -1 to the helper level, it should
be possible to use a tail call for the stores, something like:

    mov    %r14,%rdi
    mov    %ebx,%edx
    xor    %ecx,%ecx
    lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
    pushq  %r8
    jmpq   0x7f25526757a0

instead of:

    mov    %r14,%rdi
    mov    %ebx,%edx
    xor    %ecx,%ecx
    lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
    callq  0x7f25526757a0
    jmpq   0x7f2541a6f69b
On Tue, Aug 27, 2013 at 12:26:00AM +0100, Peter Maydell wrote:
> On 26 August 2013 22:00, Richard Henderson <rth@twiddle.net> wrote:
> > Allow the code that tcg generates to be less obtuse, passing in
> > the return address directly instead of computing it in the helper.
> >
> > +uint8_t helper_ret_ldb_mmu(CPUArchState *env, target_ulong addr,
> > +                           int mmu_idx, uintptr_t retaddr);
> >
> > uint8_t helper_ldb_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
>
> I thought the reason we did it this way round was to avoid having
> so many arguments to helpers that we overflowed registers and
> into the stack on some calling conventions? Or does this not make
> much difference in practice?

That was indeed the idea, but I think it's a good idea to provide this
alternative for architectures that can pass enough arguments in
registers.
On 08/27/2013 03:46 AM, Aurelien Jarno wrote:
> On the other hand, calling the helper is the exception more than the
> rule (that's why the calls have been moved to the end of the TB), so we
> should not look too much at generating fast code, but rather small code,
> in order to use the caches (both TB and CPU caches) more efficiently.
>
> Therefore, even on x86, if we move the -1 to the helper level, it should
> be possible to use a tail call for the stores, something like:
>
>     mov    %r14,%rdi
>     mov    %ebx,%edx
>     xor    %ecx,%ecx
>     lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
>     pushq  %r8
>     jmpq   0x7f25526757a0
>
> instead of:
>
>     mov    %r14,%rdi
>     mov    %ebx,%edx
>     xor    %ecx,%ecx
>     lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
>     callq  0x7f25526757a0
>     jmpq   0x7f2541a6f69b

Fair enough. I'll have a go at some follow-ups then.

r~
On Tue, Aug 27, 2013 at 07:53:56AM -0700, Richard Henderson wrote:
> On 08/27/2013 03:46 AM, Aurelien Jarno wrote:
> > On the other hand, calling the helper is the exception more than the
> > rule (that's why the calls have been moved to the end of the TB), so
> > we should not look too much at generating fast code, but rather small
> > code, in order to use the caches (both TB and CPU caches) more
> > efficiently.
> >
> > Therefore, even on x86, if we move the -1 to the helper level, it
> > should be possible to use a tail call for the stores, something like:
> >
> >     mov    %r14,%rdi
> >     mov    %ebx,%edx
> >     xor    %ecx,%ecx
> >     lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
> >     pushq  %r8
> >     jmpq   0x7f25526757a0
> >
> > instead of:
> >
> >     mov    %r14,%rdi
> >     mov    %ebx,%edx
> >     xor    %ecx,%ecx
> >     lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
> >     callq  0x7f25526757a0
> >     jmpq   0x7f2541a6f69b
>
> Fair enough. I'll have a go at some follow-ups then.

I think this can also be done later. Do you want to create a version 3,
or should I just process the current pull request and you will provide
additional patches later?
On 08/27/2013 08:43 AM, Aurelien Jarno wrote:
> I think this can also be done later. Do you want to create a version 3,
> or should I just process the current pull request and you will provide
> additional patches later?

I think it might be better to provide follow-up patches.

r~
diff --git a/include/exec/softmmu_defs.h b/include/exec/softmmu_defs.h
index 1f25e33..e55e717 100644
--- a/include/exec/softmmu_defs.h
+++ b/include/exec/softmmu_defs.h
@@ -9,29 +9,41 @@
 #ifndef SOFTMMU_DEFS_H
 #define SOFTMMU_DEFS_H
 
+uint8_t helper_ret_ldb_mmu(CPUArchState *env, target_ulong addr,
+                           int mmu_idx, uintptr_t retaddr);
+uint16_t helper_ret_ldw_mmu(CPUArchState *env, target_ulong addr,
+                            int mmu_idx, uintptr_t retaddr);
+uint32_t helper_ret_ldl_mmu(CPUArchState *env, target_ulong addr,
+                            int mmu_idx, uintptr_t retaddr);
+uint64_t helper_ret_ldq_mmu(CPUArchState *env, target_ulong addr,
+                            int mmu_idx, uintptr_t retaddr);
+
+void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
+                        int mmu_idx, uintptr_t retaddr);
+void helper_ret_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                        int mmu_idx, uintptr_t retaddr);
+void helper_ret_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                        int mmu_idx, uintptr_t retaddr);
+void helper_ret_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                        int mmu_idx, uintptr_t retaddr);
+
 uint8_t helper_ldb_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
-                    int mmu_idx);
 uint16_t helper_ldw_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                    int mmu_idx);
 uint32_t helper_ldl_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                    int mmu_idx);
 uint64_t helper_ldq_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                    int mmu_idx);
+
+void helper_stb_mmu(CPUArchState *env, target_ulong addr,
+                    uint8_t val, int mmu_idx);
+void helper_stw_mmu(CPUArchState *env, target_ulong addr,
+                    uint16_t val, int mmu_idx);
+void helper_stl_mmu(CPUArchState *env, target_ulong addr,
+                    uint32_t val, int mmu_idx);
+void helper_stq_mmu(CPUArchState *env, target_ulong addr,
+                    uint64_t val, int mmu_idx);
 
 uint8_t helper_ldb_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stb_cmmu(CPUArchState *env, target_ulong addr, uint8_t val,
-                     int mmu_idx);
 uint16_t helper_ldw_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stw_cmmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                     int mmu_idx);
 uint32_t helper_ldl_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stl_cmmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                     int mmu_idx);
 uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stq_cmmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                     int mmu_idx);
-#endif
+
+#endif /* SOFTMMU_DEFS_H */
diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
index 8584902..7d8bcb5 100644
--- a/include/exec/softmmu_template.h
+++ b/include/exec/softmmu_template.h
@@ -78,15 +78,18 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
 }
 
 /* handle all cases except unaligned access which span two pages */
+#ifdef SOFTMMU_CODE_ACCESS
+static
+#endif
 DATA_TYPE
-glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
-                                         int mmu_idx)
+glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
+                                             target_ulong addr, int mmu_idx,
+                                             uintptr_t retaddr)
 {
     DATA_TYPE res;
     int index;
     target_ulong tlb_addr;
     hwaddr ioaddr;
-    uintptr_t retaddr;
 
     /* test if there is match for unaligned or IO access */
     /* XXX: could done more in memory macro in a non portable way */
@@ -98,13 +101,11 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            retaddr = GETPC_EXT();
             ioaddr = env->iotlb[mmu_idx][index];
             res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
             /* slow unaligned access (it spans two pages or IO) */
         do_unaligned_access:
-            retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
 #endif
@@ -115,7 +116,6 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
             uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
-                retaddr = GETPC_EXT();
                 do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx,
                                     retaddr);
             }
 #endif
@@ -124,8 +124,6 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
                                                 (addr + addend));
         }
     } else {
-        /* the page is not in the TLB : fill it */
-        retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
         if ((addr & (DATA_SIZE - 1)) != 0)
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
@@ -136,6 +134,14 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
     return res;
 }
 
+DATA_TYPE
+glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
+                                         int mmu_idx)
+{
+    return glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(env, addr, mmu_idx,
+                                                        GETPC_EXT());
+}
+
 /* handle all unaligned cases */
 static DATA_TYPE
 glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
@@ -214,13 +220,13 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
     io_mem_write(mr, physaddr, val, 1 << SHIFT);
 }
 
-void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                              target_ulong addr, DATA_TYPE val,
-                                              int mmu_idx)
+void
+glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
+                                             target_ulong addr, DATA_TYPE val,
+                                             int mmu_idx, uintptr_t retaddr)
 {
     hwaddr ioaddr;
     target_ulong tlb_addr;
-    uintptr_t retaddr;
     int index;
 
     index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
@@ -231,12 +237,10 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            retaddr = GETPC_EXT();
             ioaddr = env->iotlb[mmu_idx][index];
             glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
         do_unaligned_access:
-            retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
 #endif
@@ -247,7 +251,6 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
             uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
-                retaddr = GETPC_EXT();
                 do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
             }
 #endif
@@ -257,7 +260,6 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
         }
     } else {
         /* the page is not in the TLB : fill it */
-        retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
         if ((addr & (DATA_SIZE - 1)) != 0)
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
@@ -267,6 +269,14 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
     }
 }
 
+void
+glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
+                                         DATA_TYPE val, int mmu_idx)
+{
+    glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(env, addr, val, mmu_idx,
+                                                 GETPC_EXT());
+}
+
 /* handles all unaligned cases */
 static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
                                                    target_ulong addr,
Allow the code that tcg generates to be less obtuse, passing in
the return address directly instead of computing it in the helper.

Maintain the old entrance point unchanged as an alternate entry point.

Delete the helper_st*_cmmu prototypes; the implementations did not exist.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/softmmu_defs.h     | 46 ++++++++++++++++++++++++++---------------
 include/exec/softmmu_template.h | 42 +++++++++++++++++++++++--------------
 2 files changed, 55 insertions(+), 33 deletions(-)