Patchwork TCG: Improve tb_phys_hash_func()

Submitter Aurelien Jarno
Date Dec. 29, 2010, 9:27 p.m.
Message ID <1293658044-10244-1-git-send-email-aurelien@aurel32.net>
Permalink /patch/76948/
State New

Comments

Aurelien Jarno - Dec. 29, 2010, 9:27 p.m.
Most emulated CPUs have instructions aligned on 16 or 32 bits, while
on the others GCC tries to align jump targets. This means that 1/2 or
3/4 of the tb_phys_hash entries are never used.

Update the hash function tb_phys_hash_func() to ignore the two lowest
bits of the address. This brings a 6% speed-up when booting a MIPS
image.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 exec-all.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
Blue Swirl - Dec. 30, 2010, 5:55 p.m.
On Wed, Dec 29, 2010 at 9:27 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> Most emulated CPUs have instructions aligned on 16 or 32 bits, while
> on the others GCC tries to align jump targets. This means that 1/2 or
> 3/4 of the tb_phys_hash entries are never used.
>
> Update the hash function tb_phys_hash_func() to ignore the two lowest
> bits of the address. This brings a 6% speed-up when booting a MIPS
> image.

Nice! The beginning of functions may be aligned to 16 bytes. Would it
change the performance figures if one or two more bits were ignored?
Aurelien Jarno - Dec. 31, 2010, 7:46 p.m.
On Thu, Dec 30, 2010 at 05:55:38PM +0000, Blue Swirl wrote:
> Nice! The beginning of functions may be aligned to 16 bytes. Would it
> change the performance figures if one or two more bits were ignored?
> 

It makes a noticeable difference in how the TBs are dispatched in the
hash table, but only by a few percent (slightly more on ppc). I am not
able to measure any speed improvement; it is all in the noise.

My guess is that compilers align functions to 16 bytes, but not jump
targets in loops, which are far more numerous than function starts.
Aurelien Jarno - Dec. 31, 2010, 7:55 p.m.
On Fri, Dec 31, 2010 at 08:46:02PM +0100, Aurelien Jarno wrote:
> It makes a noticeable difference in how the TBs are dispatched in the
> hash table, but only by a few percent (slightly more on ppc). I am not

Here I meant how TBs are dispatched after my patch has been applied.

Patch

diff --git a/exec-all.h b/exec-all.h
index 6821b17..a4b75bd 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -177,7 +177,7 @@  static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
 
 static inline unsigned int tb_phys_hash_func(tb_page_addr_t pc)
 {
-    return pc & (CODE_GEN_PHYS_HASH_SIZE - 1);
+    return (pc >> 2) & (CODE_GEN_PHYS_HASH_SIZE - 1);
 }
 
 TranslationBlock *tb_alloc(target_ulong pc);