Re: [PATCH] tcg, tci: Add TCG and interpreter for bytecode (virtual machine)

Submitted by TeLeMan on Oct. 23, 2009, 1:31 a.m.

Details

Message ID a38b25540910221831u1f43a337s8863b93930f275e9@mail.gmail.com
State New

Commit Message

TeLeMan Oct. 23, 2009, 1:31 a.m.
Tested on i386-softmmu only. TCI can now run Windows XP SP2, and it is
about 6 times slower than the JIT.
--
SUN OF A BEACH
Subject: [PATCH 1/5] tci: fix op_sar_iXX and op_ext16s_iXX

---
 tcg/tci.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)
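
To illustrate the ext16s part of the fix: a helper that returns uint16_t is zero-extended when its result is widened, while one that returns int16_t is sign-extended. The sketch below is not taken from tci.c. The names ext16_as_unsigned(), ext16_as_signed(), widen_old() and widen_new() are hypothetical, and it assumes the interpreter stores the result in a wider register. The uint16_t to int16_t conversion is implementation-defined in C, but GCC and Clang wrap it as two's complement.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old helper: unsigned 16-bit return type. */
static uint16_t ext16_as_unsigned(uint16_t raw)
{
    return raw;
}

/* Hypothetical stand-in for the patched helper: signed 16-bit return type. */
static int16_t ext16_as_signed(uint16_t raw)
{
    return (int16_t)raw;    /* implementation-defined for raw > 0x7FFF */
}

/* Widening the unsigned result zero-extends: 0xFFFE becomes 0x0000FFFE. */
static uint32_t widen_old(uint16_t raw)
{
    return ext16_as_unsigned(raw);
}

/* Widening the signed result sign-extends: 0xFFFE becomes 0xFFFFFFFE. */
static uint32_t widen_new(uint16_t raw)
{
    return (uint32_t)(int32_t)ext16_as_signed(raw);
}
```

Values with bit 15 clear come out the same either way, so the difference only shows up for negative 16-bit inputs.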

Comments

Stefan Weil Oct. 23, 2009, 6:58 p.m.
TeLeMan wrote:
> Tested on i386-softmmu only. TCI can now run Windows XP SP2, and it is
> about 6 times slower than the JIT.
> --
> SUN OF A BEACH

Great. Many thanks for the fixes, enhancements and for the testing, too.

Is patch 4 (call handling) needed, or is it an optimization?
If it is needed, the tcg disassembler has to be extended as well.

And did patch 5 (inline) speed up the code? I had expected
that static functions don't need inline, because the compiler
can optimize them anyway.

Regards,
Stefan

TeLeMan Oct. 24, 2009, 3:23 a.m.
On Sat, Oct 24, 2009 at 02:58, Stefan Weil <weil@mail.berlios.de> wrote:
> Is patch 4 (call handling) needed, or is it an optimization?
> If it is needed, the tcg disassembler has to be extended as well.

In fact, TCI has no stack and no clobbered registers, and it doesn't need
to simulate how a CPU works. I am trying to remove tcg_reg_alloc() in
tcg_reg_alloc_op() and tcg_reg_alloc_call() and access the temporary
variables directly in TCI.

> And did patch 5 (inline) speed up the code? I had expected
> that static functions don't need inline, because the compiler
> can optimize them anyway.
You are right, patch 5 is not needed.

Stuart Brady Oct. 26, 2009, 7:08 p.m.
On Sat, Oct 24, 2009 at 11:23:43AM +0800, TeLeMan wrote:
> On Sat, Oct 24, 2009 at 02:58, Stefan Weil <weil@mail.berlios.de> wrote:
> > Is patch 4 (call handling) needed, or is it an optimization?
> > If it is needed, the tcg disassembler has to be extended as well.
> 
> In fact, TCI has no stack and no clobbered registers, and it doesn't need
> to simulate how a CPU works. I am trying to remove tcg_reg_alloc() in
> tcg_reg_alloc_op() and tcg_reg_alloc_call() and access the temporary
> variables directly in TCI.

'Doesn't need' doesn't necessarily mean 'is better without', though.
Perhaps it's best for TCI to reflect the behaviour of other TCG targets
where possible?  (You can then compare the code that is generated with
different numbers of registers, and different constraints, etc.)

Cheers,

diff --git a/tcg/tci.c b/tcg/tci.c
index e467b3a..81c415c 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -206,7 +206,7 @@  static uint16_t tci_read_r16(uint8_t **tb_ptr)
 }
 
 /* Read indexed register (16 bit signed) from bytecode. */
-static uint16_t tci_read_r16s(uint8_t **tb_ptr)
+static int16_t tci_read_r16s(uint8_t **tb_ptr)
 {
     uint16_t value = tci_read_reg16s(**tb_ptr);
     *tb_ptr += 1;
@@ -549,7 +549,7 @@  unsigned long tcg_qemu_tb_exec(uint8_t *tb_ptr)
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(&tb_ptr);
             t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, (t1 >> t2) | (t1 & (1UL << 31)));
+            tci_write_reg32(t0, ((int32_t)t1 >> t2));
             break;
 #ifdef TCG_TARGET_HAS_rot_i32
         case INDEX_op_rotl_i32:
@@ -794,7 +794,7 @@  unsigned long tcg_qemu_tb_exec(uint8_t *tb_ptr)
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(&tb_ptr);
             t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, (t1 >> t2) | (t1 & (1ULL << 63)));
+            tci_write_reg64(t0, ((int64_t)t1 >> t2));
             break;
 #ifdef TCG_TARGET_HAS_rot_i64
         case INDEX_op_rotl_i64:
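
For reference, a small standalone sketch of why the old sar expression was wrong. The removed line does a logical shift and ORs the original sign bit back in, so the bits between the sign bit and the shifted value stay zero. The patched line casts to a signed type first, which gives an arithmetic shift on GCC and Clang (right-shifting a negative signed value is implementation-defined in C). The function names here are hypothetical and not part of tci.c. sar32_portable() is only an illustrative alternative that fills the vacated bits by hand; it assumes a shift count of 0..31.

```c
#include <stdint.h>

/* Expression from the removed line: only bit 31 is preserved. */
static uint32_t sar32_old(uint32_t t1, uint32_t t2)
{
    return (t1 >> t2) | (t1 & (UINT32_C(1) << 31));
}

/* Expression from the added line: signed shift, arithmetic on GCC/Clang. */
static uint32_t sar32_new(uint32_t t1, uint32_t t2)
{
    return (uint32_t)((int32_t)t1 >> t2);
}

/* Hypothetical portable variant: logical shift plus explicit sign fill. */
static uint32_t sar32_portable(uint32_t t1, uint32_t t2)
{
    /* For a negative input, set every bit vacated by the logical shift. */
    uint32_t fill = (t1 & (UINT32_C(1) << 31)) ? ~(UINT32_MAX >> t2) : 0;
    return (t1 >> t2) | fill;
}
```

With t1 = 0x80000000 and t2 = 4, sar32_old() gives 0x88000000, while sar32_new() and sar32_portable() give 0xF8000000. The same reasoning applies to the 64-bit case with int64_t.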