
tcg, tci: Add TCG and interpreter for bytecode (virtual machine)

Message ID 1254156621-28559-1-git-send-email-weil@mail.berlios.de
State Superseded

Commit Message

Stefan Weil Sept. 28, 2009, 4:50 p.m. UTC
This is an initial and highly experimental version of
a new code generator which does not create native code
(so it will not depend on the host running qemu when it
is finished). It creates bytecode which is interpreted.

The current version can run the bios code, a simple
hello world and the linux-user testcode for the
i386 and x86_64 targets on x86_64 hosts.

Some additional information is in tcg/bytecode/README
which is part of the patch.

Please send patches / enhancements based on my latest
version from git://repo.or.cz/qemu/ar7.git.

Signed-off-by: Stefan Weil <stefan@mail.berlios.de>
---
 Makefile.target           |    1 +
 configure                 |   24 +-
 dyngen-exec.h             |    4 +-
 target-i386/exec.h        |    4 +
 tcg/bytecode/README       |   64 +++
 tcg/bytecode/tcg-target.c |  937 ++++++++++++++++++++++++++++++++++++++++++++
 tcg/bytecode/tcg-target.h |  151 +++++++
 tcg/tcg.c                 |   16 +-
 tcg/tcg.h                 |    4 +-
 tcg/tci.c                 |  952 +++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 2148 insertions(+), 9 deletions(-)
 create mode 100644 tcg/bytecode/README
 create mode 100644 tcg/bytecode/tcg-target.c
 create mode 100644 tcg/bytecode/tcg-target.h
 create mode 100644 tcg/tci.c

Comments

Blue Swirl Sept. 28, 2009, 6:42 p.m. UTC | #1
On Mon, Sep 28, 2009 at 7:50 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> This is an initial and highly experimental version of
> a new code generator which does not create native code
> (so it will not depend on the host running qemu when it
> is finished). It creates bytecode which is interpreted.
>
> The current version can run the bios code, a simple
> hello world and the linux-user testcode for the
> i386 and x86_64 targets on x86_64 hosts.
>
> Some additional information is in tcg/bytecode/README
> which is part of the patch.
>
> Please send patches / enhancements based on my latest
> version from git://repo.or.cz/qemu/ar7.git.

Nice. When the interpreter works, it should be unconditional for
targets that are not supported by TCG.

> +QEMU can create native code for some hosts (arm, hppa, i386, ppc, ppc64,
> +sparc, x86_64). For others, inoffical host support was written (s390).

unofficial

> +static const int tcg_target_call_oarg_regs[2] = {
> +    // TODO: ppc64 only uses one register. Why do others use two?

To pass a 64 bit return value in two 32 bit registers?
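For illustration (a sketch, not QEMU code): on a 32-bit host ABI a 64-bit return value comes back split across two 32-bit registers (EDX:EAX on i386), which is why a backend reserves two output-argument registers. The split and rejoin look like this:

```c
#include <stdint.h>

/* Low half of a 64-bit value: goes into the first output-argument register. */
static uint32_t lo32(uint64_t v) { return (uint32_t)v; }

/* High half: goes into the second output-argument register. */
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* The caller reassembles the 64-bit result from the two registers. */
static uint64_t join64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```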

> +static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {

> +#if TCG_TARGET_NB_REGS == 16 || TCG_TARGET_NB_REGS == 32

> +    "r16",

Table overflow.
Stefan Weil Sept. 28, 2009, 9:33 p.m. UTC | #2
Blue Swirl schrieb:
> On Mon, Sep 28, 2009 at 7:50 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>   
>> This is an initial and highly experimental version of
>> a new code generator which does not create native code
>> (so it will not depend on the host running qemu when it
>> is finished). It creates bytecode which is interpreted.
>>
>> The current version can run the bios code, a simple
>> hello world and the linux-user testcode for the
>> i386 and x86_64 targets on x86_64 hosts.
>>
>> Some additional information is in tcg/bytecode/README
>> which is part of the patch.
>>
>> Please send patches / enhancements based on my latest
>> version from git://repo.or.cz/qemu/ar7.git.
>>     
>
> Nice. When the interpreter works, it should be unconditional for
> targets that are not supported by TCG.
>
>   
>> +QEMU can create native code for some hosts (arm, hppa, i386, ppc, ppc64,
>> +sparc, x86_64). For others, inoffical host support was written (s390).
>>     
>
> unofficial
>
>   
>> +static const int tcg_target_call_oarg_regs[2] = {
>> +    // TODO: ppc64 only uses one register. Why do others use two?
>>     
>
> To pass a 64 bit return value in two 32 bit registers?
>
>   
>> +static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>>     
>
>   
>> +#if TCG_TARGET_NB_REGS == 16 || TCG_TARGET_NB_REGS == 32
>>     
>
>   
>> +    "r16",
>>     
>
> Table overflow.
>
>   

Thanks. Fixed in git://repo.or.cz/qemu/ar7.git
Stuart Brady Oct. 4, 2009, 6:27 p.m. UTC | #3
On Mon, Sep 28, 2009 at 06:50:21PM +0200, Stefan Weil wrote:
> Please send patches / enhancements based on my latest
> version from git://repo.or.cz/qemu/ar7.git.

Just bug reports for now, unfortunately...

exec-all.h defines 'USE_DIRECT_JUMP' based on the host architecture,
which is not relevant under TCI.  Also, a tb_set_jmp_target1()
implementation is needed for TCI.

I think tcg_out64() belongs in tcg.c along with tcg_out8/16/32().
There are places where the x86_64, ppc64 and s390 TCG targets call
out32() twice, which should be replaced with a single call to out64().

tcg_out16(), tcg_out32() and tcg_out64(), etc. rely on the code buffer
pointer being aligned on those architectures that require alignment...
We'd need definitions of these in terms of stw_p(), stl_p() and stq_p()
for TCI...  Is there any reason to preserve the old definitions?
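An alignment-safe definition in the spirit of stw_p()/stl_p()/stq_p() could simply go through memcpy(), which compiles to a single store on hosts that allow unaligned access and to byte stores where the host requires alignment. A sketch (the tci_out32()/tci_out64() names and the static buffer are hypothetical):

```c
#include <stdint.h>
#include <string.h>

static uint8_t code_buf[128];
static uint8_t *code_ptr = code_buf;

/* Emit a 32-bit value without assuming code_ptr is aligned. */
static void tci_out32(uint32_t v)
{
    memcpy(code_ptr, &v, sizeof(v));
    code_ptr += sizeof(v);
}

/* Emit a 64-bit value the same way; no need for two out32() calls. */
static void tci_out64(uint64_t v)
{
    memcpy(code_ptr, &v, sizeof(v));
    code_ptr += sizeof(v);
}
```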

(BTW, there's no 'ldul' macro, so perhaps one should be added, although
the names in that set of macros and functions are rather unclear, so
perhaps some more descriptive naming could be used for new code.)

Cheers,
Stefan Weil Oct. 11, 2009, 4:33 p.m. UTC | #4
Stuart Brady schrieb:
> On Mon, Sep 28, 2009 at 06:50:21PM +0200, Stefan Weil wrote:
>> Please send patches / enhancements based on my latest
>> version from git://repo.or.cz/qemu/ar7.git.
>
> Just bug reports for now, unfortunately...
>
> exec-all.h defines 'USE_DIRECT_JUMP' based on the host architecture,
> which is not relevant under TCI. Also, a tb_set_jmp_target1()
> implementation is needed for TCI.
Done. Tested with mips / mipsel host.
>
> I think tcg_out64() belongs in tcg.c along with tcg_out8/16/32().
> There are places where the x86_64, ppc64 and s390 TCG targets call
> out32() twice, which should be replaced with a single call to out64().
This is still to be done.
>
> tcg_out16(), tcg_out32() and tcg_out64(), etc. rely on the code buffer
> pointer being aligned on those architectures that require alignment...
> We'd need definitions of these in terms of stw_p(), stl_p() and stq_p()
> for TCI... Is there any reason to preserve the old definitions?
>
> (BTW, there's no 'ldul' macro, so perhaps one should be added, although
> the names in that set of macros and functions are rather unclear, so
> perhaps some more descriptive naming could be used for new code.)
>
> Cheers,
Here is a status update:

A basic test of tcg + interpreter which includes system emulation
(pc bios) and linux user mode (simple hello) now works for
i386, x86_64, mips and mipsel hosts with i386 and x86_64 guests.
Endianness and 32/64 bit issues should be fixed now.
Alignment is still to be done (my test platforms don't need it).

So it is finally possible again to run qemu on mips hosts,
and any 32 or 64 bit host should be able to run qemu, too.
Of course the interpreted code is not fast - in my test on
an emulated mips malta machine it is even terribly slow.

git://repo.or.cz/qemu/ar7.git contains the latest changes.

If the maintainers want to integrate tci in the official qemu,
I can prepare the patches needed to add bytecode generator
and interpreter and tcg host support for any host.

Cheers,

Stefan Weil
malc Oct. 11, 2009, 4:55 p.m. UTC | #5
On Sun, 11 Oct 2009, Stefan Weil wrote:

> Stuart Brady schrieb:
> > On Mon, Sep 28, 2009 at 06:50:21PM +0200, Stefan Weil wrote:
> >> Please send patches / enhancements based on my latest
> >> version from git://repo.or.cz/qemu/ar7.git.
> >
> > Just bug reports for now, unfortunately...
> >
> > exec-all.h defines 'USE_DIRECT_JUMP' based on the host architecture,
> > which is not relevant under TCI. Also, a tb_set_jmp_target1()
> > implementation is needed for TCI.
> Done. Tested with mips / mipsel host.
> >
> > I think tcg_out64() belongs in tcg.c along with tcg_out8/16/32().
> > There are places where the x86_64, ppc64 and s390 TCG targets call
> > out32() twice, which should be replaced with a single call to out64().
> This is still to be done.
> >
> > tcg_out16(), tcg_out32() and tcg_out64(), etc. rely on the code buffer
> > pointer being aligned on those architectures that require alignment...
> > We'd need definitions of these in terms of stw_p(), stl_p() and stq_p()
> > for TCI... Is there any reason to preserve the old definitions?
> >
> > (BTW, there's no 'ldul' macro, so perhaps one should be added, although
> > the names in that set of macros and functions are rather unclear, so
> > perhaps some more descriptive naming could be used for new code.)
> >
> > Cheers,
> Here is a status update:
> 
> A basic test of tcg + interpreter which includes system emulation
> (pc bios) and linux user mode (simple hello) now works for
> i386, x86_64, mips and mipsel hosts with i386 and x86_64 guests.
> Endianness and 32/64 bit issues should be fixed now.
> Alignment is still to be done (my test platforms don't need it).
> 
> So it is finally possible again to run qemu on mips hosts,
> and any 32 or 64 bit host should be able to run qemu, too.
> Of course the interpreted code is not fast - in my test on
> an emulated mips malta machine it is even terribly slow.
> 
> git://repo.or.cz/qemu/ar7.git contains the latest changes.
> 
> If the maintainers want to integrate tci in the official qemu,
> I can prepare the patches needed to add bytecode generator
> and interpreter and tcg host support for any host.
> 

I, at the very least, will definitely look at it.
malc Oct. 11, 2009, 9:04 p.m. UTC | #6
On Sun, 11 Oct 2009, Stefan Weil wrote:

> Stuart Brady schrieb:
> > On Mon, Sep 28, 2009 at 06:50:21PM +0200, Stefan Weil wrote:
> >> Please send patches / enhancements based on my latest
> >> version from git://repo.or.cz/qemu/ar7.git.
> >
> > Just bug reports for now, unfortunately...

[..snip..]

> git://repo.or.cz/qemu/ar7.git contains the latest changes.
> 
> If the maintainers want to integrate tci in the official qemu,
> I can prepare the patches needed to add bytecode generator
> and interpreter and tcg host support for any host.

I've looked at the code, and the problem with helpers is essentially
unsolved, if only things were as simple as casting things to
`helper_function' and expecting it will work, not that it's _that_
much more complex, but still..
Stuart Brady Oct. 11, 2009, 11:17 p.m. UTC | #7
On Sun, Oct 11, 2009 at 06:33:57PM +0200, Stefan Weil wrote:
> Stuart Brady schrieb:
> > exec-all.h defines 'USE_DIRECT_JUMP' based on the host architecture,
> > which is not relevant under TCI. Also, a tb_set_jmp_target1()
> > implementation is needed for TCI.

> Done. Tested with mips / mipsel host.

I think the #ifdef CONFIG_TCG_INTERPRETER sections belong before
the arch-specific sections. :-)

> Alignment is still to be done (my test platforms don't need it).

I will do this at some point if nobody beats me to it. :-)

BTW, I'm wondering about certain places that require host-specific
code... for example:

 * qemu-lock.h, which is used by cpu_exec() for the tb lock and by
   cpu_unlink_tb() for non-NPTL hosts.

 * The generic cpu_get_real_ticks() doesn't seem great... I suppose
   the MIPS version using clock() might be more appropriate?

Apart from this, I suppose user mode emulation simply has to depend
on host-specific code for signal handling... :-(

I gather INDEX_op_call handling could be improved?  Presumably that's
really just a matter of casting to the right type of function pointer
when making each call?

Cheers,
malc Oct. 11, 2009, 11:50 p.m. UTC | #8
On Mon, 12 Oct 2009, Stuart Brady wrote:

> On Sun, Oct 11, 2009 at 06:33:57PM +0200, Stefan Weil wrote:
> > Stuart Brady schrieb:
> > > exec-all.h defines 'USE_DIRECT_JUMP' based on the host architecture,
> > > which is not relevant under TCI. Also, a tb_set_jmp_target1()
> > > implementation is needed for TCI.
> 
> > Done. Tested with mips / mipsel host.
> 
> I think the #ifdef CONFIG_TCG_INTERPRETER sections belong before
> the arch-specific sections. :-)
> 
> > Alignment is still to be done (my test platforms don't need it).
> 
> I will do this at some point if nobody beats me to it. :-)
> 
> BTW, I'm wondering about certain places that require host-specific
> code... for example:
> 
>  * qemu-lock.h, which is used by cpu_exec() for the tb lock and by
>    cpu_unlink_tb() for non-NPTL hosts.
> 
>  * The generic cpu_get_real_ticks() doesn't seem great... I suppose
>    the MIPS version using clock() might be more appropriate?
> 
> Apart from this, I suppose user mode emulation simply has to depend
> on host-specific code for signal handling... :-(
> 
> I gather INDEX_op_call handling could be improved?  Presumably that's
> really just a matter of casting to the right type of function pointer
> when making each call?

No it's not, given that there are ABIs with argument alignment and
the fact that information about the types doesn't exist at the point
of op_call things just aren't that simple.
Stuart Brady Oct. 12, 2009, 6:46 a.m. UTC | #9
On Mon, Oct 12, 2009 at 03:50:31AM +0400, malc wrote:
> On Mon, 12 Oct 2009, Stuart Brady wrote:
> 
> No it's not, given that there are ABIs with argument alignment and
> the fact that information about the types doesn't exist at the point
> of op_call things just aren't that simple.

Huh?  If you're using a function pointer with the *correct* type,
I hardly see how that's not sufficient.

Yes, I did notice that the information doesn't exist at that point,
(in fact I did so before the previous time that you pointed this out).

Cheers,
malc Oct. 12, 2009, 12:33 p.m. UTC | #10
On Mon, 12 Oct 2009, Stuart Brady wrote:

> On Mon, Oct 12, 2009 at 03:50:31AM +0400, malc wrote:
> > On Mon, 12 Oct 2009, Stuart Brady wrote:
> > 
> > No it's not, given that there are ABIs with argument alignment and
> > the fact that information about the types doesn't exist at the point
> > of op_call things just aren't that simple.
> 
> Huh?  If you're using a function pointer with the *correct* type,
> I hardly see how that's not sufficient.

Let me give you an example:

tcg$ cat << eof | gcc-4.4.1 -O -o - -S -x c -
> unsigned long long moo1 (int dummy, unsigned long long a)
> {
>     return ((unsigned long) (a >> 32)) | (a << 32);
> }
> 
> unsigned long long moo2 (int dummy1, int dummy2, unsigned long long a)
> {
>     return ((unsigned long) (a >> 32)) | (a << 32);
> }
> eof
        .file   ""
        .gnu_attribute 4, 1
        .gnu_attribute 8, 1
        .gnu_attribute 12, 2
        .section        ".text"
        .align 2
        .globl moo1
        .type   moo1, @function
moo1:
        mr 3,6
        mr 4,5
        blr
        .size   moo1, .-moo1
        .align 2
        .globl moo2
        .type   moo2, @function
moo2:
        mr 3,6
        mr 4,5
        blr
        .size   moo2, .-moo2
        .ident  "GCC: (GNU) 4.4.1"
        .section        .note.GNU-stack,"",@progbits

tcg$ uname -mo
ppc GNU/Linux

I hope it's self explanatory.

> 
> Yes, I did notice that the information doesn't exist at that point,
> (in fact I did so before the previous time that you pointed this out).
> 
> Cheers,
>
malc Oct. 12, 2009, 12:55 p.m. UTC | #11
On Mon, 12 Oct 2009, malc wrote:

> On Mon, 12 Oct 2009, Stuart Brady wrote:
> 
> > On Mon, Oct 12, 2009 at 03:50:31AM +0400, malc wrote:
> > > On Mon, 12 Oct 2009, Stuart Brady wrote:
> > > 
> > > No it's not, given that there are ABIs with argument alignment and
> > > the fact that information about the types doesn't exist at the point
> > > of op_call things just aren't that simple.
> > 
> > Huh?  If you're using a function pointer with the *correct* type,
> > I hardly see how that's not sufficient.
> 
> Let me give you an example:
> 
[..snip overcomplication..]

> tcg$ uname -mo
> ppc GNU/Linux
> 
> I hope it's self explanatory.
> 

~$ gcc-4.4.1 -O2 -o - -S -x c - <<eof
> unsigned long long moo1 (int dummy, unsigned long long a)
> {
>     asm ("# dummy is %0" : : "r" (dummy));
>     return a;
> }
> 
> unsigned long long moo2 (int dummy1, int dummy2, unsigned long long a)
> {
>     asm ("# dummy1 is %0, dummy2 is %1" : : "r" (dummy1), "r" (dummy2));
>     return a;
> }
> eof
        .file   ""
        .gnu_attribute 4, 1
        .gnu_attribute 8, 1
        .gnu_attribute 12, 2
        .section        ".text"
        .align 2
        .globl moo1
        .type   moo1, @function
moo1:
#APP
 # 3 "<stdin>" 1
        # dummy is 3
 # 0 "" 2
#NO_APP
        mr 3,5
        mr 4,6
        blr
        .size   moo1, .-moo1
        .align 2
        .globl moo2
        .type   moo2, @function
moo2:
#APP
 # 9 "<stdin>" 1
        # dummy1 is 3, dummy2 is 4
 # 0 "" 2
#NO_APP
        mr 3,5
        mr 4,6
        blr
        .size   moo2, .-moo2
        .ident  "GCC: (GNU) 4.4.1"
        .section        .note.GNU-stack,"",@progbits

(BTW 64 bit values are returned in r3 r4 pair under SV4 ABI rules)

And a quote from SVR4abippc.pdf itself:

The following algorithm specifies where argument data is passed for the
C language. For this purpose, consider the arguments as ordered from
left (First argument) to right, although the order of evaluation of the
arguments is unspecified. In this algorithm, fr contains the number of
the next available Floating-point register, gr contains the number of
the next available general purpose register, and starg is the address
of the next available stack argument word.

INITIALIZE:
         Set fr=1, gr=3, and starg to the address of parameter word 1.
SCAN:
         If there are no more arguments, terminate. Otherwise, select
         one of the following depending on the type of the next
         argument:

         DOUBLE_OR_FLOAT:
              If fr>8 (that is, there are no more available
              Flating-point registers), go to OTHER. Otherwise, load
              the argument value into Floating-point register fr, set
              fr to fr+1, and go to SCAN.

         SIMPLE_ARG:
              A SIMPLE_ARG is one of the following:
              
              .  One of the simple integer types no more than 32 bits
                 wide (char, short, int, long, enum), or

              .  A pointer to an object of any type, or

              .  A struct, union, or long double, any of which shall
                 be treated as a pointer to the object, or to a copy
                 of the object where necessary to enforce
                 call-by-value semantics. Only if the caller can
                 ascertain that the object is "constant" can it pass a
                 pointer to the object itself.
              


        If gr>10, go to OTHER. Otherwise, load the argument value into
        general register gr, set gr to gr+1, and go to SCAN. Values
        shorter than 32 bits are sign-extended or zero-extended,
        depending on whether they are signed or unsigned.

     LONG_LONG:
        Note that implementations are not required to support a long
        long data type, but if they do, the following treatment is
        required.  If gr>9, go to OTHER. If gr is even, set gr to
        gr+1. Load the lower- addressed word of the long long into gr
        and the higher-addressed word into gr+1, set gr to gr+2, and
        go to SCAN.
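The LONG_LONG rule is the crux: which register pair a 64-bit argument lands in depends on the types and order of the arguments before it, which is exactly the information missing at op_call time. A small C sketch of the integer-register part of the algorithm quoted above (assign_arg() is a hypothetical name; floating point and stack placement are omitted):

```c
#include <stdbool.h>

/* gr walks through r3..r10. Returns the first general register used for
   the argument, or -1 when the argument goes to the stack (OTHER). */
static int assign_arg(int *gr, bool is_long_long)
{
    if (is_long_long) {               /* LONG_LONG rule */
        if (*gr > 9) {
            return -1;                /* no register pair left */
        }
        if (*gr % 2 == 0) {
            (*gr)++;                  /* skip a register to align the pair */
        }
        int reg = *gr;
        *gr += 2;                     /* occupies gr and gr+1 */
        return reg;
    }
    if (*gr > 10) {                   /* SIMPLE_ARG rule */
        return -1;
    }
    return (*gr)++;
}
```

Running this over the moo1()/moo2() examples puts `dummy` in r3 (and `dummy2` in r4), while `a` lives in r5/r6 both times — r4 is skipped for alignment in moo1() — matching the generated code above.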
Stuart Brady Oct. 12, 2009, 5:39 p.m. UTC | #12
On Mon, Oct 12, 2009 at 04:33:10PM +0400, malc wrote:
> On Mon, 12 Oct 2009, Stuart Brady wrote:
> 
> > Huh?  If you're using a function pointer with the *correct* type,
> > I hardly see how that's not sufficient.
> 
> Let me give you an example:
> 
[snip]

*Mind boggles* :-)

You seem to be ignoring the words "correct type" in all of my sentences.
My suggestion was to give op_call handling the information that it needs
iff TCI is used.  Yes, the code in tcg_gen_callN() is clearly not
appropriate for TCI, although I'd somehow missed the fact that it may
also need to be changed in future to support other new TCG back-ends.

I'm quite aware that functions must be invoked using the *correct type*
of function pointer, though.  Perhaps I wasn't specific enough in what
I said, but please, I'm not completely clueless... :-)

Cheers,
malc Oct. 12, 2009, 8:07 p.m. UTC | #13
On Mon, 12 Oct 2009, Stuart Brady wrote:

> On Mon, Oct 12, 2009 at 04:33:10PM +0400, malc wrote:
> > On Mon, 12 Oct 2009, Stuart Brady wrote:
> > 
> > > Huh?  If you're using a function pointer with the *correct* type,
> > > I hardly see how that's not sufficient.
> > 
> > Let me give you an example:
> > 
> [snip]
> 
> *Mind boggles* :-)
> 
> You seem to be ignoring the words "correct type" in all of my sentences.
> My suggestion was to give op_call handling the information that it needs
> iff TCI is used.  Yes, the code in tcg_gen_callN() is clearly not
> appropriate for TCI, although I'd somehow missed the fact that it may
> also need to be changed in future to support other new TCG back-ends.
> 
> I'm quite aware that functions must be invoked using the *correct type*
> of function pointer, though.  Perhaps I wasn't specific enough in what
> I said, but please, I'm not completely clueless... :-)

Are you thinking of a giant switch here, or what?
Stuart Brady Oct. 12, 2009, 8:30 p.m. UTC | #14
On Tue, Oct 13, 2009 at 12:07:01AM +0400, malc wrote:
> On Mon, 12 Oct 2009, Stuart Brady wrote:
> 
> > I'm quite aware that functions must be invoked using the *correct type*
> > of function pointer, though.  Perhaps I wasn't specific enough in what
> > I said, but please, I'm not completely clueless... :-)
> 
> Are you thinking of a giant switch here, or what?

Yes.
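A minimal sketch of what such a switch could look like: the translator records a signature code next to INDEX_op_call, and the interpreter dispatches on it so that each helper is invoked through a correctly typed function pointer, letting the host compiler apply the ABI rules (all names and the enum below are hypothetical):

```c
#include <stdint.h>

/* Hypothetical signature codes recorded at translate time. */
typedef enum {
    SIG_I32_I32,      /* uint32_t f(uint32_t) */
    SIG_I64_I64,      /* uint64_t f(uint64_t) */
    SIG_I64_I32_I64,  /* uint64_t f(uint32_t, uint64_t) */
} helper_sig;

/* Example helpers, standing in for QEMU's target helpers. */
static uint32_t helper_inc(uint32_t x) { return x + 1; }
static uint64_t helper_dbl(uint64_t x) { return x * 2; }

/* The "giant switch": each case casts fn to the correct type, so the
   compiler emits an ABI-correct call (including aligned long long pairs). */
static uint64_t call_helper(void *fn, helper_sig sig, uint64_t a0, uint64_t a1)
{
    switch (sig) {
    case SIG_I32_I32:
        return ((uint32_t (*)(uint32_t))fn)((uint32_t)a0);
    case SIG_I64_I64:
        return ((uint64_t (*)(uint64_t))fn)(a0);
    case SIG_I64_I32_I64:
        return ((uint64_t (*)(uint32_t, uint64_t))fn)((uint32_t)a0, a1);
    }
    return 0;
}
```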

Cheers,
Stefan Weil Oct. 26, 2009, 3:26 p.m. UTC | #15
malc schrieb:
>
>
> On Sun, 11 Oct 2009, Stefan Weil wrote:
>> Stuart Brady schrieb:
>>> On Mon, Sep 28, 2009 at 06:50:21PM +0200, Stefan Weil wrote:
>>>> Please send patches / enhancements based on my latest
>>>> version from git://repo.or.cz/qemu/ar7.git.
>>> Just bug reports for now, unfortunately...
>
> [..snip..]
>
>> git://repo.or.cz/qemu/ar7.git contains the latest changes.
>>
>> If the maintainers want to integrate tci in the official qemu,
>> I can prepare the patches needed to add bytecode generator
>> and interpreter and tcg host support for any host.
>
> I've looked at the code, and the problem with helpers is essentially
> unsolved, if only things were as simple as casting things to
> `helper_function' and expecting it will work, not that it's _that_
> much more complex, but still..

Hello Malc,

I read your comments on the possible ABI problems.

Nevertheless, as far as I could see all existing TCG hosts simply
use fixed registers when calling helper functions.

So the same approach should work for TCI, too.

At least it is possible to run complete operating systems
using this approach.

What do you think would be needed to get a first stage of
TCI integrated in QEMU master?

Regards,
Stefan
malc Oct. 26, 2009, 3:52 p.m. UTC | #16
On Mon, 26 Oct 2009, Stefan Weil wrote:

> malc schrieb:
> >
> >
> > On Sun, 11 Oct 2009, Stefan Weil wrote:
> >> Stuart Brady schrieb:
> >>> On Mon, Sep 28, 2009 at 06:50:21PM +0200, Stefan Weil wrote:
> >>>> Please send patches / enhancements based on my latest
> >>>> version from git://repo.or.cz/qemu/ar7.git.
> >>> Just bug reports for now, unfortunately...
> >
> > [..snip..]
> >
> >> git://repo.or.cz/qemu/ar7.git contains the latest changes.
> >>
> >> If the maintainers want to integrate tci in the official qemu,
> >> I can prepare the patches needed to add bytecode generator
> >> and interpreter and tcg host support for any host.
> >
> > I've looked at the code, and the problem with helpers is essentially
> > unsolved, if only things were as simple as casting things to
> > `helper_function' and expecting it will work, not that it's _that_
> > much more complex, but still..
> 
> Hello Malc,
> 
> I read your comments on the possible ABI problems.
> 
> Nevertheless, as far as I could see all existing TCG hosts simply
> use fixed registers when calling helper functions.

The problem, as I explained earlier, is that TCI doesn't take the types
of the parameters into consideration, and this will break the SVR4 PPC
ABI, for instance, since long long arguments are "aligned". Please read
this for details:
http://marc.info/?l=qemu-devel&m=125535217403861&w=2

> 
> So the same approach should work for TCI, too.
> 
> At least it is possible to run complete operating systems
> using this approach.
> 
> What do you think would be needed to get a first stage of
> TCI integrated in QEMU master?
> 

Fixing the aforementioned problem first, then convincing someone
to commit it. I'd guess that having qemu/tci run on something we
do not have TCG support for would improve the odds.

Patch

diff --git a/Makefile.target b/Makefile.target
index 1039bfc..f865389 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -33,6 +33,7 @@  all: $(PROGS)
 # cpu emulator library
 libobj-y = exec.o translate-all.o cpu-exec.o translate.o
 libobj-y += tcg/tcg.o
+libobj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 libobj-$(CONFIG_SOFTFLOAT) += fpu/softfloat.o
 libobj-$(CONFIG_NOSOFTFLOAT) += fpu/softfloat-native.o
 libobj-y += op_helper.o helper.o
diff --git a/configure b/configure
index a6f6601..eef0928 100755
--- a/configure
+++ b/configure
@@ -195,6 +195,7 @@  linux_aio=""
 
 gprof="no"
 debug_tcg="no"
+tcg_interpreter="no"
 debug="no"
 strip_opt="yes"
 bigendian="no"
@@ -471,6 +472,10 @@  for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-tcg-interpreter) tcg_interpreter="no"
+  ;;
+  --enable-tcg-interpreter) tcg_interpreter="yes"
+  ;;
   --enable-profiler) profiler="yes"
   ;;
   --enable-cocoa)
@@ -673,6 +678,7 @@  echo "  --disable-bluez          disable bluez stack connectivity"
 echo "  --enable-bluez           enable bluez stack connectivity"
 echo "  --disable-kvm            disable KVM acceleration support"
 echo "  --enable-kvm             enable KVM acceleration support"
+echo "  --enable-tcg-interpreter enable TCG with bytecode interpreter TCI"
 echo "  --disable-nptl           disable usermode NPTL support"
 echo "  --enable-nptl            enable usermode NPTL support"
 echo "  --enable-system          enable all system emulation targets"
@@ -1764,6 +1770,7 @@  echo "IO thread         $io_thread"
 echo "Linux AIO support $linux_aio"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
+echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
@@ -1796,8 +1803,14 @@  case "$cpu" in
     ARCH=arm
   ;;
   *)
-    echo "Unsupported CPU = $cpu"
-    exit 1
+    if test "$tcg_interpreter" = "yes" ; then
+        echo "Unsupported CPU = $cpu, will use TCG with TCI (experimental)"
+        ARCH=unknown
+        echo "CONFIG_TCG_INTERPRETER=y" >> $config_host_mak
+    else
+        echo "Unsupported CPU = $cpu, try --enable-tcg-interpreter"
+        exit 1
+    fi
   ;;
 esac
 echo "ARCH=$ARCH" >> $config_host_mak
@@ -1959,6 +1972,9 @@  fi
 if test "$fdt" = "yes" ; then
   echo "CONFIG_FDT=y" >> $config_host_mak
 fi
+if test "$tcg_interpreter" = "yes" ; then
+  echo "CONFIG_TCG_INTERPRETER=y" >> $config_host_mak
+fi
 if test "$need_offsetof" = "yes" ; then
   echo "CONFIG_NEED_OFFSETOF=y" >> $config_host_mak
 fi
@@ -2331,7 +2347,9 @@  fi
 cflags=""
 ldflags=""
 
-if test "$ARCH" = "sparc64" ; then
+if test "$tcg_interpreter" = "yes"; then
+  cflags="-I\$(SRC_PATH)/tcg/bytecode $cflags"
+elif test "$ARCH" = "sparc64" ; then
   cflags="-I\$(SRC_PATH)/tcg/sparc $cflags"
 else
   cflags="-I\$(SRC_PATH)/tcg/\$(ARCH) $cflags"
diff --git a/dyngen-exec.h b/dyngen-exec.h
index 86e61c3..d5620ca 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -48,7 +48,9 @@  extern int fprintf(FILE *, const char *, ...);
 extern int fputs(const char *, FILE *);
 extern int printf(const char *, ...);
 
-#if defined(__i386__)
+#if defined(CONFIG_TCG_INTERPRETER)
+/* The TCG interpreter does not use special registers. */
+#elif defined(__i386__)
 #define AREG0 "ebp"
 #define AREG1 "ebx"
 #define AREG2 "esi"
diff --git a/target-i386/exec.h b/target-i386/exec.h
index 864d030..3f796af 100644
--- a/target-i386/exec.h
+++ b/target-i386/exec.h
@@ -28,7 +28,11 @@ 
 
 #include "cpu-defs.h"
 
+#if defined(CONFIG_TCG_INTERPRETER)
+extern struct CPUX86State *env;
+#else
 register struct CPUX86State *env asm(AREG0);
+#endif
 
 #include "qemu-common.h"
 #include "qemu-log.h"
diff --git a/tcg/bytecode/README b/tcg/bytecode/README
new file mode 100644
index 0000000..d755f85
--- /dev/null
+++ b/tcg/bytecode/README
@@ -0,0 +1,64 @@ 
+TCG Interpreter (TCI) - Stefan Weil.
+
+1) Introduction
+
+TCG (Tiny Code Generator) is a code generator which translates
+code fragments ("basic blocks") from target code (any of the
+targets supported by QEMU) to a code representation which
+can be run on a host.
+
+QEMU can create native code for some hosts (arm, hppa, i386, ppc, ppc64,
+sparc, x86_64). For others, inoffical host support was written (s390).
+
+By adding a code generator for some virtual machine and using an
+interpreter for the generated bytecode, it is possible to
+support (almost) any host.
+
+This is what TCI (Tiny Code Interpreter) does.
+
+2) Implementation
+
+Like each TCG host frontend, TCI implements the code generator in
+tcg-target.c, tcg-target.h, both in directory tcg/bytecode.
+
+The additional file tcg/tci.c adds the interpreter.
+
+The bytecode consists of opcodes (same numeric values as those used by
+TCG) and arguments of variable size and number.
+
+3) Usage
+
+For hosts with native TCG, the interpreter TCI is enabled by
+
+        configure --enable-tcg-interpreter
+
+The only difference from running qemu with TCI to running without TCI
+should be speed. Especially during development of TCI, it was very
+useful to compare runs with and without TCI. Create /tmp/qemu.log by
+
+        qemu -d in_asm,op_opt,cpu -singlestep
+
+once with interpreter and once without interpreter and compare the resulting
+qemu.log files. This is also useful to see the effects of additional
+registers or additional opcodes (it is easy to modify the virtual machine).
+It can also be used to verify native TCGs.
+
+4) Todo list
+
+* TCI is still highly experimental. It is written and tested on a x86_64 host
+  running i386 guests. Host / guest combinations will fail (missing
+  byte swap) with the current version. 32 bit hosts need code fixes, too.
+
+* A lot of TCG codes are either missing in the code generator and/or
+  in the interpreter. These codes raise a runtime exception, so it is
+  possible to see which parts of the code need to be fixed.
+
+* The pseudo code is not optimized and rather ugly (no clear rules how
+  arguments are written).
+
+* A better disassembler for the pseudo code would be nice (a very primitive
+  disassembler is included in tcg-target.c).
+
+* It might be useful to have a runtime option which selects the native TCG
+  or TCI, so qemu would have to include two TCGs. Today, selecting TCI
+  is a configure option, so you need two compilations of qemu.
diff --git a/tcg/bytecode/tcg-target.c b/tcg/bytecode/tcg-target.c
new file mode 100644
index 0000000..b6929ea
--- /dev/null
+++ b/tcg/bytecode/tcg-target.c
@@ -0,0 +1,937 @@ 
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2008 Fabrice Bellard
+ * Copyright (c) 2009 Stefan Weil
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/* TODO list:
+ * - Add support for constant parameters.
+ */
+
+/* Marker for missing code. Wrapped in do/while so it behaves like a
+   single statement when used without braces. */
+#define TODO() \
+    do { \
+        fprintf(stderr, "TODO %s:%u: %s()\n", \
+                __FILE__, __LINE__, __FUNCTION__); \
+        tcg_abort(); \
+    } while (0)
+
+/* Trace message to see program flow. */
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+#define TRACE() \
+    fprintf(stderr, "TCG %s:%u: %s()\n", __FILE__, __LINE__, __FUNCTION__)
+#else
+#define TRACE() ((void)0)
+#endif
+
+/* Single bit n. */
+#define BIT(n) (1 << (n))
+
+/* Bitfield n...m (in 32 bit value). */
+#define BITS(n, m) (((0xffffffffU << (31 - (n))) >> (31 - (n) + (m))) << (m))
+
+/* Used for function call generation. */
+#define TCG_REG_CALL_STACK              TCG_REG_R4
+#define TCG_TARGET_STACK_ALIGN          16
+#define TCG_TARGET_CALL_STACK_OFFSET    0
+
+/* TODO: documentation. */
+static uint8_t *tb_ret_addr;
+
+/* TODO: documentation. */
+static const TCGTargetOpDef tcg_target_op_defs[] = {
+    { INDEX_op_exit_tb, { } },
+    { INDEX_op_goto_tb, { } },
+    { INDEX_op_call, { "ri" } },
+    { INDEX_op_jmp, { "ri" } },
+    { INDEX_op_br, { } },
+
+    { INDEX_op_mov_i32, { "r", "r" } },
+    { INDEX_op_mov_i64, { "r", "r" } },
+    { INDEX_op_movi_i32, { "r" } },
+    { INDEX_op_movi_i64, { "r" } },
+
+    { INDEX_op_ld8u_i32, { "r", "r" } },
+    { INDEX_op_ld8s_i32, { "r", "r" } },
+    { INDEX_op_ld16u_i32, { "r", "r" } },
+    { INDEX_op_ld16s_i32, { "r", "r" } },
+    { INDEX_op_ld_i32, { "r", "r" } },
+    { INDEX_op_st8_i32, { "r", "ri" } },
+    { INDEX_op_st8_i64, { "r", "ri" } },
+    { INDEX_op_st16_i32, { "r", "ri" } },
+    { INDEX_op_st16_i64, { "r", "ri" } },
+    { INDEX_op_st_i32, { "r", "ri" } },
+    { INDEX_op_st_i64, { "r", "ri" } },
+    { INDEX_op_st32_i64, { "r", "ri" } },
+
+    { INDEX_op_ld8u_i64, { "r", "r" } },
+    { INDEX_op_ld8s_i64, { "r", "r" } },
+    { INDEX_op_ld16u_i64, { "r", "r" } },
+    { INDEX_op_ld16s_i64, { "r", "r" } },
+    { INDEX_op_ld32u_i64, { "r", "r" } },
+    { INDEX_op_ld32s_i64, { "r", "r" } },
+    { INDEX_op_ld_i64, { "r", "r" } },
+
+    { INDEX_op_add_i32, { "r", "ri", "ri" } },
+    { INDEX_op_mul_i32, { "r", "ri", "ri" } },
+#if defined(TCG_TARGET_HAS_div_i32)
+    { INDEX_op_div_i32, { "r", "r", "r" } },
+    { INDEX_op_divu_i32, { "r", "r", "r" } },
+    { INDEX_op_rem_i32, { "r", "r", "r" } },
+    { INDEX_op_remu_i32, { "r", "r", "r" } },
+#else
+    { INDEX_op_div2_i32, { "r", "r", "0", "1", "r" } },
+    { INDEX_op_divu2_i32, { "r", "r", "0", "1", "r" } },
+#endif
+    { INDEX_op_sub_i32, { "r", "ri", "ri" } },
+    { INDEX_op_and_i32, { "r", "ri", "ri" } },
+    { INDEX_op_or_i32, { "r", "ri", "ri" } },
+    { INDEX_op_xor_i32, { "r", "ri", "ri" } },
+
+    { INDEX_op_shl_i32, { "r", "ri", "ri" } },
+    { INDEX_op_shr_i32, { "r", "ri", "ri" } },
+    { INDEX_op_sar_i32, { "r", "r", "ri" } },
+
+    { INDEX_op_brcond_i32, { "r", "ri" } },
+    { INDEX_op_brcond_i64, { "r", "ri" } },
+
+#if defined(TCG_TARGET_HAS_neg_i32)
+    { INDEX_op_neg_i32, { "r", "r" } },
+#endif
+#if defined(TCG_TARGET_HAS_not_i32)
+    { INDEX_op_not_i32, { "r", "r" } },
+#endif
+
+#if TCG_TARGET_REG_BITS == 64
+    { INDEX_op_add_i64, { "r", "ri", "ri" } },
+    { INDEX_op_sub_i64, { "r", "ri", "ri" } },
+    { INDEX_op_and_i64, { "r", "ri", "ri" } },
+    { INDEX_op_or_i64, { "r", "ri", "ri" } },
+    { INDEX_op_xor_i64, { "r", "ri", "ri" } },
+
+    { INDEX_op_shl_i64, { "r", "ri", "ri" } },
+    { INDEX_op_shr_i64, { "r", "ri", "ri" } },
+    { INDEX_op_sar_i64, { "r", "ri", "ri" } },
+
+    { INDEX_op_mul_i64, { "r", "ri", "ri" } },
+#if defined(TCG_TARGET_HAS_div_i64)
+    { INDEX_op_div_i64, { "r", "r", "r" } },
+    { INDEX_op_divu_i64, { "r", "r", "r" } },
+    { INDEX_op_rem_i64, { "r", "r", "r" } },
+    { INDEX_op_remu_i64, { "r", "r", "r" } },
+#else
+    { INDEX_op_div2_i64, { "r", "r", "0", "1", "r" } },
+    { INDEX_op_divu2_i64, { "r", "r", "0", "1", "r" } },
+#endif
+
+#ifdef TCG_TARGET_HAS_not_i64
+    { INDEX_op_not_i64, { "r", "r" } },
+#endif
+#ifdef TCG_TARGET_HAS_neg_i64
+    { INDEX_op_neg_i64, { "r", "r" } },
+#endif
+#endif /* TCG_TARGET_REG_BITS == 64 */
+
+    { INDEX_op_qemu_ld8u, { "r", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L" } },
+    { INDEX_op_qemu_ld32u, { "r", "L" } },
+    { INDEX_op_qemu_ld32s, { "r", "L" } },
+    { INDEX_op_qemu_ld64, { "r", "L" } },
+
+    { INDEX_op_qemu_st8, { "S", "S" } },
+    { INDEX_op_qemu_st16, { "S", "S" } },
+    { INDEX_op_qemu_st32, { "S", "S" } },
+    { INDEX_op_qemu_st64, { "S", "S", "S" } },
+
+#if defined(TCG_TARGET_HAS_ext8s_i32)
+    { INDEX_op_ext8s_i32, { "r", "r" } },
+#endif
+#if defined(TCG_TARGET_HAS_ext16s_i32)
+    { INDEX_op_ext16s_i32, { "r", "r" } },
+#endif
+#if defined(TCG_TARGET_HAS_ext8s_i64)
+    { INDEX_op_ext8s_i64, { "r", "r" } },
+#endif
+#if defined(TCG_TARGET_HAS_ext16s_i64)
+    { INDEX_op_ext16s_i64, { "r", "r" } },
+#endif
+#if defined(TCG_TARGET_HAS_ext32s_i64)
+    { INDEX_op_ext32s_i64, { "r", "r" } },
+#endif
+
+    { -1 },
+};
+
+static const int tcg_target_reg_alloc_order[] = {
+    TCG_REG_R0,
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+    //~ TCG_REG_R4,     // used for TCG_REG_CALL_STACK
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+};
+
+static const int tcg_target_call_iarg_regs[] = {
+    TCG_REG_R0,
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+    //~ TCG_REG_R4,     // used for TCG_REG_CALL_STACK
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+};
+
+static const int tcg_target_call_oarg_regs[2] = {
+    // TODO: ppc64 only uses one register. Why do others use two?
+    TCG_REG_R0,
+    TCG_REG_R1,
+};
+
+#ifndef NDEBUG
+static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
+    "r00",
+    "r01",
+    "r02",
+    "r03",
+    "r04",
+    "r05",
+    "r06",
+    "r07",
+#if TCG_TARGET_NB_REGS == 16 || TCG_TARGET_NB_REGS == 32
+    "r08",
+    "r09",
+    "r10",
+    "r11",
+    "r12",
+    "r13",
+    "r14",
+    "r15",
+    "r16",
+#endif
+#if TCG_TARGET_NB_REGS == 32
+    "r17",
+    "r18",
+    "r19",
+    "r20",
+    "r21",
+    "r22",
+    "r23",
+    "r24",
+    "r25",
+    "r26",
+    "r27",
+    "r28",
+    "r29",
+    "r30",
+    "r31"
+#endif
+};
+#endif
+
+static void flush_icache_range(unsigned long start, unsigned long stop)
+{
+    TRACE();
+}
+
+static void patch_reloc(uint8_t *code_ptr, int type,
+                        tcg_target_long value, tcg_target_long addend)
+{
+    TRACE();
+    switch (type) {
+    /* TODO: 32 bit relocation is missing. */
+    case 8:
+        *(uint64_t *)code_ptr = value;
+        break;
+    default:
+        TODO();
+    }
+}
+
+/* Parse target specific constraints. */
+static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+{
+    const char *ct_str = *pct_str;
+    switch (ct_str[0]) {
+    case 'r':
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, BIT(TCG_TARGET_NB_REGS) - 1);
+        break;
+    case 'L':                   /* qemu_ld constraint */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, BIT(TCG_TARGET_NB_REGS) - 1);
+#if 0 // TODO: do we need this? What should it look like?
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R3);
+#ifdef CONFIG_SOFTMMU
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R4);
+#endif
+#endif
+        break;
+    case 'S':                   /* qemu_st constraint */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, BIT(TCG_TARGET_NB_REGS) - 1);
+#if 0 // TODO: do we need this? What should it look like?
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R3);
+#ifdef CONFIG_SOFTMMU
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R4);
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R5);
+#endif
+#endif
+        break;
+    default:
+        return -1;
+    }
+    ct_str++;
+    *pct_str = ct_str;
+    return 0;
+}
+
+void tci_disas(uint8_t opc)
+{
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+    const TCGOpDef *def = &tcg_op_defs[opc];
+    fprintf(stderr, "TCG %s %u, %u, %u\n",
+            def->name, def->nb_oargs, def->nb_iargs, def->nb_cargs);
+#endif
+}
+
+static void tcg_disas3(TCGContext *s, uint8_t c, const TCGArg *args)
+{
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+    char buf[128];
+    TCGArg arg;
+    FILE *outfile = stderr;
+    const TCGOpDef *def = &tcg_op_defs[c];
+    int nb_oargs, nb_iargs, nb_cargs;
+    int i, k;
+    if (c == INDEX_op_debug_insn_start) {
+        uint64_t pc;
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+        pc = ((uint64_t)args[1] << 32) | args[0];
+#else
+        pc = args[0];
+#endif
+        fprintf(outfile, " ---- 0x%" PRIx64, pc);
+        nb_oargs = def->nb_oargs;
+        nb_iargs = def->nb_iargs;
+        nb_cargs = def->nb_cargs;
+    } else if (c == INDEX_op_call) {
+        TCGArg arg;
+
+        /* variable number of arguments */
+        arg = *args++;
+        nb_oargs = arg >> 16;
+        nb_iargs = arg & 0xffff;
+        nb_cargs = def->nb_cargs;
+
+        fprintf(outfile, " %s ", def->name);
+
+#if 0 /* TODO: code does not work (crash), need better code for disassembly. */
+        /* function name */
+        fprintf(outfile, "%s",
+                tcg_get_arg_str_idx(s, buf, sizeof(buf), args[nb_oargs + nb_iargs - 1]));
+        /* flags */
+        fprintf(outfile, ",$0x%" TCG_PRIlx,
+                args[nb_oargs + nb_iargs]);
+        /* nb out args */
+        fprintf(outfile, ",$%d", nb_oargs);
+        for(i = 0; i < nb_oargs; i++) {
+            fprintf(outfile, ",");
+            fprintf(outfile, "%s",
+                    tcg_get_arg_str_idx(s, buf, sizeof(buf), args[i]));
+        }
+        for(i = 0; i < (nb_iargs - 1); i++) {
+            fprintf(outfile, ",");
+            if (args[nb_oargs + i] == TCG_CALL_DUMMY_ARG) {
+                fprintf(outfile, "<dummy>");
+            } else {
+                fprintf(outfile, "%s",
+                        tcg_get_arg_str_idx(s, buf, sizeof(buf), args[nb_oargs + i]));
+            }
+        }
+#endif
+    } else if (c == INDEX_op_movi_i32
+#if TCG_TARGET_REG_BITS == 64
+               || c == INDEX_op_movi_i64
+#endif
+               ) {
+        tcg_target_ulong val;
+        TCGHelperInfo *th;
+
+        nb_oargs = def->nb_oargs;
+        nb_iargs = def->nb_iargs;
+        nb_cargs = def->nb_cargs;
+        fprintf(outfile, " %s %s,$", def->name,
+                tcg_get_arg_str_idx(s, buf, sizeof(buf), args[0]));
+        val = args[1];
+        th = tcg_find_helper(s, val);
+        if (th) {
+            fprintf(outfile, "%s", th->name);
+        } else {
+            if (c == INDEX_op_movi_i32) {
+                fprintf(outfile, "0x%x", (uint32_t)val);
+            } else {
+                fprintf(outfile, "0x%" PRIx64 , (uint64_t)val);
+            }
+        }
+    } else {
+        fprintf(outfile, " %s ", def->name);
+        if (c == INDEX_op_nopn) {
+            /* variable number of arguments */
+            nb_cargs = *args;
+            nb_oargs = 0;
+            nb_iargs = 0;
+        } else {
+            nb_oargs = def->nb_oargs;
+            nb_iargs = def->nb_iargs;
+            nb_cargs = def->nb_cargs;
+        }
+
+        k = 0;
+        for(i = 0; i < nb_oargs; i++) {
+            fprintf(outfile, "%s%s", (k != 0) ? "," : "",
+                    tcg_get_arg_str_idx(s, buf, sizeof(buf), args[k]));
+            k++;
+        }
+        for(i = 0; i < nb_iargs; i++) {
+            fprintf(outfile, "%s%s", (k != 0) ? "," : "",
+                    tcg_get_arg_str_idx(s, buf, sizeof(buf), args[k]));
+            k++;
+        }
+        if (c == INDEX_op_brcond_i32
+#if TCG_TARGET_REG_BITS == 32
+            || c == INDEX_op_brcond2_i32
+#elif TCG_TARGET_REG_BITS == 64
+            || c == INDEX_op_brcond_i64
+#endif
+            ) {
+            if (args[k] < ARRAY_SIZE(cond_name) && cond_name[args[k]]) {
+                fprintf(outfile, ",%s", cond_name[args[k++]]);
+            } else {
+                fprintf(outfile, ",$0x%" TCG_PRIlx, args[k++]);
+            }
+            i = 1;
+        } else {
+            i = 0;
+        }
+        for(; i < nb_cargs; i++) {
+            arg = args[k];
+            fprintf(outfile, "%s$0x%" TCG_PRIlx,  (k != 0) ? "," : "", arg);
+            k++;
+        }
+    }
+    fprintf(stderr, " %u, %u, %u\n",
+            def->nb_oargs, def->nb_iargs, def->nb_cargs);
+#endif
+}
+
+#ifdef CONFIG_SOFTMMU
+/* Write value (native size). */
+static void tcg_out_i(TCGContext *s, tcg_target_ulong v)
+{
+    *(tcg_target_ulong *)s->code_ptr = v;
+    s->code_ptr += sizeof(tcg_target_ulong);
+}
+#endif
+
+/* Write 64 bit value. */
+static void tcg_out64(TCGContext *s, uint64_t v)
+{
+    *(uint64_t *)s->code_ptr = v;
+    s->code_ptr += sizeof(v);
+}
+
+/* Write opcode. */
+static void tcg_out_op_t(TCGContext *s, uint8_t op)
+{
+    tcg_out8(s, op);
+}
+
+/* Write register. */
+static void tcg_out_r(TCGContext *s, TCGArg t0)
+{
+    assert(t0 < TCG_TARGET_NB_REGS);
+    tcg_out8(s, t0);
+}
+
+/* Write register or constant (32 bit). */
+static void tcg_out_ri32(TCGContext *s, int const_arg, TCGArg arg)
+{
+    tcg_out8(s, const_arg);
+    if (const_arg) {
+        //~ assert(arg == (uint32_t)arg);
+        tcg_out32(s, arg);
+    } else {
+        tcg_out_r(s, arg);
+    }
+}
+
+/* Write register or constant (64 bit). */
+static void tcg_out_ri64(TCGContext *s, int const_arg, TCGArg arg)
+{
+    tcg_out8(s, const_arg);
+    if (const_arg) {
+        tcg_out64(s, arg);
+    } else {
+        tcg_out_r(s, arg);
+    }
+}
+
+static void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
+{
+    TCGArg args[2] = { reg, val };
+    tcg_disas3(s, INDEX_op_add_i32, args);
+    TODO();
+}
+
+static void tcg_out_ld(TCGContext *s, TCGType type, int ret, int arg1,
+                       tcg_target_long arg2)
+{
+    TCGArg args[3] = { ret, arg1, arg2 };
+    if (type == TCG_TYPE_I32) {
+        tcg_disas3(s, INDEX_op_ld_i32, args);
+        tcg_out_op_t(s, INDEX_op_ld_i32);
+        tcg_out_r(s, ret);
+        tcg_out_r(s, arg1);
+        tcg_out32(s, arg2);
+    } else {
+        assert(type == TCG_TYPE_I64);
+        tcg_disas3(s, INDEX_op_ld_i64, args);
+        tcg_out_op_t(s, INDEX_op_ld_i64);
+        tcg_out_r(s, ret);
+        tcg_out_r(s, arg1);
+        assert(arg2 == (uint32_t)arg2);
+        tcg_out32(s, arg2);
+    }
+}
+
+static void tcg_out_mov(TCGContext *s, int ret, int arg)
+{
+    assert(ret != arg);
+    TCGArg args[2] = { ret, arg };
+#if TCG_TARGET_REG_BITS == 32
+    tcg_disas3(s, INDEX_op_mov_i32, args);
+    tcg_out_op_t(s, INDEX_op_mov_i32);
+#else
+    tcg_disas3(s, INDEX_op_mov_i64, args);
+    tcg_out_op_t(s, INDEX_op_mov_i64);
+#endif
+    tcg_out_r(s, ret);
+    tcg_out_r(s, arg);
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         int t0, tcg_target_long arg)
+{
+    TCGArg args[2] = { t0, arg };
+    uint32_t arg32 = arg;
+    if (type == TCG_TYPE_I32 || arg == arg32) {
+        tcg_disas3(s, INDEX_op_movi_i32, args);
+        tcg_out_op_t(s, INDEX_op_movi_i32);
+        tcg_out_r(s, t0);
+        tcg_out32(s, arg32);
+    } else {
+        assert(type == TCG_TYPE_I64);
+        tcg_disas3(s, INDEX_op_movi_i64, args);
+        tcg_out_op_t(s, INDEX_op_movi_i64);
+        tcg_out_r(s, t0);
+        tcg_out64(s, arg);
+    }
+}
+
+static void tcg_out_op(TCGContext *s, int opc, const TCGArg *args,
+                       const int *const_args)
+{
+    TCGLabel *label;
+    tcg_disas3(s, opc, args);
+    switch (opc) {
+    case INDEX_op_exit_tb:
+        tcg_out_op_t(s, opc);
+        tcg_out64(s, args[0]);
+        break;
+    case INDEX_op_goto_tb:
+        tcg_out_op_t(s, opc);
+        if (s->tb_jmp_offset) {
+            /* Direct jump method. */
+            assert(args[0] < ARRAY_SIZE(s->tb_jmp_offset));
+            s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
+            tcg_out32(s, 0);
+        } else {
+            /* Indirect jump method. */
+            TODO();
+        }
+        assert(args[0] < ARRAY_SIZE(s->tb_next_offset));
+        s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
+        break;
+    case INDEX_op_br:
+        tcg_out_op_t(s, opc);
+        label = &s->labels[args[0]];
+        if (label->has_value) {
+            tcg_out64(s, label->u.value);
+        } else {
+            tcg_out_reloc(s, s->code_ptr, 8, args[0], 0);
+            tcg_out64(s, 0);
+        }
+        break;
+    case INDEX_op_call:
+        tcg_out_op_t(s, opc);
+        tcg_out_ri64(s, const_args[0], args[0]);
+        break;
+    case INDEX_op_jmp:
+        TODO();
+        break;
+    case INDEX_op_movi_i32:
+        TODO();
+        break;
+    case INDEX_op_ld8u_i32:
+    case INDEX_op_ld8s_i32:
+    case INDEX_op_ld16u_i32:
+    case INDEX_op_ld16s_i32:
+    case INDEX_op_ld_i32:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        assert(args[2] == (uint32_t)args[2]);
+        tcg_out32(s, args[2]);
+        break;
+    case INDEX_op_st8_i32:
+    case INDEX_op_st16_i32:
+    case INDEX_op_st_i32:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        //~ assert(const_args[2]);
+        assert(args[2] == (uint32_t)args[2]);
+        tcg_out32(s, args[2]);
+        break;
+    case INDEX_op_add_i32:
+    case INDEX_op_sub_i32:
+    case INDEX_op_and_i32:
+    case INDEX_op_or_i32:
+    case INDEX_op_xor_i32:
+    case INDEX_op_mul_i32:
+    case INDEX_op_shl_i32:
+    case INDEX_op_shr_i32:
+    case INDEX_op_sar_i32:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_ri32(s, const_args[1], args[1]);
+        tcg_out_ri32(s, const_args[2], args[2]);
+        break;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64:
+    case INDEX_op_movi_i64:
+        TODO();
+        break;
+    case INDEX_op_ld8u_i64:
+    case INDEX_op_ld8s_i64:
+    case INDEX_op_ld16u_i64:
+    case INDEX_op_ld16s_i64:
+    case INDEX_op_ld32u_i64:
+    case INDEX_op_ld32s_i64:
+    case INDEX_op_ld_i64:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        assert(args[2] == (uint32_t)args[2]);
+        tcg_out32(s, args[2]);
+        break;
+    case INDEX_op_st8_i64:
+    case INDEX_op_st16_i64:
+    case INDEX_op_st32_i64:
+    case INDEX_op_st_i64:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        assert(args[2] == (uint32_t)args[2]);
+        tcg_out32(s, args[2]);
+        break;
+    case INDEX_op_add_i64:
+    case INDEX_op_sub_i64:
+    case INDEX_op_mul_i64:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_ri64(s, const_args[1], args[1]);
+        tcg_out_ri64(s, const_args[2], args[2]);
+        break;
+#ifdef TCG_TARGET_HAS_div_i64
+    case INDEX_op_div_i64:
+    case INDEX_op_divu_i64:
+    case INDEX_op_rem_i64:
+    case INDEX_op_remu_i64:
+        TODO();
+        break;
+#else
+    case INDEX_op_div2_i64:
+    case INDEX_op_divu2_i64:
+        TODO();
+        break;
+#endif
+    case INDEX_op_and_i64:
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i64:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i64:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_ri64(s, const_args[1], args[1]);
+        tcg_out_ri64(s, const_args[2], args[2]);
+        break;
+    case INDEX_op_brcond_i64:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_ri64(s, const_args[1], args[1]);
+        tcg_out8(s, args[2]);           /* condition */
+        label = &s->labels[args[3]];
+        if (label->has_value) {
+            tcg_out64(s, label->u.value);   /* label index */
+        } else {
+            tcg_out_reloc(s, s->code_ptr, 8, args[3], 0);
+            tcg_out64(s, 0);
+        }
+        break;
+#ifdef TCG_TARGET_HAS_not_i64
+    case INDEX_op_not_i64:
+        TODO();
+        break;
+#endif
+#ifdef TCG_TARGET_HAS_neg_i64
+    case INDEX_op_neg_i64:
+        TODO();
+        break;
+#endif
+#endif /* TCG_TARGET_REG_BITS == 64 */
+
+#if defined(TCG_TARGET_HAS_div_i32)
+    case INDEX_op_div_i32:
+    case INDEX_op_divu_i32:
+    case INDEX_op_rem_i32:
+    case INDEX_op_remu_i32:
+    case INDEX_op_mulu2_i32:
+        TODO();
+        break;
+#endif
+#if 0
+    case INDEX_op_add2_i32:
+        TODO();
+        break;
+    case INDEX_op_sub2_i32:
+        TODO();
+        break;
+#endif
+    case INDEX_op_brcond_i32:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_ri32(s, const_args[1], args[1]);
+        tcg_out8(s, args[2]);           /* condition */
+        label = &s->labels[args[3]];
+        if (label->has_value) {
+            tcg_out64(s, label->u.value);   /* label index */
+        } else {
+            tcg_out_reloc(s, s->code_ptr, 8, args[3], 0);
+            tcg_out64(s, 0);
+        }
+        break;
+#if 0
+    case INDEX_op_brcond2_i32:
+        TODO();
+        break;
+#endif
+#if defined(TCG_TARGET_HAS_neg_i32)
+    case INDEX_op_neg_i32:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        break;
+#endif
+#if defined(TCG_TARGET_HAS_not_i32)
+    case INDEX_op_not_i32:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        break;
+#endif
+    case INDEX_op_qemu_ld8u:
+    case INDEX_op_qemu_ld8s:
+    case INDEX_op_qemu_ld16u:
+    case INDEX_op_qemu_ld16s:
+    case INDEX_op_qemu_ld32u:
+    case INDEX_op_qemu_ld32s:
+    case INDEX_op_qemu_ld64:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+#ifdef CONFIG_SOFTMMU
+        tcg_out_i(s, args[2]);
+#endif
+        break;
+    case INDEX_op_qemu_st8:
+    case INDEX_op_qemu_st16:
+    case INDEX_op_qemu_st32:
+    case INDEX_op_qemu_st64:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+#ifdef CONFIG_SOFTMMU
+        tcg_out_i(s, args[2]);
+#endif
+        break;
+#if defined(TCG_TARGET_HAS_ext8s_i32)
+    case INDEX_op_ext8s_i32:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        break;
+#endif
+#if defined(TCG_TARGET_HAS_ext16s_i32)
+    case INDEX_op_ext16s_i32:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        break;
+#endif
+#if defined(TCG_TARGET_HAS_ext32s_i64)
+    case INDEX_op_ext32s_i64:
+        tcg_out_op_t(s, opc);
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        break;
+#endif
+    case INDEX_op_end:
+        TODO();
+        break;
+    default:
+        //~ tcg_dump_ops(s, stderr);
+        TODO();
+        tcg_abort();
+    }
+}
+
+static void tcg_out_st(TCGContext *s, TCGType type, int arg, int arg1,
+                       tcg_target_long arg2)
+{
+    TCGArg args[3] = { arg, arg1, arg2 };
+    if (type == TCG_TYPE_I32) {
+        tcg_disas3(s, INDEX_op_st_i32, args);
+        tcg_out_op_t(s, INDEX_op_st_i32);
+        tcg_out_r(s, arg);
+        tcg_out_r(s, arg1);
+        tcg_out32(s, arg2);
+    } else {
+        assert(type == TCG_TYPE_I64);
+        tcg_disas3(s, INDEX_op_st_i64, args);
+        tcg_out_op_t(s, INDEX_op_st_i64);
+        tcg_out_r(s, arg);
+        tcg_out_r(s, arg1);
+        tcg_out32(s, arg2);
+    }
+}
+
+/* Test if a constant matches the constraint. */
+static int tcg_target_const_match(tcg_target_long val,
+                                  const TCGArgConstraint *arg_ct)
+{
+    /* No need to return 0 or 1, 0 or != 0 is good enough. */
+    return arg_ct->ct & TCG_CT_CONST;
+}
+
+/* Maximum number of registers used for input function arguments. */
+static int tcg_target_get_call_iarg_regs_count(int flags)
+{
+    return ARRAY_SIZE(tcg_target_call_iarg_regs);
+}
+
+void tcg_target_init(TCGContext *s)
+{
+    TRACE();
+
+    /* The current code uses uint8_t for tcg operations. */
+    assert(ARRAY_SIZE(tcg_op_defs) <= UINT8_MAX);
+
+    /* Registers available for 32 bit operations. */
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, BIT(TCG_TARGET_NB_REGS) - 1);
+    /* Registers available for 64 bit operations. */
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, BIT(TCG_TARGET_NB_REGS) - 1);
+    /* TODO: Which registers should be set here? */
+    tcg_regset_set32(tcg_target_call_clobber_regs, 0,
+                     BIT(TCG_REG_R0) |
+                     BIT(TCG_REG_R1) |
+                     BIT(TCG_REG_R2) |
+                     BIT(TCG_REG_R3) |
+                     BIT(TCG_REG_R4) |
+                     BIT(TCG_REG_R5) |
+                     BIT(TCG_REG_R6) |
+                     BIT(TCG_REG_R7));
+    /* TODO: Reserved registers. */
+    tcg_regset_clear(s->reserved_regs);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_R4);
+    //~ tcg_regset_set_reg(s->reserved_regs, TCG_REG_R1);
+    tcg_add_target_add_op_defs(tcg_target_op_defs);
+}
+
+/* Generate global QEMU prologue and epilogue code. */
+void tcg_target_qemu_prologue(TCGContext *s)
+{
+    TRACE();
+    tb_ret_addr = s->code_ptr;
+}
+
+/*
+----------------
+IN:
+0xfffffff0:  ljmp   $0xf000,$0xe05b
+
+OP after la:
+ movi_i32 tmp0,$0xf000
+ movi_i32 tmp1,$0xe05b
+ movi_i32 tmp13,$0xffff
+ and_i32 tmp0,tmp0,tmp13
+ st_i32 tmp0,env,$0x50
+ movi_i32 tmp13,$0x4
+ shl_i32 tmp0,tmp0,tmp13
+ st_i32 tmp0,env,$0x54
+ mov_i32 tmp0,tmp1
+ st_i32 tmp0,env,$0x20
+ exit_tb $0x0
+ end
+
+ movi_i32 env,$0xf000 1, 0, 1
+ and_i32 env,env,tmp65530 1, 2, 0
+ st_i32 env,tmp2,$0x50 0, 2, 1
+ shl_i32 env,env,cc_tmp 1, 2, 0
+ st_i32 env,tmp2,$0x54 0, 2, 1
+ movi_i32 env,$0xe05b 1, 0, 1
+ st_i32 env,tmp2,$0x20 0, 2, 1
+ exit_tb $0x0 0, 0, 1
+
+TCG movi_i32 1, 0, 1
+TCG and_i32 1, 2, 0
+TCG st_i32 0, 2, 1
+TCG shl_i32 1, 2, 0
+TCG st_i32 0, 2, 1
+TCG movi_i32 1, 0, 1
+TCG st_i32 0, 2, 1
+TCG exit_tb 0, 0, 1
+*/
diff --git a/tcg/bytecode/tcg-target.h b/tcg/bytecode/tcg-target.h
new file mode 100644
index 0000000..0140a74
--- /dev/null
+++ b/tcg/bytecode/tcg-target.h
@@ -0,0 +1,151 @@ 
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2008 Fabrice Bellard
+ * Copyright (c) 2009 Stefan Weil
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * This code implements a TCG which does not generate machine code for some
+ * real target machine but which generates virtual machine code for an
+ * interpreter. Interpreted pseudo code is slow, but it works on any host.
+ *
+ * Some remarks might help in understanding the code:
+ *
+ * "target" or "TCG target" is the machine which runs the generated code.
+ * This differs from the usual meaning in QEMU, where "target" is the
+ * emulated machine. So normally the QEMU host is identical to the TCG target.
+ * Here the TCG target is a virtual machine, but this virtual machine must
+ * use the same word size as the real host machine.
+ * Therefore, both 32 and 64 bit virtual machines (interpreters) are needed.
+
+#if !defined(TCG_TARGET_H)
+#define TCG_TARGET_H
+
+#include "config-host.h"
+
+#define TCG_TARGET_INTERPRETER 1
+
+#ifdef CONFIG_DEBUG_TCG
+/* Enable debug output. */
+#define CONFIG_DEBUG_TCG_INTERPRETER
+#endif
+
+#if !defined(HOST_LONG_BITS)
+#error HOST_LONG_BITS is undefined
+#endif
+
+/* Target word size (must be identical to pointer size). */
+#if HOST_LONG_BITS == 32
+# define TCG_TARGET_REG_BITS 32
+# error Untested code, please test and fix it!
+#elif HOST_LONG_BITS == 64
+# define TCG_TARGET_REG_BITS 64
+#else
+# error Only hosts with 32 or 64 bit longs are supported
+#endif
+
+#if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN)
+# error Code (qemu_st, qemu_ld) needs bswap, please fix it!
+#endif
+
+#if defined(HOST_WORDS_BIGENDIAN)
+# define TCG_TARGET_WORDS_BIGENDIAN
+# error Untested code, please test and fix it!
+#endif
+
+/* Optional instructions. */
+//~ #define TCG_TARGET_HAS_bswap16_i32
+//~ #define TCG_TARGET_HAS_bswap16_i64
+//~ #define TCG_TARGET_HAS_bswap32_i32
+//~ #define TCG_TARGET_HAS_bswap32_i64
+//~ #define TCG_TARGET_HAS_bswap64_i64
+//~ #define TCG_TARGET_HAS_div_i32
+//~ #define TCG_TARGET_HAS_div_i64
+#define TCG_TARGET_HAS_ext8s_i32
+//~ #define TCG_TARGET_HAS_ext16s_i32
+//~ #define TCG_TARGET_HAS_ext8s_i64
+//~ #define TCG_TARGET_HAS_ext16s_i64
+#define TCG_TARGET_HAS_ext32s_i64
+#define TCG_TARGET_HAS_neg_i32
+//~ #define TCG_TARGET_HAS_neg_i64
+#define TCG_TARGET_HAS_not_i32
+//~ #define TCG_TARGET_HAS_not_i64
+//~ #define TCG_TARGET_HAS_rot_i32
+//~ #define TCG_TARGET_HAS_rot_i64
+
+/* Offset to user memory in user mode. */
+#define TCG_TARGET_HAS_GUEST_BASE
+
+/* Number of registers available. */
+#define TCG_TARGET_NB_REGS 8
+//~ #define TCG_TARGET_NB_REGS 16
+//~ #define TCG_TARGET_NB_REGS 32
+
+/* List of registers which are used by TCG. */
+enum {
+    TCG_REG_R0 = 0,
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+    TCG_REG_R4,
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+    TCG_AREG0 = TCG_REG_R7,
+#if defined(TARGET_ARM)
+    TCG_AREG1 = TCG_REG_R6,
+    TCG_AREG2 = TCG_REG_R5,
+#endif
+#if TCG_TARGET_NB_REGS == 16 || TCG_TARGET_NB_REGS == 32
+    TCG_REG_R8,
+    TCG_REG_R9,
+    TCG_REG_R10,
+    TCG_REG_R11,
+    TCG_REG_R12,
+    TCG_REG_R13,
+    TCG_REG_R14,
+    TCG_REG_R15,
+#endif
+#if TCG_TARGET_NB_REGS == 32
+    TCG_REG_R16,
+    TCG_REG_R17,
+    TCG_REG_R18,
+    TCG_REG_R19,
+    TCG_REG_R20,
+    TCG_REG_R21,
+    TCG_REG_R22,
+    TCG_REG_R23,
+    TCG_REG_R24,
+    TCG_REG_R25,
+    TCG_REG_R26,
+    TCG_REG_R27,
+    TCG_REG_R28,
+    TCG_REG_R29,
+    TCG_REG_R30,
+    TCG_REG_R31
+#endif
+};
+
+void tci_disas(uint8_t opc);
+
+#endif /* TCG_TARGET_H */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 93066e2..6825882 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -155,6 +155,12 @@  int gen_new_label(void)
     return idx;
 }
 
+/* Forward declarations for functions which may be used in tcg-target.c. */
+static char *tcg_get_arg_str_idx(TCGContext *s, char *buf, int buf_size,
+                                 int idx);
+static TCGHelperInfo *tcg_find_helper(TCGContext *s, tcg_target_ulong val);
+static const char * const cond_name[10];
+
 #include "tcg-target.c"
 
 /* pool based memory allocation */
@@ -696,13 +702,15 @@  static void tcg_reg_alloc_start(TCGContext *s)
 static char *tcg_get_arg_str_idx(TCGContext *s, char *buf, int buf_size,
                                  int idx)
 {
-    TCGTemp *ts;
+    TCGTemp *ts = NULL;
 
-    ts = &s->temps[idx];
-    if (idx < s->nb_globals) {
+    if (idx >= 0 && idx < s->nb_temps) {
+        ts = &s->temps[idx];
+    }
+    if (ts && idx < s->nb_globals) {
         pstrcpy(buf, buf_size, ts->name);
     } else {
-        if (ts->temp_local) 
+        if (ts && ts->temp_local)
             snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
         else
             snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 4304e2a..df15372 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -457,7 +457,9 @@  const TCGArg *tcg_gen_code_op(TCGContext *s, int opc, const TCGArg *args1,
                               unsigned int dead_iargs);
 
 extern uint8_t code_gen_prologue[];
-#if defined(_ARCH_PPC) && !defined(_ARCH_PPC64)
+#if defined(CONFIG_TCG_INTERPRETER)
+unsigned long tcg_qemu_tb_exec(uint8_t *tb_ptr);
+#elif defined(_ARCH_PPC) && !defined(_ARCH_PPC64)
 #define tcg_qemu_tb_exec(tb_ptr) \
     ((long REGPARM __attribute__ ((longcall)) (*)(void *))code_gen_prologue)(tb_ptr)
 #else
diff --git a/tcg/tci.c b/tcg/tci.c
new file mode 100644
index 0000000..3cf79a6
--- /dev/null
+++ b/tcg/tci.c
@@ -0,0 +1,952 @@ 
+/*
+ * Tiny Code Interpreter for QEMU
+ *
+ * Copyright (c) 2009 Stefan Weil
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <stdbool.h>    /* bool */
+
+#include "config.h"
+#include "qemu-common.h"
+#include "tcg-op.h"
+
+/* Marker for missing code. */
+#define TODO() \
+    do { \
+        fprintf(stderr, "TODO %s:%u: %s()\n", __FILE__, __LINE__, __func__); \
+        tcg_abort(); \
+    } while (0)
+
+/* Trace message to see program flow. */
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+#define TRACE() \
+    fprintf(stderr, "TCG %s:%u: %s()\n", __FILE__, __LINE__, __func__)
+#else
+#define TRACE() ((void)0)
+#endif
+
+typedef tcg_target_ulong (*helper_function)(tcg_target_ulong, tcg_target_ulong,
+                                            tcg_target_ulong, tcg_target_ulong);
+
+#if defined(TARGET_I386)
+struct CPUX86State *env;
+#else
+#error Target support missing, please fix!
+#endif
+
+static tcg_target_ulong tci_reg[TCG_TARGET_NB_REGS];
+
+static tcg_target_ulong tci_read_reg(uint32_t index)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    return tci_reg[index];
+}
+
+static uint8_t tci_read_reg8(uint32_t index)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    return (uint8_t)(tci_reg[index]);
+}
+
+static int8_t tci_read_reg8s(uint32_t index)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    return (int8_t)(tci_reg[index]);
+}
+
+static uint16_t tci_read_reg16(uint32_t index)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    return (uint16_t)(tci_reg[index]);
+}
+
+static uint32_t tci_read_reg32(uint32_t index)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    return (uint32_t)(tci_reg[index]);
+}
+
+static int32_t tci_read_reg32s(uint32_t index)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    return (int32_t)(tci_reg[index]);
+}
+
+static uint64_t tci_read_reg64(uint32_t index)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    return tci_reg[index];
+}
+
+#if 0
+static void tcg_write_reg(uint32_t index, tcg_target_ulong value)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    assert(index != TCG_AREG0);
+    tci_reg[index] = value;
+}
+#endif
+
+static void tci_write_reg8(uint32_t index, uint8_t value)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    assert(index != TCG_AREG0);
+    tci_reg[index] = value;
+}
+
+static void tci_write_reg8s(uint32_t index, int8_t value)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    assert(index != TCG_AREG0);
+    tci_reg[index] = value;
+}
+
+static void tci_write_reg16(uint32_t index, uint16_t value)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    assert(index != TCG_AREG0);
+    tci_reg[index] = value;
+}
+
+static void tci_write_reg32(uint32_t index, uint32_t value)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    assert(index != TCG_AREG0);
+    tci_reg[index] = value;
+}
+
+static void tci_write_reg32s(uint32_t index, int32_t value)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    assert(index != TCG_AREG0);
+    tci_reg[index] = value;
+}
+
+static void tci_write_reg64(uint32_t index, uint64_t value)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    assert(index != TCG_AREG0);
+    tci_reg[index] = value;
+}
+
+/* Read constant (native size) from bytecode. */
+static tcg_target_ulong tci_read_i(uint8_t **tb_ptr)
+{
+    tcg_target_ulong value = *(tcg_target_ulong *)(*tb_ptr);
+    *tb_ptr += sizeof(tcg_target_ulong);
+    return value;
+}
+
+/* Read constant (32 bit) from bytecode. */
+static uint32_t tci_read_i32(uint8_t **tb_ptr)
+{
+    uint32_t value = *(uint32_t *)(*tb_ptr);
+    *tb_ptr += 4;
+    return value;
+}
+
+/* Read constant (64 bit) from bytecode. */
+static uint64_t tci_read_i64(uint8_t **tb_ptr)
+{
+    uint64_t value = *(uint64_t *)(*tb_ptr);
+    *tb_ptr += 8;
+    return value;
+}
+
+/* Read indexed register (native size) from bytecode. */
+static tcg_target_ulong tci_read_r(uint8_t **tb_ptr)
+{
+    tcg_target_ulong value = tci_read_reg(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register (8 bit) from bytecode. */
+static uint8_t tci_read_r8(uint8_t **tb_ptr)
+{
+    uint8_t value = tci_read_reg8(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register (8 bit signed) from bytecode. */
+static int8_t tci_read_r8s(uint8_t **tb_ptr)
+{
+    int8_t value = tci_read_reg8s(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register (16 bit) from bytecode. */
+static uint16_t tci_read_r16(uint8_t **tb_ptr)
+{
+    uint16_t value = tci_read_reg16(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register (32 bit) from bytecode. */
+static uint32_t tci_read_r32(uint8_t **tb_ptr)
+{
+    uint32_t value = tci_read_reg32(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register (32 bit signed) from bytecode. */
+static int32_t tci_read_r32s(uint8_t **tb_ptr)
+{
+    int32_t value = tci_read_reg32s(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register (64 bit) from bytecode. */
+static uint64_t tci_read_r64(uint8_t **tb_ptr)
+{
+    uint64_t value = tci_read_reg64(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register or constant (32 bit) from bytecode. */
+static uint32_t tci_read_ri32(uint8_t **tb_ptr)
+{
+    bool const_arg;
+    uint32_t value;
+    const_arg = **tb_ptr;
+    *tb_ptr += 1;
+    if (const_arg) {
+        value = tci_read_i32(tb_ptr);
+    } else {
+        value = tci_read_r32(tb_ptr);
+    }
+    return value;
+}
+
+/* Read indexed register or constant (64 bit) from bytecode. */
+static uint64_t tci_read_ri64(uint8_t **tb_ptr)
+{
+    bool const_arg;
+    uint64_t value;
+    const_arg = **tb_ptr;
+    *tb_ptr += 1;
+    if (const_arg) {
+        value = tci_read_i64(tb_ptr);
+    } else {
+        value = tci_read_r64(tb_ptr);
+    }
+    return value;
+}
+
+static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
+{
+    bool result = false;
+    int32_t i0 = u0;
+    int32_t i1 = u1;
+    switch (condition) {
+        case TCG_COND_EQ:
+            result = (u0 == u1);
+            break;
+        case TCG_COND_NE:
+            result = (u0 != u1);
+            break;
+        case TCG_COND_LT:
+            result = (i0 < i1);
+            break;
+        case TCG_COND_GE:
+            result = (i0 >= i1);
+            break;
+        case TCG_COND_LE:
+            result = (i0 <= i1);
+            break;
+        case TCG_COND_GT:
+            result = (i0 > i1);
+            break;
+        case TCG_COND_LTU:
+            result = (u0 < u1);
+            break;
+        case TCG_COND_GEU:
+            result = (u0 >= u1);
+            break;
+        case TCG_COND_LEU:
+            result = (u0 <= u1);
+            break;
+        case TCG_COND_GTU:
+            result = (u0 > u1);
+            break;
+        default:
+            TODO();
+    }
+    return result;
+}
+
+static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
+{
+    bool result = false;
+    int64_t i0 = u0;
+    int64_t i1 = u1;
+    switch (condition) {
+        case TCG_COND_EQ:
+            result = (u0 == u1);
+            break;
+        case TCG_COND_NE:
+            result = (u0 != u1);
+            break;
+        case TCG_COND_LT:
+            result = (i0 < i1);
+            break;
+        case TCG_COND_GE:
+            result = (i0 >= i1);
+            break;
+        case TCG_COND_LE:
+            result = (i0 <= i1);
+            break;
+        case TCG_COND_GT:
+            result = (i0 > i1);
+            break;
+        case TCG_COND_LTU:
+            result = (u0 < u1);
+            break;
+        case TCG_COND_GEU:
+            result = (u0 >= u1);
+            break;
+        case TCG_COND_LEU:
+            result = (u0 <= u1);
+            break;
+        case TCG_COND_GTU:
+            result = (u0 > u1);
+            break;
+        default:
+            TODO();
+    }
+    return result;
+}
+
+/* Interpret pseudo code in tb. */
+unsigned long tcg_qemu_tb_exec(uint8_t *tb_ptr)
+{
+    unsigned long next_tb = 0;
+
+    TRACE();
+
+    tci_reg[TCG_AREG0] = (tcg_target_ulong)env;
+
+    for (;;) {
+        uint8_t opc = *tb_ptr++;
+        tcg_target_ulong t0, t1, t2;
+        tcg_target_ulong label;
+        TCGCond condition;
+        tci_disas(opc);
+
+        if (opc == INDEX_op_exit_tb) {
+            next_tb = *(uint64_t *)tb_ptr;
+            break;
+        }
+
+        switch (opc) {
+        case INDEX_op_end:
+        case INDEX_op_nop:
+            break;
+        case INDEX_op_nop1:
+        case INDEX_op_nop2:
+        case INDEX_op_nop3:
+        case INDEX_op_nopn:
+        case INDEX_op_discard:
+            TODO();
+            break;
+        case INDEX_op_set_label:
+            TODO();
+            break;
+        case INDEX_op_call:
+            t0 = tci_read_ri64(&tb_ptr);
+            t0 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
+                                       tci_read_reg(TCG_REG_R1),
+                                       tci_read_reg(TCG_REG_R2),
+                                       tci_read_reg(TCG_REG_R3));
+            tci_write_reg32(TCG_REG_R0, t0);
+            break;
+        case INDEX_op_jmp:
+        case INDEX_op_br:
+            t0 = tci_read_i(&tb_ptr);
+            tb_ptr = (uint8_t *)t0;
+            break;
+        case INDEX_op_mov_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+        case INDEX_op_movi_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_i32(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+    /* Load/store operations. */
+        case INDEX_op_ld8u_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg8(t0, *(uint8_t *)(t1 + t2));
+            break;
+        case INDEX_op_ld8s_i32:
+        case INDEX_op_ld16u_i32:
+            TODO();
+            break;
+        case INDEX_op_ld16s_i32:
+            TODO();
+            break;
+        case INDEX_op_ld_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg32(t0, *(uint32_t *)(t1 + t2));
+            break;
+        case INDEX_op_st8_i32:
+            t0 = tci_read_r8(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint8_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st16_i32:
+            t0 = tci_read_r16(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint16_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st_i32:
+            t0 = tci_read_r32(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint32_t *)(t1 + t2) = t0;
+            break;
+    /* Arithmetic operations. */
+        case INDEX_op_add_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 + t2);
+            break;
+        case INDEX_op_sub_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 - t2);
+            break;
+        case INDEX_op_mul_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 * t2);
+            break;
+#ifdef TCG_TARGET_HAS_div_i32
+        case INDEX_op_div_i32:
+        case INDEX_op_divu_i32:
+        case INDEX_op_rem_i32:
+        case INDEX_op_remu_i32:
+            TODO();
+            break;
+#else
+        case INDEX_op_div2_i32:
+        case INDEX_op_divu2_i32:
+            TODO();
+            break;
+#endif
+        case INDEX_op_and_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 & t2);
+            break;
+        case INDEX_op_or_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 | t2);
+            break;
+        case INDEX_op_xor_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 ^ t2);
+            break;
+    /* Shift/rotate operations. */
+        case INDEX_op_shl_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 << t2);
+            break;
+        case INDEX_op_shr_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 >> t2);
+            break;
+        case INDEX_op_sar_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, ((int32_t)t1) >> t2);
+            break;
+#ifdef TCG_TARGET_HAS_rot_i32
+        case INDEX_op_rotl_i32:
+        case INDEX_op_rotr_i32:
+            TODO();
+            break;
+#endif
+        case INDEX_op_brcond_i32:
+            t0 = tci_read_r32(&tb_ptr);
+            t1 = tci_read_ri32(&tb_ptr);
+            condition = *tb_ptr++;
+            label = tci_read_i(&tb_ptr);
+            if (tci_compare32(t0, t1, condition)) {
+                tb_ptr = (uint8_t *)label;
+            }
+            break;
+#if TCG_TARGET_REG_BITS == 32
+        case INDEX_op_add2_i32:
+        case INDEX_op_sub2_i32:
+        case INDEX_op_brcond2_i32:
+        case INDEX_op_mulu2_i32:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_ext8s_i32
+        case INDEX_op_ext8s_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r8s(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_ext16s_i32
+        case INDEX_op_ext16s_i32:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_bswap16_i32
+        case INDEX_op_bswap16_i32:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_bswap32_i32
+        case INDEX_op_bswap32_i32:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_not_i32
+        case INDEX_op_not_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg32(t0, ~t1);
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_neg_i32
+        case INDEX_op_neg_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg32(t0, -t1);
+            break;
+#endif
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_mov_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r64(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+        case INDEX_op_movi_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_i64(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+    /* Load/store operations. */
+        case INDEX_op_ld8u_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg8(t0, *(uint8_t *)(t1 + t2));
+            break;
+        case INDEX_op_ld8s_i64:
+        case INDEX_op_ld16u_i64:
+        case INDEX_op_ld16s_i64:
+            TODO();
+            break;
+        case INDEX_op_ld32u_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg32(t0, *(uint32_t *)(t1 + t2));
+            break;
+        case INDEX_op_ld32s_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg32s(t0, *(int32_t *)(t1 + t2));
+            break;
+        case INDEX_op_ld_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg64(t0, *(uint64_t *)(t1 + t2));
+            break;
+        case INDEX_op_st8_i64:
+            t0 = tci_read_r8(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint8_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st16_i64:
+            t0 = tci_read_r16(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint16_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st32_i64:
+            t0 = tci_read_r32(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint32_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st_i64:
+            t0 = tci_read_r64(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint64_t *)(t1 + t2) = t0;
+            break;
+    /* Arithmetic operations. */
+        case INDEX_op_add_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 + t2);
+            break;
+        case INDEX_op_sub_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 - t2);
+            break;
+        case INDEX_op_mul_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 * t2);
+            break;
+#ifdef TCG_TARGET_HAS_div_i64
+        case INDEX_op_div_i64:
+        case INDEX_op_divu_i64:
+        case INDEX_op_rem_i64:
+        case INDEX_op_remu_i64:
+            TODO();
+            break;
+#else
+        case INDEX_op_div2_i64:
+        case INDEX_op_divu2_i64:
+            TODO();
+            break;
+#endif
+        case INDEX_op_and_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 & t2);
+            break;
+        case INDEX_op_or_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 | t2);
+            break;
+        case INDEX_op_xor_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 ^ t2);
+            break;
+    /* Shift/rotate operations. */
+        case INDEX_op_shl_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 << t2);
+            break;
+        case INDEX_op_shr_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 >> t2);
+            break;
+        case INDEX_op_sar_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, ((int64_t)t1) >> t2);
+            break;
+#ifdef TCG_TARGET_HAS_rot_i64
+        case INDEX_op_rotl_i64:
+        case INDEX_op_rotr_i64:
+            TODO();
+            break;
+#endif
+        case INDEX_op_brcond_i64:
+            t0 = tci_read_r64(&tb_ptr);
+            t1 = tci_read_ri64(&tb_ptr);
+            condition = *tb_ptr++;
+            label = tci_read_i(&tb_ptr);
+            if (tci_compare64(t0, t1, condition)) {
+                tb_ptr = (uint8_t *)label;
+            }
+            break;
+#ifdef TCG_TARGET_HAS_ext8s_i64
+        case INDEX_op_ext8s_i64:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_ext16s_i64
+        case INDEX_op_ext16s_i64:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_ext32s_i64
+        case INDEX_op_ext32s_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32s(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_bswap16_i64
+        case INDEX_op_bswap16_i64:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_bswap32_i64
+        case INDEX_op_bswap32_i64:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_bswap64_i64
+        case INDEX_op_bswap64_i64:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_not_i64
+        case INDEX_op_not_i64:
+            TODO();
+            break;
+#endif
+#ifdef TCG_TARGET_HAS_neg_i64
+        case INDEX_op_neg_i64:
+            TODO();
+            break;
+#endif
+#endif /* TCG_TARGET_REG_BITS == 64 */
+    /* QEMU specific */
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+        case INDEX_op_debug_insn_start:
+            TODO();
+            break;
+#else
+        case INDEX_op_debug_insn_start:
+            TODO();
+            break;
+#endif
+        case INDEX_op_exit_tb:
+            TODO();
+            break;
+        case INDEX_op_goto_tb:
+            t0 = tci_read_i32(&tb_ptr);
+            tb_ptr += (int32_t)t0;
+            break;
+#if TCG_TARGET_REG_BITS == 32
+#if TARGET_LONG_BITS == 32
+        case INDEX_op_qemu_ld8u:
+            TODO();
+            break;
+#else
+        case INDEX_op_qemu_ld8u:
+            TODO();
+            break;
+#endif
+#if TARGET_LONG_BITS == 32
+        case INDEX_op_qemu_ld8s:
+            TODO();
+            break;
+#else
+        case INDEX_op_qemu_ld8s:
+            TODO();
+            break;
+#endif
+#if TARGET_LONG_BITS == 32
+        case INDEX_op_qemu_ld16u:
+            TODO();
+            break;
+#else
+        case INDEX_op_qemu_ld16u:
+            TODO();
+            break;
+#endif
+#if TARGET_LONG_BITS == 32
+        case INDEX_op_qemu_ld16s:
+            TODO();
+            break;
+#else
+        case INDEX_op_qemu_ld16s:
+            TODO();
+            break;
+#endif
+#if TARGET_LONG_BITS == 32
+        case INDEX_op_qemu_ld32u:
+            TODO();
+            break;
+#else
+        case INDEX_op_qemu_ld32u:
+            TODO();
+            break;
+#endif
+        case INDEX_op_qemu_ld32s:
+            TODO();
+            break;
+#if TARGET_LONG_BITS == 32
+        case INDEX_op_qemu_ld64:
+            TODO();
+            break;
+#else
+        case INDEX_op_qemu_ld64:
+            TODO();
+            break;
+#endif
+#else /* TCG_TARGET_REG_BITS == 32 */
+        case INDEX_op_qemu_ld8u:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            tci_write_reg8(t0, __ldb_mmu(t1, t2));
+#else
+            tci_write_reg8(t0, *(uint8_t *)(t1 + GUEST_BASE));
+#endif
+            break;
+        case INDEX_op_qemu_ld8s:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            tci_write_reg8s(t0, __ldb_mmu(t1, t2));
+#else
+            tci_write_reg8s(t0, *(int8_t *)(t1 + GUEST_BASE));
+#endif
+            break;
+        case INDEX_op_qemu_ld16u:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            tci_write_reg16(t0, __ldw_mmu(t1, t2));
+#else
+            tci_write_reg16(t0, *(uint16_t *)(t1 + GUEST_BASE));
+#endif
+            break;
+        case INDEX_op_qemu_ld16s:
+            TODO();
+            break;
+        case INDEX_op_qemu_ld32u:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            tci_write_reg32(t0, __ldl_mmu(t1, t2));
+#else
+            tci_write_reg32(t0, *(uint32_t *)(t1 + GUEST_BASE));
+#endif
+            break;
+        case INDEX_op_qemu_ld32s:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            tci_write_reg32s(t0, __ldl_mmu(t1, t2));
+#else
+            tci_write_reg32s(t0, *(int32_t *)(t1 + GUEST_BASE));
+#endif
+            break;
+        case INDEX_op_qemu_ld64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            tci_write_reg64(t0, __ldq_mmu(t1, t2));
+#else
+            tci_write_reg64(t0, *(uint64_t *)(t1 + GUEST_BASE));
+#endif
+            break;
+#endif /* TCG_TARGET_REG_BITS != 32 */
+        case INDEX_op_qemu_st8:
+            t0 = tci_read_r8(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            __stb_mmu(t1, t0, t2);
+#else
+            *(uint8_t *)(t1 + GUEST_BASE) = t0;
+#endif
+            break;
+        case INDEX_op_qemu_st16:
+            t0 = tci_read_r16(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            __stw_mmu(t1, t0, t2);
+#else
+            *(uint16_t *)(t1 + GUEST_BASE) = t0;
+#endif
+            break;
+        case INDEX_op_qemu_st32:
+            t0 = tci_read_r32(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            __stl_mmu(t1, t0, t2);
+#else
+            *(uint32_t *)(t1 + GUEST_BASE) = t0;
+#endif
+            break;
+        case INDEX_op_qemu_st64:
+            t0 = tci_read_r64(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            __stq_mmu(t1, t0, t2);
+#else
+            *(uint64_t *)(t1 + GUEST_BASE) = t0;
+#endif
+            break;
+        default:
+            TODO();
+            break;
+        }
+    }
+    return next_tb;
+}