Porting TCG to alpha platform

Message ID 682404.21141.qm@web15904.mail.cnb.yahoo.com
State New

Commit Message

identifier scorpio Jan. 20, 2010, 5:19 p.m. UTC
Thank you all, especially Richard, for reviewing. I've partly amended my code according to your advice, but the result is not very encouraging: I can still run the linux-0.2.img image and still can't run MS Windows. I think most of your advice relates to performance and may significantly reduce the TB size. Below I'll append my newly generated patch against stable-0.10; in case it is mangled, I have also put it in the attachment.

Now I have some answers to your questions.

> > +static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> > +{
> > +    const char *ct_str = *pct_str;
> > +
> > +    switch(ct_str[0])
> > +    {
> > +    case 'r':
> ...
> > +    case 'L':
> 
> Do you really need extra temporaries for L?  You
> already have 3.

In qemu_ld/st we must use $16, $17 and $18 as temporaries, because we pass them as arguments to helper functions such as qemu_ld/st_helpers[].

... 
> Err.. "8 insns"?  You'd only ever need to output
> 5.  Also, why would you ever want to explicitly never
> elide one of these insns if you could? Say, if only L0 and
> L3 were non-zero?
> 

Yes, the number of output instructions is 5; my comment is a bit out of date.
Your method here is more elegant and I'll migrate to your "tcg_out_op_long()" version tomorrow.
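
For readers following the "5 insns" point: a 64-bit constant can be split into four 16-bit pieces, but LDA and LDAH sign-extend their displacement, so each piece has to absorb the borrow caused by the piece below it. The sketch below is a host-side model of that arithmetic only. The names sext16, split_imm64 and run_sequence are hypothetical; they are not functions from the patch or from tcg_out_op_long().

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x, as the LDA/LDAH displacement field does. */
static uint64_t sext16(uint64_t x)
{
    return ((x & 0xffffu) ^ 0x8000u) - 0x8000u;
}

/* Hypothetical helper: split arg into four 16-bit pieces, lowest first.
 * After taking a piece, its sign-extended value is subtracted so that the
 * carry it will cause at run time is folded into the next piece. */
static void split_imm64(uint64_t arg, uint16_t piece[4])
{
    int i;

    for (i = 0; i < 4; i++) {
        piece[i] = (uint16_t)arg;
        arg -= sext16(piece[i]);
        arg >>= 16;
    }
}

/* Model of the five-instruction sequence applied to a register r. */
static uint64_t run_sequence(const uint16_t piece[4])
{
    uint64_t r;

    r = sext16(piece[3]) << 16;      /* ldah r, l3($31) */
    r += sext16(piece[2]);           /* lda  r, l2(r)   */
    r <<= 32;                        /* sll  r, 32, r   */
    r += sext16(piece[1]) << 16;     /* ldah r, l1(r)   */
    r += sext16(piece[0]);           /* lda  r, l0(r)   */
    return r;
}
```

A piece that comes out as zero marks an LDA or LDAH that adds nothing, which is where the elision suggested in the review would apply.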

... 
> With I/J constraints you don't need this special casing.

I'm not very familiar with I/J constraints; I'll study them later.

...
> > +        tcg_out_reloc(s, s->code_ptr, R_ALPHA_REFQUAD, label_index, 0);
> > +        s->code_ptr += 4;
> 
> I realize that it doesn't really matter what value you use
> here, so long as things are consistent with patch_reloc, but
> it'll be less confusing if you use the proper relocation
> type: R_ALPHA_BRADDR.
> 

You are right, R_ALPHA_BRADDR is clearer.

...
> > +        tcg_out_inst2(s, opc^4, TMP_REG1, 1);
> > +    /* record relocation infor */
> > +        tcg_out_reloc(s, s->code_ptr, R_ALPHA_REFQUAD, label_index, 0);
> > +        s->code_ptr += 4;
> 
> Bug: You've applied the relocation to the wrong
> instruction.
> Bug: What's with the "opc^4"?
> 

What did you mean by "applied the relocation to the wrong
instruction"? Can't I apply a relocation to the INDEX_op_brcond_i32 operation?
And opc^4 here is used to toggle between OP_BLBC (opcode 0x38) and OP_BLBS (opcode 0x3c); ugly code :)
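
The XOR works because the two opcodes differ only in bit 2. A small sketch, using the opcode values quoted in this thread (0x38 and 0x3c) plus the BEQ/BNE values from the patch's INSN_ definitions (0x39 and 0x3d); invert_branch_sense is a hypothetical name used only here:

```c
/* 6-bit Alpha branch opcodes, before being shifted into bits 31..26. */
enum {
    OP_BLBC = 0x38,
    OP_BLBS = 0x3c,
    OP_BEQ  = 0x39,
    OP_BNE  = 0x3d
};

/* Flipping bit 2 turns each of these opcodes into its opposite condition. */
static int invert_branch_sense(int opc)
{
    return opc ^ 4;
}
```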

...
> > +    /* if VM is of 32-bit arch, clear higher 32-bit of addr */
> 
> Use a zapnot insn for this.

zapnot is a good thing.

...
> You don't need to push/pop anything here.  $26 should
> be saved by the prologue we emitted, and $15 is
> call-saved.  What you could usefully do is define a
> register constraint for $27 so that TCG automatically loads
> the value into that register and saves you a register move
> here.

I push/pop them here just to be safe.

> 
> > +    case INDEX_op_sar_i32:
> > +        tcg_out_inst4i(s, OP_SHIFT, args[1], 32, FUNC_SLL, args[1]);
> > +        tcg_out_inst4i(s, OP_SHIFT, args[1], 32, FUNC_SRA, args[1]);
> 
> That last shift can be combined with the requested shift
> via addition. For constant input, this saves an insn; for
> register input, the addition can be done in parallel with
> the first shift.

I changed this to use "addl r, 0, r".

> For comparing 32-bit inputs, it doesn't actually matter how
> you extend the inputs, so long as you do it the same for
> both inputs.  Therefore the best solution here is to
> sign-extend both inputs with "addl r,0,r".  Note as
> well that you don't need temporaries, as the inputs only
> have 32-bits defined; high bits are garbage in, garbage
> out.

I changed this to use "addl r, 0, r" too.
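
The claim about comparing 32-bit inputs can be checked on the host. Below is a minimal model, assuming "addl r,0,r" keeps the low 32 bits of the register and sign-extends them to 64 bits. The function names are made up for this illustration and do not appear in the patch.

```c
#include <stdint.h>

/* Model of "addl r,0,r": sign-extend the low 32 bits, ignore the rest. */
static uint64_t addl_zero(uint64_t r)
{
    return ((r & 0xffffffffu) ^ 0x80000000u) - 0x80000000u;
}

/* 64-bit unsigned less-than, as CMPULT computes it. */
static int cmpult64(uint64_t a, uint64_t b)
{
    return a < b;
}

/* 64-bit signed less-than, as CMPLT computes it.  Flipping the sign bit
 * maps signed order onto unsigned order without relying on casts. */
static int cmplt64(uint64_t a, uint64_t b)
{
    const uint64_t sign = 0x8000000000000000ULL;
    return (a ^ sign) < (b ^ sign);
}
```

With both inputs passed through addl_zero first, the 64-bit compares give the same answer as the corresponding 32-bit compares, whatever garbage the high halves held.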

> You'll also want to define INDEX_op_ext32s_i64 as "addl r,0,r".

added.
 
> > +    case INDEX_op_div2_i32:
> > +    case INDEX_op_divu2_i32:
> 
> Don't define these, but you will need to define
> 
>   div_i32, divu_i32, rem_i32, remu_i32
>   div_i64, divu_i64, rem_i64, remu_i64

I think that when qemu meets x86 divide instructions, it calls helper functions to simulate them. Must I define div_i32/divu_i32/...?

... 
> > +    tcg_out_push(s, TCG_REG_26);
> > +    tcg_out_push(s, TCG_REG_27);
> > +    tcg_out_push(s, TCG_REG_28);
> > +    tcg_out_push(s, TCG_REG_29);
> 
> Of these only $26 needs to be saved.

Again, I save them to be safe.

++++++++++++++++++++++++++++++++++
below is the newest patch ...

From 7cc2acddfb7333ab3f1f6b17fa8fa5dcdd3c0095 Mon Sep 17 00:00:00 2001
From: Dong Weiyu <cidentifier@yahoo.com.cn>
Date: Wed, 20 Jan 2010 23:48:55 +0800
Subject: [PATCH] Porting TCG to alpha platform.

---
 cpu-all.h              |    2 +-
 tcg/alpha/tcg-target.c | 1196 ++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/alpha/tcg-target.h |   70 +++
 3 files changed, 1267 insertions(+), 1 deletions(-)
 create mode 100644 tcg/alpha/tcg-target.c
 create mode 100644 tcg/alpha/tcg-target.h

+

+/* used for function call generation */

+#define TCG_REG_CALL_STACK TCG_REG_30
+#define TCG_TARGET_STACK_ALIGN 16
+#define TCG_TARGET_CALL_STACK_OFFSET 0
+
+/* we have signed extension instructions */
+#define TCG_TARGET_HAS_ext8s_i32
+#define TCG_TARGET_HAS_ext16s_i32
+#define TCG_TARGET_HAS_ext8s_i64
+#define TCG_TARGET_HAS_ext16s_i64
+#define TCG_TARGET_HAS_ext32s_i64
+
+/* Note: must be synced with dyngen-exec.h */

+#define TCG_AREG0 TCG_REG_15
+#define TCG_AREG1 TCG_REG_9
+#define TCG_AREG2 TCG_REG_10
+#define TCG_AREG3 TCG_REG_11
+#define TCG_AREG4 TCG_REG_12
+#define TCG_AREG5 TCG_REG_13
+#define TCG_AREG6 TCG_REG_14
+

+#define TMP_REG1 TCG_REG_23
+#define TMP_REG2 TCG_REG_24
+#define TMP_REG3 TCG_REG_25
+

+static inline void flush_icache_range(unsigned long start, unsigned long stop)
+{
+    __asm__ __volatile__ ("call_pal 0x86");
+}
+

Comments

Richard Henderson Jan. 20, 2010, 9:26 p.m. UTC | #1
On 01/20/2010 09:19 AM, identifier scorpio wrote:
> Below i'll append my newly generated patch against stable-0.10,
> in case it is mangled, i also put it in the attachment.

For the record, the inline portion was again mangled; the attachment is 
fine.

> in qemu_ld/st, we must use $16,$17,$18 as temporaries,
> because pass them as argument to helper functions such as qemu_ld/st_helpers[].

Ah, yes, I forgot about that.

>> With I/J constraints you don't need this special casing.
>
> I'm not very familiar with I/J constraints and i'll study them later.

I = uint8_t, to be used with the second arithmetic input.
J = 0, to be used with the first arithmetic input (i.e. $31).
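
In code, matching these two constraints might look like the sketch below. The flag names CT_CONST_U8 and CT_CONST_ZERO and the function const_match are assumptions made for this example; the patch's tcg_target_const_match currently accepts any constant once TCG_CT_CONST is set.

```c
#include <stdint.h>

/* Hypothetical constraint flags for this sketch only. */
#define CT_CONST_U8   0x100   /* 'I': constant fits the 8-bit literal field */
#define CT_CONST_ZERO 0x200   /* 'J': constant is zero, so $31 can be used  */

/* Return 1 if val may be used directly under the constraint bits in ct. */
static int const_match(int64_t val, int ct)
{
    if ((ct & CT_CONST_U8) && val == (uint8_t)val) {
        return 1;
    }
    if ((ct & CT_CONST_ZERO) && val == 0) {
        return 1;
    }
    return 0;
}
```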

>>> +        tcg_out_inst2(s, opc^4, TMP_REG1, 1);
>>> +    /* record relocation infor */
>>> +        tcg_out_reloc(s, s->code_ptr, R_ALPHA_REFQUAD, label_index, 0);
>>> +        s->code_ptr += 4;
>>
>> Bug: You've applied the relocation to the wrong
>> instruction.
>> Bug: What's with the "opc^4"?
>>
>
> what did you mean that i "applied the relocation to the wrong
> instruction", couldn't i apply relocation to INDEX_op_brcond_i32 operation?

With tcg_out_inst2 you emit the branch instruction, which calls 
tcg_out32, which increments s->code_ptr.  Next you call tcg_out_reloc 
with the updated s->code_ptr, which means that the relocation gets 
applied to the instruction *after* the branch.  Finally, you increment 
s->code_ptr *again*, which means that the instruction after the branch 
is in fact completely uninitialized.
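
The ordering problem can be reproduced with a toy emitter. This is a sketch, not QEMU code: struct emitter, out32 and record_reloc are stand-ins for TCGContext, tcg_out32 and tcg_out_reloc, and positions are counted in instruction slots rather than bytes.

```c
#include <stddef.h>
#include <stdint.h>

struct emitter {
    uint32_t buf[8];
    size_t pos;        /* next free slot; plays the role of s->code_ptr */
    size_t reloc_pos;  /* slot that the recorded relocation will patch  */
};

/* Emit one instruction and advance, as tcg_out32 does. */
static void out32(struct emitter *e, uint32_t insn)
{
    e->buf[e->pos++] = insn;
}

/* Remember the current position as the place to patch later. */
static void record_reloc(struct emitter *e)
{
    e->reloc_pos = e->pos;
}

/* Order used in the quoted code: emit, record, then skip a slot. */
static void branch_buggy(struct emitter *e, uint32_t insn)
{
    out32(e, insn);
    record_reloc(e);   /* points one slot past the branch */
    e->pos++;          /* leaves that slot unwritten      */
}

/* Record first, then emit: the relocation refers to the branch itself. */
static void branch_fixed(struct emitter *e, uint32_t insn)
{
    record_reloc(e);
    out32(e, insn);
}
```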

> and opc^4 here is used to toggle between OP_BLBC(opcode 0x38) and OP_BLBS(opcode 0x3c), ugly code :)

It does beg the question of why you're reversing the sense of the jump 
at all.  Just because the branch is forward doesn't mean its condition 
should be changed.  I think that's definitely a bug.  The sense of the 
jump should be exactly the same, never mind the direction of the jump.

Oh... I see what you're doing here -- you're generating the entire 
branch instruction in patch_reloc, and you're generating branch over 
branch here.  That's both confusing and unnecessary.

We should do

static void patch_reloc(uint8_t *x_ptr, int type,
                         tcg_target_long value,
                         tcg_target_long addend)
{
     uint32_t *code_ptr = (uint32_t *)x_ptr;
     uint32_t insn = *code_ptr;

     value += addend;
     switch (type) {
     case R_ALPHA_BRADDR:
         value -= (long)x_ptr + 4;
         if ((value & 3) || value < -0x400000 || value >= 0x400000) {
             tcg_abort();
         }
         *code_ptr = insn | INSN_DISP21(value >> 2);
         break;

     default:
         tcg_abort();
     }
}

so as to apply the branch address relocation to the existing insn.
Which lets you write

static void tcg_out_br(TCGContext *s, int opc, int ra, int label_index)
{
     TCGLabel *l = &s->labels[label_index];
     tcg_target_long value;

     if (l->has_value) {
         value = l->u.value;
         value -= (long)s->code_ptr + 4;
         if ((value & 3) || value < -0x400000 || value >= 0x400000) {
             tcg_abort();
         }
         value >>= 2;
     } else {
         tcg_out_reloc(s, s->code_ptr, R_ALPHA_BRADDR, label_index, 0);
         value = 0;
     }
     tcg_out_fmt_br(s, opc, ra, value);
}

static void tcg_out_brcond(TCGContext *s, ...)
{
     // Emit comparison into TMP_REG1.

     opc = (cond & 1) ? INSN_BLBC : INSN_BLBS;
     tcg_out_br(s, opc, TMP_REG1, label_index);
}

Isn't that much clearer?
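
To make the displacement arithmetic concrete, here is a self-contained sketch of the same range check and encoding, with addresses passed as plain integers so it can be run on any host. patch_branch is a hypothetical name; the 21-bit field and the "+ 4" come from the code above. Like that code, it ORs the displacement in and so assumes the field is still zero.

```c
#include <stdint.h>

/* Fill the 21-bit displacement of the branch stored in *insn.
 * insn_addr is the address of the branch, target the address to reach.
 * Returns 0 on success, -1 if the target is misaligned or out of range. */
static int patch_branch(uint32_t *insn, int64_t insn_addr, int64_t target)
{
    /* Displacements are relative to the instruction after the branch. */
    int64_t delta = target - (insn_addr + 4);

    if ((delta & 3) || delta < -0x400000 || delta >= 0x400000) {
        return -1;
    }
    /* The field counts 4-byte instructions, in two's complement. */
    *insn |= (uint32_t)(delta / 4) & 0x1fffffu;
    return 0;
}
```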

> +    tcg_out_mov(s, r1, addr_reg);
> +    tcg_out_mov(s, r0, addr_reg);
> +
> +#if TARGET_LONG_BITS == 32
> +    /* if VM is of 32-bit arch, clear higher 32-bit of addr */
> +    tcg_out_fmt_opi(s, INSN_ZAPNOT, r0, 0x0f, r0);
> +    tcg_out_fmt_opi(s, INSN_ZAPNOT, r1, 0x0f, r1);
> +#endif

I suggest creating a

static inline void tcg_out_mov_addr(TCGContext *s, int rd, int rs)
{
     if (TARGET_LONG_BITS == 32) {
         tcg_out_fmt_opi(s, INSN_ZAPNOT, rs, 0x0f, rd);
     } else {
         tcg_out_mov(s, rd, rs);
     }
}

and using that throughout qemu_ld/st.  That will save some redundant 
moves you are creating there, as well as removing some conditional 
compilation.  Indeed, I would suggest replacing all of the conditional 
compilation vs TARGET_LONG_BITS with normal if's.

... Of course, in this particular case, that zapnot is redundant with 
the INSN_AND that follows, for both R0 and R1.
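
For reference, a host-side model of what zapnot does to a register value, including the 0x0f literal used here. This is only a sketch of the instruction's byte-mask behaviour; the helper name zapnot is chosen for illustration.

```c
#include <stdint.h>

/* ZAPNOT: keep byte i of src when bit i of mask is set, zero it otherwise. */
static uint64_t zapnot(uint64_t src, unsigned mask)
{
    uint64_t keep = 0;
    int i;

    for (i = 0; i < 8; i++) {
        if (mask & (1u << i)) {
            keep |= (uint64_t)0xff << (8 * i);
        }
    }
    return src & keep;
}
```

With mask 0x0f the low four bytes survive and the upper 32 bits are cleared, which is the zero-extension wanted for a 32-bit guest address.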

> +    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, (tcg_target_long)qemu_ld_helpers[s_bits]);
> +    tcg_out_push(s, addr_reg);
> +    //tcg_out_push(s, TCG_REG_26);
> +    //tcg_out_push(s, TCG_REG_15);
> +    tcg_out_mov(s, TCG_REG_27, TMP_REG1);
> +    tcg_out_fmt_jmp(s, INSN_CALL, TCG_REG_26, TMP_REG1, 0);
> +    //tcg_out_pop(s, TCG_REG_15);
> +    //tcg_out_pop(s, TCG_REG_26);
> +    tcg_out_pop(s, addr_reg);

You need not save and restore ADDR_REG; it is not used after the call.
Also, you can load the address into $27 directly and not use the temp.

> +    *(uint32_t *)label1_ptr = (uint32_t) \
> +        ( INSN_BNE | ( TMP_REG1 << 21 ) | ( val & 0x1fffff));

Frankly, I don't really think it's worth being so convoluted with the 
branches in here.  I know that's the way that the i386 port does it in 
qemu_ld/st, but I think we should rather pattern it after the i386 brcond2.

I.e. use gen_new_label to create a new label for use within the routine, 
use a normal call into tcg_out_br to generate the branch, and use 
tcg_out_label to place the label at the proper place and resolve the 
forward branch.

It may be a teeny bit less efficient, but it's far clearer.
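
A rough model of that label-based scheme, assuming forward branches are recorded on the label and patched when the label is placed. All names here (struct label, emit_branch, place_label) are hypothetical and only mirror the roles of gen_new_label, tcg_out_br and tcg_out_label; positions are word indices, not byte addresses, and there is no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 4

struct label {
    int has_value;
    size_t value;                 /* word index the label points at      */
    size_t pending[MAX_PENDING];  /* branch slots waiting for the value  */
    size_t npending;
};

struct code {
    uint32_t buf[16];
    size_t pos;
};

/* Set the 21-bit displacement of insn, located at slot 'from', so that it
 * reaches slot 'to'.  Counted from the instruction after the branch. */
static uint32_t with_disp(uint32_t insn, size_t from, size_t to)
{
    uint32_t disp = (uint32_t)(to - (from + 1));
    return (insn & ~0x1fffffu) | (disp & 0x1fffffu);
}

/* Emit a branch to l; defer the displacement if l is not placed yet. */
static void emit_branch(struct code *c, struct label *l, uint32_t insn)
{
    if (l->has_value) {
        insn = with_disp(insn, c->pos, l->value);
    } else {
        l->pending[l->npending++] = c->pos;
    }
    c->buf[c->pos++] = insn;
}

/* Bind l to the current position and resolve the recorded branches. */
static void place_label(struct code *c, struct label *l)
{
    size_t i;

    l->has_value = 1;
    l->value = c->pos;
    for (i = 0; i < l->npending; i++) {
        size_t at = l->pending[i];
        c->buf[at] = with_disp(c->buf[at], at, l->value);
    }
    l->npending = 0;
}
```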

> +    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, \
> +	offsetof(CPUTLBEntry, addend) - offsetof(CPUTLBEntry, addr_read));
> +    tcg_out_fmt_opr(s, INSN_ADDQ, r1, TMP_REG1, r1);
> +    tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, r1, 0);

You may want to use

   tcg_out_ld(s, TCG_TYPE_I64, TMP_REG1, r1, offsetof ...);

for clarity, and to reuse the tcg_out_op_long improvements.

> +#else
> +    r0 = addr_reg;
> +#endif // endif defined(CONFIG_SOFTMMU)

Missing GUEST_BASE handling, though that won't help your winnt...

>>> +    case INDEX_op_sar_i32:
>>> +        tcg_out_inst4i(s, OP_SHIFT, args[1], 32, FUNC_SLL, args[1]);
>>> +        tcg_out_inst4i(s, OP_SHIFT, args[1], 32, FUNC_SRA, args[1]);
>>
>> That last shift can be combined with the requested shift
>> via addition. For constant input, this saves an insn; for
>> register input, the addition can be done in parallel with
>> the first shift.
>
> i changed to use "addl r, 0, r" here.

Even better.

> I think when qemu met x86 divide instructions, it will call helper
> functions to simulate them, must i define div_i32/divu_i32/...?

If you want to emulate anything other than x86, yes.


r~
Patch

From 7cc2acddfb7333ab3f1f6b17fa8fa5dcdd3c0095 Mon Sep 17 00:00:00 2001
From: Dong Weiyu <cidentifier@yahoo.com.cn>
Date: Wed, 20 Jan 2010 23:48:55 +0800
Subject: [PATCH] Porting TCG to alpha platform.

---
 cpu-all.h              |    2 +-
 tcg/alpha/tcg-target.c | 1196 ++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/alpha/tcg-target.h |   70 +++
 3 files changed, 1267 insertions(+), 1 deletions(-)
 create mode 100644 tcg/alpha/tcg-target.c
 create mode 100644 tcg/alpha/tcg-target.h

diff --git a/cpu-all.h b/cpu-all.h
index e0c3efd..bdf6fb2 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -22,7 +22,7 @@ 
 
 #include "qemu-common.h"
 
-#if defined(__arm__) || defined(__sparc__) || defined(__mips__) || defined(__hppa__)
+#if defined(__arm__) || defined(__sparc__) || defined(__mips__) || defined(__hppa__) || defined(__alpha__)
 #define WORDS_ALIGNED
 #endif
 
diff --git a/tcg/alpha/tcg-target.c b/tcg/alpha/tcg-target.c
new file mode 100644
index 0000000..143f576
--- /dev/null
+++ b/tcg/alpha/tcg-target.c
@@ -0,0 +1,1196 @@ 
+/*
+ * Tiny Code Generator for QEMU on ALPHA platform
+*/
+
+#ifndef NDEBUG
+static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
+    "$0", "$1", "$2", "$3", "$4", "$5", "$6", "$7",
+    "$8", "$9", "$10", "$11", "$12", "$13", "$14", "$15",
+    "$16", "$17", "$18", "$19", "$20", "$21", "$22", "$23",
+    "$24", "$25", "$26", "$27", "$28", "$29", "$30", "$31",
+};
+#endif
+
+/* 
+ * $26 ~ $31 are special, reserved, 
+ * and $25 is deliberately reserved for jcc operation
+ * and $0 is usually used for return function result, better allocate it later
+ * and $15 is used for cpu_env pointer, allocate it at last
+*/
+static const int tcg_target_reg_alloc_order[] = {
+    TCG_REG_9, TCG_REG_10, TCG_REG_11, TCG_REG_12, TCG_REG_13, TCG_REG_14,
+    TCG_REG_1, TCG_REG_2, TCG_REG_3, TCG_REG_4, TCG_REG_5, TCG_REG_6,
+    TCG_REG_7, TCG_REG_8, TCG_REG_22, 
+    TCG_REG_16, TCG_REG_17, TCG_REG_18, TCG_REG_19, TCG_REG_20, TCG_REG_21
+};
+
+/*
+ * according to alpha calling convention, these 6 registers are used for 
+ * function parameter passing. if function has more than 6 parameters, remained
+ * ones are stored on stack.
+*/
+static const int tcg_target_call_iarg_regs[6] = { 
+    TCG_REG_16, TCG_REG_17, TCG_REG_18, TCG_REG_19, TCG_REG_20, TCG_REG_21
+};
+
+/*
+ * according to alpha calling convention, $0 is used for returning function result.
+*/
+static const int tcg_target_call_oarg_regs[1] = { TCG_REG_0 };
+
+/*
+ * save the address of TB's epilogue.
+*/
+static uint8_t *tb_ret_addr;
+
+#define INSN_OP(x)     (((x) & 0x3f) << 26)
+#define INSN_FUNC1(x)  (((x) & 0x3) << 14)
+#define INSN_FUNC2(x)  (((x) & 0x7f) << 5)
+#define INSN_RA(x)     (((x) & 0x1f) << 21)
+#define INSN_RB(x)     (((x) & 0x1f) << 16)
+#define INSN_RC(x)     ((x) & 0x1f)
+#define INSN_LIT(x)    (((x) & 0xff) << 13)
+#define INSN_DISP16(x) ((x) & 0xffff)
+#define INSN_DISP21(x) ((x) & 0x1fffff)
+#define INSN_RSVED(x)  ((x) & 0x3fff)
+
+#define INSN_JMP       (INSN_OP(0x1a) | INSN_FUNC1(0))
+#define INSN_CALL      (INSN_OP(0x1a) | INSN_FUNC1(1))
+#define INSN_RET       (INSN_OP(0x1a) | INSN_FUNC1(2))
+#define INSN_BR        INSN_OP(0x30)
+#define INSN_BEQ       INSN_OP(0x39)
+#define INSN_BNE       INSN_OP(0x3d)
+#define INSN_BLBC      INSN_OP(0x38)
+#define INSN_BLBS      INSN_OP(0x3c)
+#define INSN_ADDL      (INSN_OP(0x10) | INSN_FUNC2(0))
+#define INSN_SUBL      (INSN_OP(0x10) | INSN_FUNC2(0x9))
+#define INSN_ADDQ      (INSN_OP(0x10) | INSN_FUNC2(0x20))
+#define INSN_SUBQ      (INSN_OP(0x10) | INSN_FUNC2(0x29))
+#define INSN_CMPEQ     (INSN_OP(0x10) | INSN_FUNC2(0x2d))
+#define INSN_CMPLT     (INSN_OP(0x10) | INSN_FUNC2(0x4d))
+#define INSN_CMPLE     (INSN_OP(0x10) | INSN_FUNC2(0x6d))
+#define INSN_CMPULT    (INSN_OP(0x10) | INSN_FUNC2(0x1d))
+#define INSN_CMPULE    (INSN_OP(0x10) | INSN_FUNC2(0x3d))
+#define INSN_MULL      (INSN_OP(0x13) | INSN_FUNC2(0))
+#define INSN_MULQ      (INSN_OP(0x13) | INSN_FUNC2(0x20))
+#define INSN_AND       (INSN_OP(0x11) | INSN_FUNC2(0))
+#define INSN_BIS       (INSN_OP(0x11) | INSN_FUNC2(0x20))
+#define INSN_XOR       (INSN_OP(0x11) | INSN_FUNC2(0x40))
+#define INSN_SLL       (INSN_OP(0x12) | INSN_FUNC2(0x39))
+#define INSN_SRL       (INSN_OP(0x12) | INSN_FUNC2(0x34))
+#define INSN_SRA       (INSN_OP(0x12) | INSN_FUNC2(0x3c))
+#define INSN_ZAPNOT    (INSN_OP(0x12) | INSN_FUNC2(0x31))
+#define INSN_SEXTB     (INSN_OP(0x1c) | INSN_FUNC2(0))
+#define INSN_SEXTW     (INSN_OP(0x1c) | INSN_FUNC2(0x1))
+#define INSN_LDA       INSN_OP(0x8)
+#define INSN_LDAH      INSN_OP(0x9)
+#define INSN_LDBU      INSN_OP(0xa)
+#define INSN_LDWU      INSN_OP(0xc)
+#define INSN_LDL       INSN_OP(0x28)
+#define INSN_LDQ       INSN_OP(0x29)
+#define INSN_STB       INSN_OP(0xe)
+#define INSN_STW       INSN_OP(0xd)
+#define INSN_STL       INSN_OP(0x2c)
+#define INSN_STQ       INSN_OP(0x2d)
+
+/*
+ * return the # of regs used for parameter passing on procedure calling.
+ * note that alpha uses $16~$21 to pass the first 6 parameters of a procedure.
+*/
+static inline int tcg_target_get_call_iarg_regs_count(int flags)
+{
+    return 6;
+}
+
+/*
+ * given constraint, return available register set. this function is called once
+ * for each op at qemu's initialization stage.
+*/
+static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+{
+    const char *ct_str = *pct_str;
+
+    switch(ct_str[0]) 
+    {
+    case 'r':
+        /* constraint 'r' means any register is okay */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, 0xffffffffu);
+        break;
+
+    case 'L':
+        /* 
+        * constraint 'L' is used for qemu_ld/st, which has 2 meanings:
+        * 1st, the argument needs to be allocated a register.
+        * 2nd, we should reserve some registers that belong to caller-clobbered 
+        * list for qemu_ld/st local usage, so these registers must not be 
+        * allocated to the argument that the 'L' constraint is describing.
+        *
+        * note that op qemu_ld/st has the TCG_OPF_CALL_CLOBBER flag, and 
+        * tcg will free all call-clobbered registers before generating target
+        * insn for qemu_ld/st, so we can use these registers directly without
+        * worrying about destroying their content.
+        */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, 0xffffffffu);
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_0);
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_16);
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_17);
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_18);
+        break;
+
+    default:
+        return -1;
+    }
+
+    ct_str++;
+    *pct_str = ct_str;
+    return 0;
+}
+
+/*
+ * whether op's input argument may use constant 
+*/
+static inline int tcg_target_const_match( \
+	tcg_target_long val, const TCGArgConstraint *arg_ct)
+{
+    int ct = arg_ct->ct;
+    return (ct & TCG_CT_CONST) ? 1 : 0;
+}
+
+static inline void tcg_out_fmt_br(TCGContext *s, int opc, int ra, int disp)
+{
+    tcg_out32(s, (opc)|INSN_RA(ra)|INSN_DISP21(disp));
+}
+
+static inline void tcg_out_fmt_mem(TCGContext *s, int opc, int ra, int rb, int disp)
+{
+    tcg_out32(s, (opc)|INSN_RA(ra)|INSN_RB(rb)|INSN_DISP16(disp));
+}
+
+static inline void tcg_out_fmt_jmp(TCGContext *s, int opc, int ra, int rb, int rsved)
+{
+    tcg_out32(s, (opc)|INSN_RA(ra)|INSN_RB(rb)|INSN_RSVED(rsved));
+}
+
+static inline void tcg_out_fmt_opr(TCGContext *s, int opc, int ra, int rb, int rc)
+{
+    tcg_out32(s, (opc)|INSN_RA(ra)|INSN_RB(rb)|INSN_RC(rc));
+}
+
+static inline void tcg_out_fmt_opi(TCGContext *s, int opc, int ra, int lit, int rc)
+{
+    tcg_out32(s, (opc)|INSN_RA(ra)|INSN_LIT(lit)|INSN_RC(rc)|(1<<12));
+}
+
+/*
+ * mov from a reg to another
+*/
+static inline void tcg_out_mov(TCGContext *s, int rc, int rb)
+{  
+    if ( rb != rc ) {
+        tcg_out_fmt_opr(s, INSN_BIS, TCG_REG_31, rb, rc);
+    }
+}
+
+/*
+ * mov a 64-bit immediate 'arg' to register 'ra', this function will
+ * generate fixed length (5 insns) of target insn sequence.
+*/
+static void tcg_out_movi_fixl( \
+    TCGContext *s, TCGType type, int ra, tcg_target_long arg)
+{
+    tcg_target_long l0, l1, l2, l3;
+    tcg_target_long l1_tmp, l2_tmp, l3_tmp;
+
+    l0 = arg & 0xffffu;
+    l1_tmp = l1 = ( arg >> 16) & 0xffffu;
+    l2_tmp = l2 = ( arg >> 32) & 0xffffu;
+    l3_tmp = l3 = ( arg >> 48) & 0xffffu;
+
+    if ( l0 & 0x8000u)
+        l1_tmp = (l1 + 1) & 0xffffu;
+    if ( (l1_tmp & 0x8000u) || ((l1_tmp == 0) && (l1_tmp != l1)))
+        l2_tmp = (l2 + 1) & 0xffffu;
+    if ( (l2_tmp & 0x8000u) || ((l2_tmp == 0) && (l2_tmp != l2)))
+        l3_tmp = (l3 + 1) & 0xffffu;
+
+    tcg_out_fmt_mem(s, INSN_LDAH, ra, TCG_REG_31, l3_tmp);
+    tcg_out_fmt_mem(s, INSN_LDA, ra, ra, l2_tmp);
+    tcg_out_fmt_opi(s, INSN_SLL, ra, 32, ra);
+    tcg_out_fmt_mem(s, INSN_LDAH, ra, ra, l1_tmp);
+    tcg_out_fmt_mem(s, INSN_LDA, ra, ra, l0);
+}
+
+/*
+ * mov 64-bit immediate 'arg' to register 'ra'. this function will
+ * generate variable length of target insn sequence.
+*/
+static inline void tcg_out_movi( \
+    TCGContext *s, TCGType type, int ra, tcg_target_long arg)
+{
+    if ( type == TCG_TYPE_I32)
+        arg = (int32_t)arg;
+
+    if( arg == (int16_t)arg ) {
+        tcg_out_fmt_mem(s, INSN_LDA, ra, TCG_REG_31, arg);
+    } else if( arg == (int32_t)arg ) {
+        tcg_out_fmt_mem(s, INSN_LDAH, ra, TCG_REG_31, (arg>>16));
+        if( arg & ((tcg_target_ulong)0x8000) ) {
+            tcg_out_fmt_mem(s, INSN_LDAH, ra, ra, 1);
+        }
+        tcg_out_fmt_mem(s, INSN_LDA, ra, ra, arg);
+    } else {
+        tcg_out_movi_fixl(s, type, ra, arg);
+    }
+}
+
+static inline int _is_tmp_reg( int r)
+{
+    if ( r == TMP_REG1 || r == TMP_REG2 || r == TMP_REG3)
+        return 1;
+    else
+        return 0;
+}
+
+/*
+ * load value in disp(Rb) to Ra.
+*/
+static inline void tcg_out_ld( \
+    TCGContext *s, TCGType type, int ra, int rb, tcg_target_long disp)
+{
+    int opc;
+    
+    if ( _is_tmp_reg(ra) || _is_tmp_reg(rb))
+        tcg_abort();
+
+    opc = ((type == TCG_TYPE_I32) ? INSN_LDL : INSN_LDQ);
+
+    if( disp != (int16_t)disp ) {
+        tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, disp);
+        tcg_out_fmt_opr(s, INSN_ADDQ, rb, TMP_REG1, TMP_REG1);
+        tcg_out_fmt_mem(s, opc, ra, TMP_REG1, 0);
+    }
+    else
+        tcg_out_fmt_mem(s, opc, ra, rb, disp);
+}
+
+/*
+ * store value in Ra to disp(Rb).
+*/
+static inline void tcg_out_st( \
+    TCGContext *s, TCGType type, int ra, int rb, tcg_target_long disp)
+{
+    int opc;
+
+    if ( _is_tmp_reg(ra) || _is_tmp_reg(rb))
+        tcg_abort();
+    
+    opc = ((type == TCG_TYPE_I32) ? INSN_STL : INSN_STQ);
+
+    if( disp != (int16_t)disp ) {
+        tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, disp);
+        tcg_out_fmt_opr(s, INSN_ADDQ, rb, TMP_REG1, TMP_REG1);
+        tcg_out_fmt_mem(s, opc, ra, TMP_REG1, 0);
+    }
+    else
+        tcg_out_fmt_mem(s, opc, ra, rb, disp);
+}
+
+/*
+ * generate arithmetic instruction with immediate. ra is used as both
+ * input and output, and val is used as another input.
+*/
+static inline void tgen_arithi( \
+    TCGContext *s, int opc, int ra, tcg_target_long val)
+{
+    if ( _is_tmp_reg(ra))
+        tcg_abort();
+
+    if (val == (uint8_t)val) {
+        tcg_out_fmt_opi(s, opc, ra, val, ra);
+    } else {
+        tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, val);
+        tcg_out_fmt_opr(s, opc, ra, TMP_REG1, ra);
+    }
+}
+
+static void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
+{
+    if (val != 0)
+        tgen_arithi(s, INSN_ADDQ, reg, val);
+}
+
+static inline void tcg_out_push(TCGContext *s, int reg)
+{
+    tcg_out_fmt_opi(s, INSN_SUBQ, TCG_REG_30, 8, TCG_REG_30);
+    tcg_out_fmt_mem(s, INSN_STQ, reg, TCG_REG_30, 0);
+}
+
+static inline void tcg_out_pop(TCGContext *s, int reg)
+{
+    tcg_out_fmt_mem(s, INSN_LDQ, reg, TCG_REG_30, 0);
+    tcg_out_fmt_opi(s, INSN_ADDQ, TCG_REG_30, 8, TCG_REG_30);
+}
+
+static const uint64_t tcg_cond_to_jcc[10] = {
+    [TCG_COND_EQ] = INSN_CMPEQ,
+    [TCG_COND_NE] = INSN_CMPEQ,
+    [TCG_COND_LT] = INSN_CMPLT,
+    [TCG_COND_GE] = INSN_CMPLT,
+    [TCG_COND_LE] = INSN_CMPLE,
+    [TCG_COND_GT] = INSN_CMPLE,
+    [TCG_COND_LTU] = INSN_CMPULT,
+    [TCG_COND_GEU] = INSN_CMPULT,
+    [TCG_COND_LEU] = INSN_CMPULE,
+    [TCG_COND_GTU] = INSN_CMPULE
+};
+
+static void patch_reloc(uint8_t *code_ptr, \
+    int type, tcg_target_long value, tcg_target_long addend)
+{
+    TCGContext s;
+    tcg_target_long val;
+
+    if ( type != R_ALPHA_BRADDR)
+        tcg_abort();
+    
+    s.code_ptr = code_ptr;
+    val = (value - (tcg_target_long)s.code_ptr - 4) >> 2; 
+    if ( !(val >= -0x100000 && val < 0x100000)) {
+        tcg_abort();
+    }
+
+    tcg_out_fmt_br(&s, INSN_BR, TCG_REG_31, val);
+}
+
+static void tcg_out_br(TCGContext *s, int label_index)
+{
+    TCGLabel *l = &s->labels[label_index];
+
+    if (l->has_value) {
+        tcg_target_long val;
+        val = ((tcg_target_long)(l->u.value) - (tcg_target_long)s->code_ptr - 4) >> 2;
+        if ( val >= -0x100000 && val < 0x100000) {
+            // if distance can be put into 21-bit field
+            tcg_out_fmt_br(s, INSN_BR, TCG_REG_31, val);
+	} else {
+            tcg_abort();
+	}
+    } else {
+        tcg_out_reloc(s, s->code_ptr, R_ALPHA_BRADDR, label_index, 0);
+        s->code_ptr += 4;
+    }
+}
+
+static void tcg_out_brcond( TCGContext *s, int cond, \
+    TCGArg arg1, TCGArg arg2, int const_arg2, int label_index)
+{
+    int opc;
+    TCGLabel *l = &s->labels[label_index];
+
+    if ( cond < TCG_COND_EQ || cond > TCG_COND_GTU || const_arg2)
+        tcg_abort();
+
+    opc = tcg_cond_to_jcc[cond];
+    tcg_out_fmt_opr(s, opc, arg1, arg2, TMP_REG1);
+
+    if (l->has_value) {
+        tcg_target_long val;
+        val = ((tcg_target_long)l->u.value - (tcg_target_long)s->code_ptr - 4) >> 2;
+        if ( val >= -0x100000 && val < 0x100000) {
+            // if distance can be put into 21-bit field
+            opc = (cond & 1) ? INSN_BLBC : INSN_BLBS;
+            tcg_out_fmt_br(s, opc, TMP_REG1, val);
+	} else {
+            tcg_abort();
+	}
+    } else {
+        opc = (cond & 1) ? INSN_BLBS : INSN_BLBC;
+        tcg_out_fmt_br(s, opc, TMP_REG1, 1);
+        tcg_out_reloc(s, s->code_ptr, R_ALPHA_BRADDR, label_index, 0);
+        s->code_ptr += 4;
+    }
+}
+
+#if defined(CONFIG_SOFTMMU)
+
+#include "../../softmmu_defs.h"
+
+static void *qemu_ld_helpers[4] = {
+    __ldb_mmu,
+    __ldw_mmu,
+    __ldl_mmu,
+    __ldq_mmu,
+};
+
+static void *qemu_st_helpers[4] = {
+    __stb_mmu,
+    __stw_mmu,
+    __stl_mmu,
+    __stq_mmu,
+};
+
+#endif
+
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+{
+    int addr_reg, data_reg, r0, r1, mem_index, s_bits;
+    tcg_target_long val;
+
+#if defined(CONFIG_SOFTMMU)
+    uint8_t *label1_ptr, *label2_ptr;
+#endif
+
+    data_reg = *args++;
+    addr_reg = *args++;
+    mem_index = *args;
+    s_bits = opc & 3;
+
+    r0 = TCG_REG_16;
+    r1 = TCG_REG_17;
+
+#if defined(CONFIG_SOFTMMU)
+
+    tcg_out_mov(s, r1, addr_reg); 
+    tcg_out_mov(s, r0, addr_reg); 
+ 
+#if TARGET_LONG_BITS == 32
+    /* if VM is of 32-bit arch, clear higher 32-bit of addr */
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, r0, 0x0f, r0);
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, r1, 0x0f, r1);
+#endif
+
+    tgen_arithi(s, INSN_AND, r0, TARGET_PAGE_MASK|((1<<s_bits)-1));
+
+    tgen_arithi(s, INSN_SRL, r1, TARGET_PAGE_BITS-CPU_TLB_ENTRY_BITS);
+    tgen_arithi(s, INSN_AND, r1, (CPU_TLB_SIZE-1)<<CPU_TLB_ENTRY_BITS);
+    
+    tcg_out_addi(s, r1, offsetof(CPUState, tlb_table[mem_index][0].addr_read));
+    tcg_out_fmt_opr(s, INSN_ADDQ, r1, TCG_REG_15, r1);
+#if TARGET_LONG_BITS == 32
+    tcg_out_fmt_mem(s, INSN_LDL, TMP_REG1, r1, 0);
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, TMP_REG1, 0x0f, TMP_REG1);
+#else
+    tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, r1, 0);
+#endif
+		
+    //
+    // now, r0 contains the page# and TMP_REG1 contains the addr to tlb_entry.addr_read
+    // we below will compare them
+    //
+    tcg_out_fmt_opr(s, INSN_CMPEQ, TMP_REG1, r0, TMP_REG1);
+
+    tcg_out_mov(s, r0, addr_reg);
+#if TARGET_LONG_BITS == 32
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, r0, 0x0f, r0);
+#endif
+
+    //
+    // if equal, we jump to label1. since label1 is not resolved yet, 
+    // we just record a relocation.
+    //
+    label1_ptr = s->code_ptr;
+    s->code_ptr += 4;
+
+    //
+    // here, unequal, TLB-miss.
+    //
+    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_17, mem_index);
+    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, (tcg_target_long)qemu_ld_helpers[s_bits]);
+    tcg_out_push(s, addr_reg);
+    //tcg_out_push(s, TCG_REG_26);
+    //tcg_out_push(s, TCG_REG_15);
+    tcg_out_mov(s, TCG_REG_27, TMP_REG1);
+    tcg_out_fmt_jmp(s, INSN_CALL, TCG_REG_26, TMP_REG1, 0);
+    //tcg_out_pop(s, TCG_REG_15);
+    //tcg_out_pop(s, TCG_REG_26);
+    tcg_out_pop(s, addr_reg);
+	
+    //
+    // after helper function call, the result of ld is saved in $0
+    //
+    switch(opc) {
+    case 0 | 4:
+        tcg_out_fmt_opr(s, INSN_SEXTB, TCG_REG_31, TCG_REG_0, data_reg);
+        break;
+    case 1 | 4:
+        tcg_out_fmt_opr(s, INSN_SEXTW, TCG_REG_31, TCG_REG_0, data_reg);
+        break;
+    case 2 | 4:
+        tcg_out_fmt_opr(s, INSN_ADDL, TCG_REG_0, TCG_REG_31, data_reg);
+        break;
+    case 0:
+        tcg_out_fmt_opi(s, INSN_ZAPNOT, TCG_REG_0, 0x1, data_reg);
+        break;
+    case 1:
+        tcg_out_fmt_opi(s, INSN_ZAPNOT, TCG_REG_0, 0x3, data_reg);
+        break;
+    case 2:
+        tcg_out_fmt_opi(s, INSN_ZAPNOT, TCG_REG_0, 0xf, data_reg);
+        break;
+    case 3:
+        tcg_out_mov(s, data_reg, TCG_REG_0);
+        break;
+    default:
+        tcg_abort();
+        break;
+    }
+
+    //
+    // we have done, jmp to label2. label2 is not resolved yet, 
+    // we record a relocation.
+    //
+    label2_ptr = s->code_ptr;
+    s->code_ptr += 4;
+    
+    // patch jmp to label1
+    val = (s->code_ptr - label1_ptr - 4) >> 2;
+    if ( !(val >= -0x100000 && val < 0x100000)) {
+        tcg_abort();
+    }
+    *(uint32_t *)label1_ptr = (uint32_t) \
+        ( INSN_BNE | ( TMP_REG1 << 21 ) | ( val & 0x1fffff));
+
+    //
+    // if we get here, a TLB entry is hit, r0 contains the guest addr and 
+    // r1 contains the ptr that point to tlb_entry.addr_read. what we should
+    // do is to load the tlb_entry.addend (64-bit on alpha) and add it to 
+    // r0 to get the host VA
+    //
+    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, \
+	offsetof(CPUTLBEntry, addend) - offsetof(CPUTLBEntry, addr_read));
+    tcg_out_fmt_opr(s, INSN_ADDQ, r1, TMP_REG1, r1);
+    tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, r1, 0);
+    tcg_out_fmt_opr(s, INSN_ADDQ, r0, TMP_REG1, r0);
+	
+#else
+    r0 = addr_reg;
+#endif // endif defined(CONFIG_SOFTMMU)
+
+#ifdef TARGET_WORDS_BIGENDIAN
+    tcg_abort();
+#endif
+
+    //
+    // when we get here, r0 contains the host VA that can be used to access guest PA
+    //
+    switch(opc) {
+    case 0:
+        tcg_out_fmt_mem(s, INSN_LDBU, data_reg, r0, 0);
+        break;
+    case 0 | 4:
+        tcg_out_fmt_mem(s, INSN_LDBU, data_reg, r0, 0);
+        tcg_out_fmt_opr(s, INSN_SEXTB, TCG_REG_31, data_reg, data_reg);
+        break;
+    case 1:
+        tcg_out_fmt_mem(s, INSN_LDWU, data_reg, r0, 0);
+        break;
+    case 1 | 4:
+        tcg_out_fmt_mem(s, INSN_LDWU, data_reg, r0, 0);
+        tcg_out_fmt_opr(s, INSN_SEXTW, TCG_REG_31, data_reg, data_reg);
+        break;
+    case 2:
+        tcg_out_fmt_mem(s, INSN_LDL, data_reg, r0, 0);
+        tcg_out_fmt_opi(s, INSN_ZAPNOT, data_reg, 0xf, data_reg);
+        break;
+    case 2 | 4:
+        tcg_out_fmt_mem(s, INSN_LDL, data_reg, r0, 0);
+        break;
+    case 3:
+        tcg_out_fmt_mem(s, INSN_LDQ, data_reg, r0, 0);
+        break;
+    default:
+        tcg_abort();
+    }
+
+#if defined(CONFIG_SOFTMMU)
+    /* label2: */
+    val = (s->code_ptr - label2_ptr - 4) >> 2;
+    if ( !(val >= -0x100000 && val < 0x100000)) {
+        tcg_abort();
+    }
+    *(uint32_t *)label2_ptr = (uint32_t)( INSN_BR \
+        | ( TCG_REG_31  << 21 ) | ( val & 0x1fffff) );
+#endif
+}
+
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+{
+    int addr_reg, data_reg, r0, r1, mem_index, s_bits;
+    tcg_target_long val;
+
+#if defined(CONFIG_SOFTMMU)
+    uint8_t *label1_ptr, *label2_ptr;
+#endif
+
+    data_reg = *args++;
+    addr_reg = *args++;
+    mem_index = *args;
+    s_bits = opc&3;
+
+    r0 = TCG_REG_16;
+    r1 = TCG_REG_17;
+
+#if defined(CONFIG_SOFTMMU)
+
+    tcg_out_mov(s, r1, addr_reg); 
+    tcg_out_mov(s, r0, addr_reg); 
+ 
+#if TARGET_LONG_BITS == 32
+    /* if the guest is 32-bit, clear the upper 32 bits of addr */
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, r0, 0x0f, r0);
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, r1, 0x0f, r1);
+#endif
+
+    tgen_arithi(s, INSN_AND, r0, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+
+    tgen_arithi(s, INSN_SRL, r1, TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+    tgen_arithi(s, INSN_AND, r1, (CPU_TLB_SIZE-1) << CPU_TLB_ENTRY_BITS);
+
+    tcg_out_addi(s, r1, offsetof(CPUState, tlb_table[mem_index][0].addr_write));
+    tcg_out_fmt_opr(s, INSN_ADDQ, r1, TCG_REG_15, r1);
+
+#if TARGET_LONG_BITS == 32
+    tcg_out_fmt_mem(s, INSN_LDL, TMP_REG1, r1, 0);
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, TMP_REG1, 0x0f, TMP_REG1);
+#else
+    tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, r1, 0);
+#endif
+
+    //
+    // now, r0 contains the page# and TMP_REG1 contains the value of
+    // tlb_entry.addr_write; we compare them below
+    //
+    tcg_out_fmt_opr(s, INSN_CMPEQ, TMP_REG1, r0, TMP_REG1);
+
+    tcg_out_mov(s, r0, addr_reg);
+#if TARGET_LONG_BITS == 32
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, r0, 0x0f, r0);
+#endif
+
+    //
+    // if equal, we jump to label1. since label1 is not resolved yet, 
+    // we just record a relocation.
+    //
+    label1_ptr = s->code_ptr;
+    s->code_ptr += 4;
+
+    // here, unequal, TLB-miss, ...
+    tcg_out_mov(s, TCG_REG_17, data_reg);
+    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_18, mem_index);
+    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, (tcg_target_long)qemu_st_helpers[s_bits]);
+        
+    tcg_out_push(s, data_reg);
+    tcg_out_push(s, addr_reg);
+    //tcg_out_push(s, TCG_REG_26);
+    //tcg_out_push(s, TCG_REG_15);
+    tcg_out_mov(s, TCG_REG_27,TMP_REG1);
+    tcg_out_fmt_jmp(s, INSN_CALL, TCG_REG_26, TMP_REG1, 0);
+    //tcg_out_pop(s, TCG_REG_15);
+    //tcg_out_pop(s, TCG_REG_26);
+    tcg_out_pop(s, addr_reg);
+    tcg_out_pop(s, data_reg);
+
+    //
+    // we are done; jump to label2. label2 is not resolved yet,
+    // so we reserve the branch slot here and patch it later.
+    //
+    label2_ptr = s->code_ptr;
+    s->code_ptr += 4;
+    
+    // patch jmp to label1
+    val = (s->code_ptr - label1_ptr - 4) >> 2;
+    if ( !(val >= -0x100000 && val < 0x100000)) {
+        tcg_abort();
+    }
+    *(uint32_t *)label1_ptr = (uint32_t) \
+        ( INSN_BNE | ( TMP_REG1  << 21 ) | ( val & 0x1fffff));
+
+    //
+    // if we get here, the TLB entry was hit: r0 contains the guest addr and
+    // r1 points to tlb_entry.addr_write. what we should do now
+    // is load tlb_entry.addend (64-bit on alpha) and add it to
+    // r0 to get the host VA
+    //
+    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, \
+        offsetof(CPUTLBEntry, addend) - offsetof(CPUTLBEntry, addr_write));
+    tcg_out_fmt_opr(s, INSN_ADDQ, r1, TMP_REG1, r1);
+    tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, r1,  0);
+    tcg_out_fmt_opr(s, INSN_ADDQ, r0, TMP_REG1, r0);
+
+#else
+    r0 = addr_reg;
+#endif
+
+#ifdef TARGET_WORDS_BIGENDIAN
+    tcg_abort();
+#endif
+
+    //
+    // when we get here, r0 contains the host VA that can be used to access guest PA
+    //
+    switch(opc) {
+    case 0:
+        tcg_out_fmt_mem(s, INSN_STB, data_reg, r0, 0);
+        break;
+    case 1:
+        tcg_out_fmt_mem(s, INSN_STW, data_reg, r0, 0);
+        break;
+    case 2:
+        tcg_out_fmt_mem(s, INSN_STL, data_reg, r0, 0);
+        break;
+    case 3:
+        tcg_out_fmt_mem(s, INSN_STQ, data_reg, r0, 0);
+        break;
+    default:
+        tcg_abort();
+    }
+
+#if defined(CONFIG_SOFTMMU)
+    /* patch jmp to label2: */
+    val = (s->code_ptr - label2_ptr - 4) >> 2;
+    if ( !(val >= -0x100000 && val < 0x100000)) {
+        tcg_abort();
+    }
+    *(uint32_t *)label2_ptr = (uint32_t)( INSN_BR \
+        | ( TCG_REG_31  << 21 ) | ( val & 0x1fffff));
+#endif
+}
+
+static inline void tgen_ldxx( TCGContext *s, int ra, int rb, tcg_target_long disp, int flags)
+{
+    int opc_array[4] = { INSN_LDBU, INSN_LDWU, INSN_LDL, INSN_LDQ};
+    int opc = opc_array[flags & 3];
+
+    if ( _is_tmp_reg(ra) || _is_tmp_reg(rb))
+        tcg_abort();
+
+    if( disp != (int16_t)disp ) {
+        /* disp cannot be stored in insn directly */
+        tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, disp);	
+        tcg_out_fmt_opr(s, INSN_ADDQ, rb, TMP_REG1, TMP_REG1);
+        tcg_out_fmt_mem(s, opc, ra, TMP_REG1, 0);
+    } else {
+        tcg_out_fmt_mem(s, opc, ra, rb, disp);
+    }
+
+    switch ( flags & 7)	{
+    case 0:
+    case 1:
+    case 2|4:
+    case 3:
+        break;
+    case 0|4:
+        tcg_out_fmt_opr(s, INSN_SEXTB, TCG_REG_31, ra, ra);
+        break;
+    case 1|4:
+        tcg_out_fmt_opr(s, INSN_SEXTW, TCG_REG_31, ra, ra);
+        break;
+    case 2:
+        tcg_out_fmt_opi(s, INSN_ZAPNOT, ra, 0x0f, ra);
+        break;
+    default:
+        tcg_abort();
+    }
+}
+
+static inline void tgen_stxx( TCGContext *s, int ra, int rb, tcg_target_long disp, int flags)
+{
+    int opc_array[4] = { INSN_STB, INSN_STW, INSN_STL, INSN_STQ};
+    int opc = opc_array[flags & 3];
+
+    if( disp != (int16_t)disp ) {
+        /* disp cannot be stored in insn directly */
+        tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, disp);
+        tcg_out_fmt_opr(s, INSN_ADDQ, rb, TMP_REG1, TMP_REG1);
+        tcg_out_fmt_mem(s, opc, ra, TMP_REG1, 0);
+    } else {
+        tcg_out_fmt_mem(s, opc, ra, rb, disp);
+    }
+}
+
+static inline void tcg_out_op(TCGContext *s, \
+	int opc, const TCGArg *args, const int *const_args)
+{
+    int oc;
+    switch(opc)
+    {
+    case INDEX_op_exit_tb:
+        /*
+         * exit_tb t0, where t0 is always constant and is returned to the engine.
+         * since we return to the engine right away, clobbering $0 and the temporary is safe
+         */
+        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_0, args[0]);
+        tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, (tcg_target_long)tb_ret_addr);
+        tcg_out_fmt_jmp(s, INSN_JMP, TCG_REG_31, TMP_REG1, 0);
+        break;
+
+    case INDEX_op_goto_tb:
+        /* goto_tb idx, where idx is constant 0 or 1, indicating the branch # */
+        if (s->tb_jmp_offset) {
+            /* we don't support direct jmp */
+            tcg_abort();
+        } else {
+            tcg_out_movi( s, TCG_TYPE_I64, TMP_REG1, (tcg_target_long)(s->tb_next + args[0]));
+            tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, TMP_REG1, 0);
+            tcg_out_fmt_jmp(s, INSN_JMP, TCG_REG_31, TMP_REG1, 0);
+        }
+        s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
+        break;
+
+    case INDEX_op_call:
+        if (const_args[0]) {
+            tcg_abort();
+        } else {
+            //tcg_out_push( s, TCG_REG_26);
+            //tcg_out_push( s, TCG_REG_15);
+            tcg_out_mov( s, TCG_REG_27, args[0]);
+            tcg_out_fmt_jmp(s, INSN_CALL, TCG_REG_26, args[0], 0);
+            //tcg_out_pop( s, TCG_REG_15);
+            //tcg_out_pop( s, TCG_REG_26);
+        }
+        break;
+
+    case INDEX_op_jmp: 
+        if (const_args[0]) {
+            tcg_abort();
+        } else {
+            tcg_out_fmt_jmp(s, INSN_JMP, TCG_REG_31, args[0], 0);
+        }
+        break;
+
+    case INDEX_op_br:
+        tcg_out_br(s, args[0]);
+        break;
+
+    case INDEX_op_ld8u_i32: 
+    case INDEX_op_ld8u_i64:
+        tgen_ldxx( s, args[0], args[1], args[2], 0);
+        break;
+
+    case INDEX_op_ld8s_i32: 
+    case INDEX_op_ld8s_i64: 
+        tgen_ldxx( s, args[0], args[1], args[2], 0|4);
+        break;
+    case INDEX_op_ld16u_i32:
+    case INDEX_op_ld16u_i64:
+        tgen_ldxx( s, args[0], args[1], args[2], 1);
+        break;
+    case INDEX_op_ld16s_i32:
+    case INDEX_op_ld16s_i64: 
+        tgen_ldxx( s, args[0], args[1], args[2], 1|4);
+        break;
+
+    case INDEX_op_ld32u_i64: 
+        tgen_ldxx( s, args[0], args[1], args[2], 2);
+        break;
+    case INDEX_op_ld_i32: 
+    case INDEX_op_ld32s_i64:
+        tgen_ldxx( s, args[0], args[1], args[2], 2|4);
+        break;
+    case INDEX_op_ld_i64: 
+        tgen_ldxx( s, args[0], args[1], args[2], 3);
+        break;
+		
+    case INDEX_op_st8_i32:
+    case INDEX_op_st8_i64: 
+        tgen_stxx( s, args[0], args[1], args[2], 0);
+        break;
+
+    case INDEX_op_st16_i32:
+    case INDEX_op_st16_i64: 
+        tgen_stxx( s, args[0], args[1], args[2], 1);
+        break;
+    case INDEX_op_st_i32:
+    case INDEX_op_st32_i64: 
+        tgen_stxx( s, args[0], args[1], args[2], 2);
+        break;
+
+    case INDEX_op_st_i64: 
+        tgen_stxx( s, args[0], args[1], args[2], 3);
+        break;
+
+    case INDEX_op_add_i32: 
+    case INDEX_op_add_i64: 
+        oc = INSN_ADDQ;
+        goto gen_arith;
+    case INDEX_op_sub_i32: 
+    case INDEX_op_sub_i64:
+        oc = INSN_SUBQ;
+        goto gen_arith;
+    case INDEX_op_mul_i32: 
+        oc = INSN_MULL;
+        goto gen_arith;
+    case INDEX_op_mul_i64: 
+        oc = INSN_MULQ;
+        goto gen_arith;
+    case INDEX_op_and_i32:
+    case INDEX_op_and_i64:
+        oc = INSN_AND;
+        goto gen_arith;
+    case INDEX_op_or_i32:
+    case INDEX_op_or_i64: 
+        oc = INSN_BIS;
+        goto gen_arith;
+    case INDEX_op_xor_i32:
+    case INDEX_op_xor_i64:
+        oc = INSN_XOR;
+        goto gen_arith;
+    case INDEX_op_shl_i32:
+    case INDEX_op_shl_i64:
+        oc = INSN_SLL;
+        goto gen_arith;
+    case INDEX_op_shr_i32:
+        tcg_out_fmt_opi(s, INSN_ZAPNOT, args[1], 0x0f, args[1]); /* fall through */
+    case INDEX_op_shr_i64:
+        oc = INSN_SRL;
+        goto gen_arith;
+    case INDEX_op_sar_i32:
+        tcg_out_fmt_opr(s, INSN_ADDL, args[1], TCG_REG_31, args[1]); /* fall through */
+    case INDEX_op_sar_i64:
+        oc = INSN_SRA;
+    gen_arith:
+        if (const_args[2]) {
+            tcg_abort();
+        } else {
+            tcg_out_fmt_opr(s, oc, args[1], args[2], args[0]);
+        }
+        break;
+
+    case INDEX_op_brcond_i32:
+        tcg_out_fmt_opr(s, INSN_ADDL, args[0], TCG_REG_31, args[0]);
+        tcg_out_fmt_opr(s, INSN_ADDL, args[1], TCG_REG_31, args[1]);
+        tcg_out_brcond(s, args[2], args[0], args[1], const_args[1], args[3]);
+        break;
+    case INDEX_op_brcond_i64:
+        tcg_out_brcond(s, args[2], args[0], args[1], const_args[1], args[3]);
+        break;
+
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext8s_i64:
+        tcg_out_fmt_opr(s, INSN_SEXTB, TCG_REG_31, args[1], args[0]);
+        break;
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext16s_i64:
+        tcg_out_fmt_opr(s, INSN_SEXTW, TCG_REG_31, args[1], args[0]);
+        break;
+    case INDEX_op_ext32s_i64:
+        tcg_out_fmt_opr(s, INSN_ADDL, args[1], TCG_REG_31, args[0]);
+        break;
+    
+    case INDEX_op_qemu_ld8u:
+        tcg_out_qemu_ld(s, args, 0);
+        break;
+    case INDEX_op_qemu_ld8s:
+        tcg_out_qemu_ld(s, args, 0 | 4);
+        break;
+    case INDEX_op_qemu_ld16u:
+        tcg_out_qemu_ld(s, args, 1);
+        break;
+    case INDEX_op_qemu_ld16s:
+        tcg_out_qemu_ld(s, args, 1 | 4);
+        break;
+    case INDEX_op_qemu_ld32u:
+        tcg_out_qemu_ld(s, args, 2);
+        break;
+    case INDEX_op_qemu_ld32s:
+        tcg_out_qemu_ld(s, args, 2 | 4);
+        break;
+    case INDEX_op_qemu_ld64:
+        tcg_out_qemu_ld(s, args, 3);
+        break;
+
+    case INDEX_op_qemu_st8:
+        tcg_out_qemu_st(s, args, 0);
+        break;
+    case INDEX_op_qemu_st16:
+        tcg_out_qemu_st(s, args, 1);
+        break;
+    case INDEX_op_qemu_st32:
+        tcg_out_qemu_st(s, args, 2);
+        break;
+    case INDEX_op_qemu_st64:
+        tcg_out_qemu_st(s, args, 3);
+        break;
+
+    case INDEX_op_movi_i32: 
+    case INDEX_op_movi_i64: 
+    case INDEX_op_mov_i32: 
+    case INDEX_op_mov_i64:
+    case INDEX_op_div2_i32:
+    case INDEX_op_divu2_i32:
+    default:
+        tcg_abort();
+    }
+}
+
+static const TCGTargetOpDef alpha_op_defs[] = {
+    { INDEX_op_exit_tb, { } },
+    { INDEX_op_goto_tb, { } },
+    { INDEX_op_call, { "r" } },
+    { INDEX_op_jmp, { "r" } },
+    { INDEX_op_br, { } },
+
+    { INDEX_op_mov_i32, { "r", "r" } },
+    { INDEX_op_movi_i32, { "r" } },
+    { INDEX_op_ld8u_i32, { "r", "r" } },
+    { INDEX_op_ld8s_i32, { "r", "r" } },
+    { INDEX_op_ld16u_i32, { "r", "r" } },
+    { INDEX_op_ld16s_i32, { "r", "r" } },
+    { INDEX_op_ld_i32, { "r", "r" } },
+    { INDEX_op_st8_i32, { "r", "r" } },
+    { INDEX_op_st16_i32, { "r", "r" } },
+    { INDEX_op_st_i32, { "r", "r" } },
+
+    { INDEX_op_add_i32, { "r", "0", "r" } },
+    { INDEX_op_mul_i32, { "r", "0", "r" } },
+    //{ INDEX_op_div2_i32, { "a", "d", "0", "1", "r" } },
+    //{ INDEX_op_divu2_i32, { "a", "d", "0", "1", "r" } },
+    { INDEX_op_sub_i32, { "r", "0", "r" } },
+    { INDEX_op_and_i32, { "r", "0", "r" } },
+    { INDEX_op_or_i32, { "r", "0", "r" } },
+    { INDEX_op_xor_i32, { "r", "0", "r" } },
+
+    { INDEX_op_shl_i32, { "r", "0", "r" } },
+    { INDEX_op_shr_i32, { "r", "0", "r" } },
+    { INDEX_op_sar_i32, { "r", "0", "r" } },
+
+    { INDEX_op_brcond_i32, { "r", "r" } },		
+
+    { INDEX_op_mov_i64, { "r", "r" } },	
+    { INDEX_op_movi_i64, { "r" } },
+    { INDEX_op_ld8u_i64, { "r", "r" } },
+    { INDEX_op_ld8s_i64, { "r", "r" } },
+    { INDEX_op_ld16u_i64, { "r", "r" } },
+    { INDEX_op_ld16s_i64, { "r", "r" } },
+    { INDEX_op_ld32u_i64, { "r", "r" } },
+    { INDEX_op_ld32s_i64, { "r", "r" } },
+    { INDEX_op_ld_i64, { "r", "r" } },
+    { INDEX_op_st8_i64, { "r", "r" } },	
+    { INDEX_op_st16_i64, { "r", "r" } },
+    { INDEX_op_st32_i64, { "r", "r" } },
+    { INDEX_op_st_i64, { "r", "r" } },
+
+    { INDEX_op_add_i64, { "r", "0", "r" } },
+    { INDEX_op_mul_i64, { "r", "0", "r" } },
+    //{ INDEX_op_div2_i64, { "a", "d", "0", "1", "r" } },
+    //{ INDEX_op_divu2_i64, { "a", "d", "0", "1", "r" } },
+    { INDEX_op_sub_i64, { "r", "0", "r" } },
+    { INDEX_op_and_i64, { "r", "0", "r" } },
+    { INDEX_op_or_i64, { "r", "0", "r" } },
+    { INDEX_op_xor_i64, { "r", "0", "r" } },
+
+    { INDEX_op_shl_i64, { "r", "0", "r" } },
+    { INDEX_op_shr_i64, { "r", "0", "r" } },
+    { INDEX_op_sar_i64, { "r", "0", "r" } },
+
+    { INDEX_op_brcond_i64, { "r", "r" } },
+
+    { INDEX_op_ext8s_i32, { "r", "r"} },
+    { INDEX_op_ext16s_i32, { "r", "r"} },
+    { INDEX_op_ext8s_i64, { "r", "r"} },
+    { INDEX_op_ext16s_i64, { "r", "r"} },
+    { INDEX_op_ext32s_i64, { "r", "r"} },
+
+    { INDEX_op_qemu_ld8u, { "r", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L" } },
+    { INDEX_op_qemu_ld32u, { "r", "L" } },
+    { INDEX_op_qemu_ld32s, { "r", "L" } },
+    { INDEX_op_qemu_ld64, { "r", "L" } },
+
+    { INDEX_op_qemu_st8, { "L", "L" } },
+    { INDEX_op_qemu_st16, { "L", "L" } },
+    { INDEX_op_qemu_st32, { "L", "L" } },
+    //{ INDEX_op_qemu_st64, { "L", "L", "L"} },
+    { INDEX_op_qemu_st64, { "L", "L"} },
+    { -1 },
+};
+
+
+static int tcg_target_callee_save_regs[] = {
+    TCG_REG_15,		// used for the global env, so no need to save
+    TCG_REG_9,
+    TCG_REG_10,
+    TCG_REG_11,
+    TCG_REG_12,
+    TCG_REG_13,
+    TCG_REG_14
+};
+
+/*
+ * Generate global QEMU prologue and epilogue code 
+*/
+void tcg_target_qemu_prologue(TCGContext *s)
+{
+    int i, frame_size, push_size, stack_addend;
+   
+    /* TB prologue */
+    /*printf("TB prologue @ %lx\n", s->code_ptr);*/
+	
+    /* save $26..$29 (ra, pv, at, gp) */
+    tcg_out_push(s, TCG_REG_26);
+    tcg_out_push(s, TCG_REG_27);
+    tcg_out_push(s, TCG_REG_28);
+    tcg_out_push(s, TCG_REG_29);
+
+    /* save all callee saved registers */
+    for(i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); i++) {
+        tcg_out_push(s, tcg_target_callee_save_regs[i]);
+    }
+	
+    /* reserve some stack space */
+    push_size = 8 + (4 + ARRAY_SIZE(tcg_target_callee_save_regs)) * 8;
+    frame_size = push_size + 4*TCG_STATIC_CALL_ARGS_SIZE;
+    frame_size = (frame_size + TCG_TARGET_STACK_ALIGN - 1) & ~(TCG_TARGET_STACK_ALIGN - 1);
+    stack_addend = frame_size - push_size;
+    tcg_out_addi(s, TCG_REG_30, -stack_addend);
+
+    tcg_out_fmt_jmp(s, INSN_JMP, TCG_REG_31, TCG_REG_16, 0);		/* jmp $16 */
+
+    /* TB epilogue */
+    tb_ret_addr = s->code_ptr;
+    tcg_out_addi(s, TCG_REG_30, stack_addend);
+    for(i = ARRAY_SIZE(tcg_target_callee_save_regs) - 1; i >= 0; i--) {
+        tcg_out_pop(s, tcg_target_callee_save_regs[i]);
+    }
+
+    tcg_out_pop(s, TCG_REG_29);
+    tcg_out_pop(s, TCG_REG_28);
+    tcg_out_pop(s, TCG_REG_27);
+    tcg_out_pop(s, TCG_REG_26);
+    tcg_out_fmt_jmp(s, INSN_RET, TCG_REG_31, TCG_REG_26, 0);		/* ret */
+}
+
+
+void tcg_target_init(TCGContext *s)
+{
+    /* fail safe */
+    if ((1 << CPU_TLB_ENTRY_BITS) != sizeof(CPUTLBEntry))
+        tcg_abort();
+
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffffffff);
+    tcg_regset_set32(tcg_target_call_clobber_regs, 0,
+        (1 << TCG_REG_1 ) | (1 << TCG_REG_2 ) | (1 << TCG_REG_3 ) | (1 << TCG_REG_4 ) |
+        (1 << TCG_REG_5 ) | (1 << TCG_REG_6 ) | (1 << TCG_REG_7 ) | (1 << TCG_REG_8 ) |
+        (1 << TCG_REG_22) | (1 << TCG_REG_23) | (1 << TCG_REG_24) | (1 << TCG_REG_25) |
+        (1 << TCG_REG_16) | (1 << TCG_REG_17) | (1 << TCG_REG_18) | (1 << TCG_REG_19) |
+        (1 << TCG_REG_20) | (1 << TCG_REG_21) | (1 << TCG_REG_0 ));
+
+    //tcg_regset_set32( tcg_target_call_clobber_regs, 0, 0xffffffff);
+    
+    tcg_regset_clear(s->reserved_regs);
+    // $26~$31 not allocated by tcg.c
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_26);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_27);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_28);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_29);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_30);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_31);
+    // reserved registers for tmp usage
+    tcg_regset_set_reg(s->reserved_regs, TMP_REG1);
+    tcg_regset_set_reg(s->reserved_regs, TMP_REG2);
+    tcg_regset_set_reg(s->reserved_regs, TMP_REG3);
+
+    tcg_add_target_add_op_defs(alpha_op_defs);
+}
+
diff --git a/tcg/alpha/tcg-target.h b/tcg/alpha/tcg-target.h
new file mode 100644
index 0000000..79c57af
--- /dev/null
+++ b/tcg/alpha/tcg-target.h
@@ -0,0 +1,70 @@ 
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#define TCG_TARGET_ALPHA 1
+
+#define TCG_TARGET_REG_BITS 64
+
+#define TCG_TARGET_NB_REGS 32
+
+enum {
+    TCG_REG_0 = 0, TCG_REG_1, TCG_REG_2, TCG_REG_3,
+    TCG_REG_4, TCG_REG_5, TCG_REG_6, TCG_REG_7,
+    TCG_REG_8, TCG_REG_9, TCG_REG_10, TCG_REG_11,
+    TCG_REG_12, TCG_REG_13, TCG_REG_14, TCG_REG_15,
+    TCG_REG_16, TCG_REG_17, TCG_REG_18, TCG_REG_19,
+    TCG_REG_20, TCG_REG_21, TCG_REG_22, TCG_REG_23,
+    TCG_REG_24, TCG_REG_25, TCG_REG_26, TCG_REG_27,
+    TCG_REG_28, TCG_REG_29, TCG_REG_30, TCG_REG_31
+};
+
+/* used for function call generation */
+#define TCG_REG_CALL_STACK TCG_REG_30
+#define TCG_TARGET_STACK_ALIGN 16
+#define TCG_TARGET_CALL_STACK_OFFSET 0
+
+/* we have sign-extension instructions */
+#define TCG_TARGET_HAS_ext8s_i32
+#define TCG_TARGET_HAS_ext16s_i32
+#define TCG_TARGET_HAS_ext8s_i64
+#define TCG_TARGET_HAS_ext16s_i64
+#define TCG_TARGET_HAS_ext32s_i64
+
+/* Note: must be synced with dyngen-exec.h */
+#define TCG_AREG0 TCG_REG_15
+#define TCG_AREG1 TCG_REG_9
+#define TCG_AREG2 TCG_REG_10
+#define TCG_AREG3 TCG_REG_11
+#define TCG_AREG4 TCG_REG_12
+#define TCG_AREG5 TCG_REG_13
+#define TCG_AREG6 TCG_REG_14
+
+#define TMP_REG1 TCG_REG_23
+#define TMP_REG2 TCG_REG_24
+#define TMP_REG3 TCG_REG_25
+
+static inline void flush_icache_range(unsigned long start, unsigned long stop)
+{
+    __asm__ __volatile__ ("call_pal 0x86");
+}
+
-- 
1.6.3.3
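
For reference, the label1/label2 back-patching in tcg_out_qemu_ld/st above computes a 21-bit word displacement relative to the instruction after the branch, range-checks it, and ORs it into the branch word. Below is a minimal standalone sketch of that encoding, not part of the patch. The helper name alpha_encode_branch and the ALPHA_OP_* macros are hypothetical; the opcode values (0x30 for BR, 0x3d for BNE) and the field layout (opcode in bits 31..26, Ra in bits 25..21, displacement in bits 20..0) are taken from the Alpha branch instruction format and are not restated anywhere in the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values follow the Alpha branch instruction format. */
#define ALPHA_OP_BR  0x30u
#define ALPHA_OP_BNE 0x3du

/*
 * Encode a branch located at branch_addr that jumps to target.
 * The displacement counts 4-byte instructions from the updated PC
 * (branch_addr + 4), the same formula the patch uses:
 *     val = (target - branch_addr - 4) >> 2
 * Returns 0 and stores the instruction word on success, or -1 if the
 * target is misaligned or outside the signed 21-bit range.
 */
static int alpha_encode_branch(uint32_t opcode, unsigned ra,
                               int64_t branch_addr, int64_t target,
                               uint32_t *insn)
{
    int64_t delta, disp;

    assert(opcode < 64);   /* 6-bit opcode field */
    assert(ra < 32);       /* 5-bit register field */

    delta = target - (branch_addr + 4);
    if (delta & 3) {
        return -1;         /* instructions are 4-byte aligned */
    }
    disp = delta / 4;      /* exact, since delta is a multiple of 4 */
    if (disp < -0x100000 || disp >= 0x100000) {
        return -1;         /* does not fit in 21 signed bits */
    }
    *insn = (opcode << 26) | ((uint32_t)ra << 21)
          | ((uint32_t)disp & 0x1fffffu);
    return 0;
}
```

Returning an error code instead of calling tcg_abort() is only a choice made for this sketch so that the range check can be exercised in isolation; the patch aborts on an out-of-range displacement.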