
[4/7] tcg: Add support for "inlining" regions of code

Message ID 150506083546.19604.543091497330269756.stgit@frigg.lan
State New
Series trace: Add guest code events

Commit Message

Lluís Vilanova Sept. 10, 2017, 4:27 p.m. UTC
TCG BBLs and instructions have multiple exit points from which to raise
tracing events, but some of the necessary information in the generic
disassembly infrastructure is not available until after these exit
points have been generated.

This patch adds support for "inline points" (where the tracing code will
be placed) and "inline regions" (which identify the TCG code that must
be inlined). The TCG compiler copies each inline region to every inline
point that references it.
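
A minimal usage sketch, based on the API added below; the surrounding
translator code and the traced helper are illustrative assumptions, not
part of this patch:

  TCGInlineLabel *l = gen_new_inline_label();

  /* At each TB exit point, before the exit is emitted: mark where the
   * (not yet generatable) tracing code must later appear. */
  gen_set_inline_point(l);

  /* Once the whole TB is translated and the event arguments are known,
   * emit the tracing code exactly once, delimited as an inline region. */
  gen_set_inline_region_begin(l);
  gen_helper_trace_event_exec(cpu_env, ...);  /* illustrative */
  gen_set_inline_region_end(l);

  /* tcg_inline() then splices a copy of the region into every point. */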

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 include/qemu/log.h      |    1 
 include/qemu/typedefs.h |    1 
 tcg/tcg-op.h            |   39 +++++++++++
 tcg/tcg-opc.h           |    3 +
 tcg/tcg.c               |  166 +++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.h               |   18 +++++
 util/log.c              |    2 +
 7 files changed, 230 insertions(+)

Comments

Richard Henderson Sept. 13, 2017, 5:09 p.m. UTC | #1
On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
> TCG BBLs and instructions have multiple exit points from which to raise
> tracing events, but some of the necessary information in the generic
> disassembly infrastructure is not available until after these exit
> points have been generated.
> 
> This patch adds support for "inline points" (where the tracing code will
> be placed) and "inline regions" (which identify the TCG code that must
> be inlined). The TCG compiler copies each inline region to every inline
> point that references it.

I am not keen on this.

Is there a reason you can't just emit the tracing code at the appropriate place
to begin with?  Perhaps I have to wait to see how this is used...


r~
Lluís Vilanova Sept. 14, 2017, 3:20 p.m. UTC | #2
Richard Henderson writes:

> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>> TCG BBLs and instructions have multiple exit points from which to raise
>> tracing events, but some of the necessary information in the generic
>> disassembly infrastructure is not available until after these exit
>> points have been generated.
>> 
>> This patch adds support for "inline points" (where the tracing code will
>> be placed) and "inline regions" (which identify the TCG code that must
>> be inlined). The TCG compiler copies each inline region to every inline
>> point that references it.

> I am not keen on this.

> Is there a reason you can't just emit the tracing code at the appropriate place
> to begin with?  Perhaps I have to wait to see how this is used...

As I tried to briefly explain in the next patch, the main problem without
inlining is that we will see guest_tb_after_trans twice in the trace for
each TB that ends in a conditional guest instruction, since those have
two exit points (which we capture when emitting goto_tb in TCG).

We cannot instead emit it only once by overloading the brcond opcode in TCG,
since that can be used internally in the guest instruction emulation without
necessarily ending a TB (or we could have more than one brcond for a single
instruction).
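
For illustration, this is the usual frontend pattern for a guest
conditional branch (2017-era API; cpu_cond, cpu_pc and the addresses are
placeholders), showing the two exit points:

  TCGLabel *taken = gen_new_label();
  tcg_gen_brcondi_tl(TCG_COND_NE, cpu_cond, 0, taken);

  tcg_gen_goto_tb(0);                     /* exit point #1: fall-through */
  tcg_gen_movi_tl(cpu_pc, next_pc);
  tcg_gen_exit_tb((uintptr_t)tb + 0);

  gen_set_label(taken);
  tcg_gen_goto_tb(1);                     /* exit point #2: branch taken */
  tcg_gen_movi_tl(cpu_pc, branch_target);
  tcg_gen_exit_tb((uintptr_t)tb + 1);

Anything emitted per exit point therefore appears twice in the TB.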

I hope it's clearer now.


Thanks,
  Lluis
Richard Henderson Sept. 14, 2017, 4:15 p.m. UTC | #3
On 09/14/2017 08:20 AM, Lluís Vilanova wrote:
> Richard Henderson writes:
> 
>> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>>> TCG BBLs and instructions have multiple exit points from which to raise
>>> tracing events, but some of the necessary information in the generic
>>> disassembly infrastructure is not available until after these exit
>>> points have been generated.
>>>
>>> This patch adds support for "inline points" (where the tracing code will
>>> be placed) and "inline regions" (which identify the TCG code that must
>>> be inlined). The TCG compiler copies each inline region to every inline
>>> point that references it.
> 
>> I am not keen on this.
> 
>> Is there a reason you can't just emit the tracing code at the appropriate place
>> to begin with?  Perhaps I have to wait to see how this is used...
> 
> As I tried to briefly explain in the next patch, the main problem without
> inlining is that we will see guest_tb_after_trans twice in the trace for
> each TB that ends in a conditional guest instruction, since those have
> two exit points (which we capture when emitting goto_tb in TCG).

Without seeing the code, I suspect this is because you didn't examine the
argument to tcg_gen_exit_tb.  You can tell when goto_tb must have been emitted
and avoid logging twice.
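
A sketch of what that check could look like, assuming the 2017-era
convention that tcg_gen_exit_tb() takes (uintptr_t)tb plus the goto_tb
slot index (the trace call shown is hypothetical):

  static void gen_exit_tb_traced(TranslationBlock *tb, unsigned slot)
  {
      if (slot == 0) {
          /* first exit of this TB: raise the event once, not per exit */
          trace_guest_tb_after_trans_tcg(...);   /* hypothetical */
      }
      tcg_gen_exit_tb((uintptr_t)tb + slot);
  }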


r~
Lluís Vilanova Sept. 15, 2017, 12:55 p.m. UTC | #4
Richard Henderson writes:

> On 09/14/2017 08:20 AM, Lluís Vilanova wrote:
>> Richard Henderson writes:
>> 
>>> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>>>> TCG BBLs and instructions have multiple exit points from which to raise
>>>> tracing events, but some of the necessary information in the generic
>>>> disassembly infrastructure is not available until after these exit
>>>> points have been generated.
>>>> 
>>>> This patch adds support for "inline points" (where the tracing code will
>>>> be placed) and "inline regions" (which identify the TCG code that must
>>>> be inlined). The TCG compiler copies each inline region to every inline
>>>> point that references it.
>> 
>>> I am not keen on this.
>> 
>>> Is there a reason you can't just emit the tracing code at the appropriate place
>>> to begin with?  Perhaps I have to wait to see how this is used...
>> 
>> As I tried to briefly explain in the next patch, the main problem without
>> inlining is that we will see guest_tb_after_trans twice in the trace for
>> each TB that ends in a conditional guest instruction, since those have
>> two exit points (which we capture when emitting goto_tb in TCG).

> Without seeing the code, I suspect this is because you didn't examine the
> argument to tcg_gen_exit_tb.  You can tell when goto_tb must have been emitted
> and avoid logging twice.

The generated tracing code for 'guest_*_after' must be right before the
"goto_tb" opcode at the end of a TB (AFAIU generated by
tcg_gen_lookup_and_goto_ptr()), and we have two of those when decoding a guest
conditional jump.

If we couple this with the semantics of the trace_*_tcg functions (trace the
event at translation time, and generate TCG code to trace the event at execution
time), we get the case I described (we don't want to call trace_tb_after_tcg()
or trace_insn_after_tcg() twice for the same TB or instruction).

That is, unless I've missed something.


The only alternative I can think of is changing tracetool to offer an additional
API that provides separate functions for translation-time tracing and
execution-time generation. So from this:

  static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
  {
      trace_event_trans(cpu, ...);
      if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
          gen_helper_trace_event_exec(env, ...);
      }
  }

We can extend it into this:

  static inline void gen_trace_event_exec(CPUState *cpu, TCGv_env env, ...)
  {
      if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
          gen_helper_trace_event_exec(env, ...);
      }
  }
  static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
  {
      trace_event_trans(cpu, ...);
      gen_trace_event_exec(cpu, env, ...);
  }
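
(The point of the split being that a frontend could then call
trace_event_trans() exactly once per TB while still generating the
execution-time tracing code at whichever exit points need it.)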


Cheers,
  Lluis
Lluís Vilanova Sept. 26, 2017, 4:31 p.m. UTC | #5
Lluís Vilanova writes:

> Richard Henderson writes:
>> On 09/14/2017 08:20 AM, Lluís Vilanova wrote:
>>> Richard Henderson writes:
>>> 
>>>> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>>>>> TCG BBLs and instructions have multiple exit points from which to raise
>>>>> tracing events, but some of the necessary information in the generic
>>>>> disassembly infrastructure is not available until after these exit
>>>>> points have been generated.
>>>>> 
>>>>> This patch adds support for "inline points" (where the tracing code will
>>>>> be placed) and "inline regions" (which identify the TCG code that must
>>>>> be inlined). The TCG compiler copies each inline region to every inline
>>>>> point that references it.
>>> 
>>>> I am not keen on this.
>>> 
>>>> Is there a reason you can't just emit the tracing code at the appropriate place
>>>> to begin with?  Perhaps I have to wait to see how this is used...
>>> 
>>> As I tried to briefly explain in the next patch, the main problem without
>>> inlining is that we will see guest_tb_after_trans twice in the trace for
>>> each TB that ends in a conditional guest instruction, since those have
>>> two exit points (which we capture when emitting goto_tb in TCG).

>> Without seeing the code, I suspect this is because you didn't examine the
>> argument to tcg_gen_exit_tb.  You can tell when goto_tb must have been emitted
>> and avoid logging twice.

> The generated tracing code for 'guest_*_after' must be right before the
> "goto_tb" opcode at the end of a TB (AFAIU generated by
> tcg_gen_lookup_and_goto_ptr()), and we have two of those when decoding a guest
> conditional jump.

> If we couple this with the semantics of the trace_*_tcg functions (trace the
> event at translation time, and generate TCG code to trace the event at execution
> time), we get the case I described (we don't want to call trace_tb_after_tcg()
> or trace_insn_after_tcg() twice for the same TB or instruction).

> That is, unless I've missed something.


> The only alternative I can think of is changing tracetool to offer an additional
> API that provides separate functions for translation-time tracing and
> execution-time generation. So from this:

>   static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
>   {
>       trace_event_trans(cpu, ...);
>       if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
>           gen_helper_trace_event_exec(env, ...);
>       }
>   }

> We can extend it into this:

>   static inline void gen_trace_event_exec(CPUState *cpu, TCGv_env env, ...)
>   {
>       if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
>           gen_helper_trace_event_exec(env, ...);
>       }
>   }
>   static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
>   {
>       trace_event_trans(cpu, ...);
>       gen_trace_event_exec(cpu, env, ...);
>   }

Richard, do you prefer to keep the "TCG inline" feature or switch the internal
tracing API to this second approach?


Thanks,
  Lluis
Richard Henderson Sept. 26, 2017, 4:52 p.m. UTC | #6
On 09/26/2017 09:31 AM, Lluís Vilanova wrote:
> Lluís Vilanova writes:
> 
>> Richard Henderson writes:
>>> On 09/14/2017 08:20 AM, Lluís Vilanova wrote:
>>>> Richard Henderson writes:
>>>>
>>>>> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>>>>>> TCG BBLs and instructions have multiple exit points from which to raise
>>>>>> tracing events, but some of the necessary information in the generic
>>>>>> disassembly infrastructure is not available until after these exit
>>>>>> points have been generated.
>>>>>>
>>>>>> This patch adds support for "inline points" (where the tracing code will
>>>>>> be placed) and "inline regions" (which identify the TCG code that must
>>>>>> be inlined). The TCG compiler copies each inline region to every inline
>>>>>> point that references it.
>>>>
>>>>> I am not keen on this.
>>>>
>>>>> Is there a reason you can't just emit the tracing code at the appropriate place
>>>>> to begin with?  Perhaps I have to wait to see how this is used...
>>>>
>>>> As I tried to briefly explain in the next patch, the main problem without
>>>> inlining is that we will see guest_tb_after_trans twice in the trace for
>>>> each TB that ends in a conditional guest instruction, since those have
>>>> two exit points (which we capture when emitting goto_tb in TCG).
> 
>>> Without seeing the code, I suspect this is because you didn't examine the
>>> argument to tcg_gen_exit_tb.  You can tell when goto_tb must have been emitted
>>> and avoid logging twice.
> 
>> The generated tracing code for 'guest_*_after' must be right before the
>> "goto_tb" opcode at the end of a TB (AFAIU generated by
>> tcg_gen_lookup_and_goto_ptr()), and we have two of those when decoding a guest
>> conditional jump.
> 
>> If we couple this with the semantics of the trace_*_tcg functions (trace the
>> event at translation time, and generate TCG code to trace the event at execution
>> time), we get the case I described (we don't want to call trace_tb_after_tcg()
>> or trace_insn_after_tcg() twice for the same TB or instruction).
> 
>> That is, unless I've missed something.
> 
> 
>> The only alternative I can think of is changing tracetool to offer an additional
>> API that provides separate functions for translation-time tracing and
>> execution-time generation. So from this:
> 
>>   static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
>>   {
>>       trace_event_trans(cpu, ...);
>>       if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
>>           gen_helper_trace_event_exec(env, ...);
>>       }
>>   }
> 
>> We can extend it into this:
> 
>>   static inline void gen_trace_event_exec(CPUState *cpu, TCGv_env env, ...)
>>   {
>>       if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
>>           gen_helper_trace_event_exec(env, ...);
>>       }
>>   }
>>   static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
>>   {
>>       trace_event_trans(cpu, ...);
>>       gen_trace_event_exec(cpu, env, ...);
>>   }
> 
> Richard, do you prefer to keep the "TCG inline" feature or switch the internal
> tracing API to this second approach?

I don't think I fully understand what you're proposing.  The example
transformation above is merely syntactic and has no functional change.

As previously stated, I'm not keen on the "tcg inline" approach.  I would
prefer that you hook into tcg_gen_{exit_tb,goto_tb,goto_ptr} functions within
tcg/tcg-op.c to log transitions between TBs.
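
A minimal sketch of such a hook, assuming the 2017-era definition of
tcg_gen_exit_tb() and that the low TB_EXIT_MASK bits of its argument
carry the goto_tb slot (gen_trace_tb_exit() is hypothetical):

  void tcg_gen_exit_tb(uintptr_t val)
  {
      /* val == 0 means no goto_tb was emitted; otherwise skip the
       * second slot of a two-exit TB to avoid a duplicate event */
      if ((val & TB_EXIT_MASK) != 1) {
          gen_trace_tb_exit(val);
      }
      tcg_gen_op1i(INDEX_op_exit_tb, val);
  }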


r~

Patch

diff --git a/include/qemu/log.h b/include/qemu/log.h
index a50e994c21..23acc63c73 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -43,6 +43,7 @@  static inline bool qemu_log_separate(void)
 #define CPU_LOG_PAGE       (1 << 14)
 #define LOG_TRACE          (1 << 15)
 #define CPU_LOG_TB_OP_IND  (1 << 16)
+#define CPU_LOG_TB_OP_INLINE (1 << 17)
 
 /* Returns true if a bit is set in the current loglevel mask
  */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 39bc8351a3..2fb5670af3 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -96,6 +96,7 @@  typedef struct SerialState SerialState;
 typedef struct SHPCDevice SHPCDevice;
 typedef struct SMBusDevice SMBusDevice;
 typedef struct SSIBus SSIBus;
+typedef struct TCGInlineLabel TCGInlineLabel;
 typedef struct uWireSlave uWireSlave;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct Visitor Visitor;
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 5d3278f243..da3784f8f2 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -326,6 +326,45 @@  void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg);
 
+static inline int _get_inline_index(TCGInlineLabel *l)
+{
+    TCGContext *s = &tcg_ctx;
+    return l - s->inline_labels;
+}
+
+static inline void gen_set_inline_point(TCGInlineLabel *l)
+{
+    TCGContext *s = &tcg_ctx;
+    TCGInlinePoint *p = tcg_malloc(sizeof(TCGInlinePoint));
+    p->op_idx = s->gen_next_op_idx;
+    p->next_point = l->first_point;
+    l->first_point = p;
+    tcg_gen_op1i(INDEX_op_set_inline_point,
+                 _get_inline_index(l));
+}
+
+static inline void gen_set_inline_region_begin(TCGInlineLabel *l)
+{
+    TCGContext *s = &tcg_ctx;
+    if (l->begin_op_idx != -1) {
+        tcg_abort();
+    }
+    l->begin_op_idx = s->gen_next_op_idx;
+    tcg_gen_op1i(INDEX_op_set_inline_region_begin,
+                 _get_inline_index(l));
+}
+
+static inline void gen_set_inline_region_end(TCGInlineLabel *l)
+{
+    TCGContext *s = &tcg_ctx;
+    if (l->begin_op_idx == -1) {
+        tcg_abort();
+    }
+    l->end_op_idx = s->gen_next_op_idx;
+    tcg_gen_op1i(INDEX_op_set_inline_region_end,
+                 _get_inline_index(l));
+}
+
 static inline void tcg_gen_discard_i32(TCGv_i32 arg)
 {
     tcg_gen_op1_i32(INDEX_op_discard, arg);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 956fb1e9f3..279ac0dc1f 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -29,6 +29,9 @@ 
 /* predefined ops */
 DEF(discard, 1, 0, 0, TCG_OPF_NOT_PRESENT)
 DEF(set_label, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
+DEF(set_inline_point, 0, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(set_inline_region_begin, 0, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(set_inline_region_end, 0, 0, 1, TCG_OPF_NOT_PRESENT)
 
 /* variable number of parameters */
 DEF(call, 0, 0, 3, TCG_OPF_CALL_CLOBBER | TCG_OPF_NOT_PRESENT)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index fd8a3dfe93..b48196da27 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -251,6 +251,23 @@  TCGLabel *gen_new_label(void)
     return l;
 }
 
+TCGInlineLabel *gen_new_inline_label(void)
+{
+    TCGContext *s = &tcg_ctx;
+    int idx;
+    TCGInlineLabel *l;
+
+    if (s->nb_inline_labels >= TCG_MAX_INLINE_LABELS) {
+        tcg_abort();
+    }
+    idx = s->nb_inline_labels++;
+    l = &s->inline_labels[idx];
+    l->first_point = NULL;
+    l->begin_op_idx = -1;
+    l->end_op_idx = -1;
+    return l;
+}
+
 #include "tcg-target.inc.c"
 
 /* pool based memory allocation */
@@ -462,6 +479,10 @@  void tcg_func_start(TCGContext *s)
     s->nb_labels = 0;
     s->current_frame_offset = s->frame_start;
 
+    s->inline_labels = tcg_malloc(sizeof(TCGInlineLabel) *
+                                  TCG_MAX_INLINE_LABELS);
+    s->nb_inline_labels = 0;
+
 #ifdef CONFIG_DEBUG_TCG
     s->goto_tb_issue_mask = 0;
 #endif
@@ -1423,6 +1444,139 @@  static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
     }
 }
 
+static inline int _get_op_next(TCGContext *s, int idx)
+{
+    return s->gen_op_buf[idx].next;
+}
+
+static inline void _set_op_next(TCGContext *s, int idx, int next)
+{
+    s->gen_op_buf[idx].next = next;
+}
+
+static inline int _get_op_prev(TCGContext *s, int idx)
+{
+    return s->gen_op_buf[idx].prev;
+}
+
+static inline void _set_op_prev(TCGContext *s, int idx, int prev)
+{
+    s->gen_op_buf[idx].prev = prev;
+}
+
+static inline void _inline_region_ignore(TCGContext *s, TCGInlineLabel *l)
+{
+    int l_prev = _get_op_prev(s, l->begin_op_idx);
+    int l_next = _get_op_next(s, l->end_op_idx);
+    _set_op_next(s, l_prev, l_next);
+    _set_op_prev(s, l_next, l_prev);
+}
+
+static inline void _op_ignore(TCGContext *s, int op_idx)
+{
+    int p_prev = _get_op_prev(s, op_idx);
+    int p_next = _get_op_next(s, op_idx);
+    _set_op_next(s, p_prev, p_next);
+    _set_op_prev(s, p_next, p_prev);
+}
+
+static inline void _inline_point_ignore(TCGContext *s, TCGInlinePoint *p)
+{
+    _op_ignore(s, p->op_idx);
+}
+
+static inline void _inline_weave(TCGContext *s, TCGInlinePoint *p,
+                                 int begin, int end)
+{
+    int begin_prev = _get_op_prev(s, begin);
+    int end_next = _get_op_next(s, end);
+    int p_prev = _get_op_prev(s, p->op_idx);
+    int p_next = _get_op_next(s, p->op_idx);
+    /* point.prev -> begin */
+    _set_op_next(s, p_prev, begin);
+    _set_op_prev(s, begin, p_prev);
+    /* end -> point.next */
+    _set_op_next(s, end, p_next);
+    _set_op_prev(s, p_next, end);
+    /* begin.prev -> end.next */
+    _set_op_next(s, begin_prev, end_next);
+    _set_op_prev(s, end_next, begin_prev);
+}
+
+/*
+ * Handles the set_inline_point/set_inline_region_begin/set_inline_region_end
+ * opcodes (which will disappear after this optimization pass).
+ */
+static void tcg_inline(TCGContext *s)
+{
+    int i;
+    for (i = 0; i < s->nb_inline_labels; i++) {
+        TCGInlineLabel *l = &s->inline_labels[i];
+        size_t region_op_count = l->end_op_idx - l->begin_op_idx - 1;
+
+        /* open region is an error */
+        if (l->begin_op_idx != -1 && l->end_op_idx == -1) {
+            tcg_abort();
+        }
+
+        if (l->first_point == NULL) {   /* region without points  */
+            _inline_region_ignore(s, l);
+        } else if (l->begin_op_idx == -1) { /* points without region */
+            TCGInlinePoint *p;
+            for (p = l->first_point; p != NULL; p = p->next_point) {
+                _inline_point_ignore(s, p);
+            }
+        } else if (region_op_count == 0) { /* empty region */
+            TCGInlinePoint *p;
+            for (p = l->first_point; p != NULL; p = p->next_point) {
+                _inline_point_ignore(s, p);
+            }
+            _inline_region_ignore(s, l);
+        } else {                        /* actual inlining */
+            bool first_point = true;
+            int l_begin = _get_op_next(s, l->begin_op_idx);
+            int l_end = _get_op_prev(s, l->end_op_idx);
+            TCGInlinePoint *p;
+            for (p = l->first_point; p != NULL; p = p->next_point) {
+                if (first_point) {
+                    /* redirect point to existing region (skip markers) */
+                    _inline_weave(s, p, l_begin, l_end);
+                    _op_ignore(s, l->begin_op_idx);
+                    _op_ignore(s, l->end_op_idx);
+                } else {
+                    /* create a copy of the region */
+                    int l_end_next = _get_op_next(s, l_end);
+                    int op;
+                    int pos = p->op_idx;
+                    for (op = l_begin; op != l_end_next;
+                         op = _get_op_next(s, op)) {
+                        /* insert opcode copies */
+                        int insert_idx = s->gen_next_op_idx;
+                        int opc = s->gen_op_buf[op].opc;
+                        int args = s->gen_op_buf[op].args;
+                        int nargs = tcg_op_defs[opc].nb_args;
+                        if (opc == INDEX_op_call) {
+                            nargs += s->gen_op_buf[op].calli;
+                            nargs += s->gen_op_buf[op].callo;
+                        }
+                        tcg_op_insert_after(s, &s->gen_op_buf[pos], opc, nargs);
+                        pos = insert_idx;
+                        s->gen_op_buf[pos].calli = s->gen_op_buf[op].calli;
+                        s->gen_op_buf[pos].callo = s->gen_op_buf[op].callo;
+                        /* insert argument copies */
+                        memcpy(&s->gen_opparam_buf[s->gen_op_buf[pos].args],
+                               &s->gen_opparam_buf[args],
+                               nargs * sizeof(s->gen_opparam_buf[0]));
+                    }
+                    _op_ignore(s, p->op_idx);
+                }
+                first_point = false;
+            }
+        }
+    }
+}
+
+
 /* Liveness analysis : update the opc_arg_life array to tell if a
    given input arguments is dead. Instructions updating dead
    temporaries are removed. */
@@ -2560,6 +2714,18 @@  int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
+    /* inline code regions before any optimization pass */
+    tcg_inline(s);
+
+#ifdef DEBUG_DISAS
+    if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_INLINE)
+                 && qemu_log_in_addr_range(tb->pc))) {
+        qemu_log("OP after inline:\n");
+        tcg_dump_ops(s);
+        qemu_log("\n");
+    }
+#endif
+
 #ifdef CONFIG_PROFILER
     s->opt_time -= profile_getclock();
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index ac94133870..c6e3c6e68d 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -397,6 +397,20 @@  static inline unsigned get_alignment_bits(TCGMemOp memop)
 
 typedef tcg_target_ulong TCGArg;
 
+#define TCG_MAX_INLINE_REGIONS 1
+#define TCG_MAX_INLINE_LABELS TCG_MAX_INLINE_REGIONS
+
+typedef struct TCGInlinePoint {
+    int op_idx;
+    struct TCGInlinePoint *next_point;
+} TCGInlinePoint;
+
+typedef struct TCGInlineLabel {
+    TCGInlinePoint *first_point;
+    int begin_op_idx, end_op_idx;
+} TCGInlineLabel;
+
+
 /* Define type and accessor macros for TCG variables.
 
    TCG variables are the inputs and outputs of TCG ops, as described
@@ -649,6 +663,9 @@  struct TCGContext {
     int nb_temps;
     int nb_indirects;
 
+    TCGInlineLabel *inline_labels;
+    int nb_inline_labels;
+
     /* goto_tb support */
     tcg_insn_unit *code_buf;
     uint16_t *tb_jmp_reset_offset; /* tb->jmp_reset_offset */
@@ -950,6 +967,7 @@  TCGv_i32 tcg_const_local_i32(int32_t val);
 TCGv_i64 tcg_const_local_i64(int64_t val);
 
 TCGLabel *gen_new_label(void);
+TCGInlineLabel *gen_new_inline_label(void);
 
 /**
  * label_arg
diff --git a/util/log.c b/util/log.c
index 96f30dd21a..947a982c74 100644
--- a/util/log.c
+++ b/util/log.c
@@ -246,6 +246,8 @@  const QEMULogItem qemu_log_items[] = {
       "show target assembly code for each compiled TB" },
     { CPU_LOG_TB_OP, "op",
       "show micro ops for each compiled TB" },
+    { CPU_LOG_TB_OP_INLINE, "op_inline",
+      "show micro ops after inlining" },
     { CPU_LOG_TB_OP_OPT, "op_opt",
       "show micro ops after optimization" },
     { CPU_LOG_TB_OP_IND, "op_ind",