diff mbox series

[v2,1/3] accel/tcg: Add a quicker check for breakpoints

Message ID 20221025202424.195984-2-leandro.lupori@eldorado.org.br
State: New
Series: Performance optimizations for PPC64

Commit Message

Leandro Lupori Oct. 25, 2022, 8:24 p.m. UTC
Profiling QEMU during a Fedora 35 for PPC64 boot revealed that a
considerable amount of time was being spent in
check_for_breakpoints() (0.61% of total time on PPC64 and 2.19% on
amd64), even though, with no breakpoints set, it only checked that
its queue was empty and returned. It turns out this function is not
inlined by the compiler, and it is always called by
helper_lookup_tb_ptr(), one of the most frequently called functions.

By leaving only the check for empty queue in
check_for_breakpoints() and moving the remaining code to
check_for_breakpoints_slow(), called only when the queue is not
empty, it's possible to avoid the call overhead. An improvement of
about 3% in total time was measured on POWER9.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
---
 accel/tcg/cpu-exec.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Richard Henderson Oct. 25, 2022, 9:37 p.m. UTC | #1
On 10/26/22 06:24, Leandro Lupori wrote:
> Profiling QEMU during Fedora 35 for PPC64 boot revealed that a
> considerable amount of time was being spent in
> check_for_breakpoints() (0.61% of total time on PPC64 and 2.19% on
> amd64), even though it was just checking that its queue was empty
> and returning, when no breakpoints were set. It turns out this
> function is not inlined by the compiler and it's always called by
> helper_lookup_tb_ptr(), one of the most called functions.
> 
> By leaving only the check for empty queue in
> check_for_breakpoints() and moving the remaining code to
> check_for_breakpoints_slow(), called only when the queue is not
> empty, it's possible to avoid the call overhead. An improvement of
> about 3% in total time was measured on POWER9.
> 
> Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
> ---
>   accel/tcg/cpu-exec.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
Richard Henderson Oct. 25, 2022, 9:42 p.m. UTC | #2
On 10/26/22 07:37, Richard Henderson wrote:
> On 10/26/22 06:24, Leandro Lupori wrote:
>> Profiling QEMU during Fedora 35 for PPC64 boot revealed that a
>> considerable amount of time was being spent in
>> check_for_breakpoints() (0.61% of total time on PPC64 and 2.19% on
>> amd64), even though it was just checking that its queue was empty
>> and returning, when no breakpoints were set. It turns out this
>> function is not inlined by the compiler and it's always called by
>> helper_lookup_tb_ptr(), one of the most called functions.
>>
>> By leaving only the check for empty queue in
>> check_for_breakpoints() and moving the remaining code to
>> check_for_breakpoints_slow(), called only when the queue is not
>> empty, it's possible to avoid the call overhead. An improvement of
>> about 3% in total time was measured on POWER9.
>>
>> Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
>> ---
>>   accel/tcg/cpu-exec.c | 15 +++++++++------
>>   1 file changed, 9 insertions(+), 6 deletions(-)
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Cherry-picking this one patch to tcg-next.

r~

Patch

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index f9e5cc9ba0..bb4b9e92ce 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -304,16 +304,12 @@  static void log_cpu_exec(target_ulong pc, CPUState *cpu,
     }
 }
 
-static bool check_for_breakpoints(CPUState *cpu, target_ulong pc,
-                                  uint32_t *cflags)
+static bool check_for_breakpoints_slow(CPUState *cpu, target_ulong pc,
+                                       uint32_t *cflags)
 {
     CPUBreakpoint *bp;
     bool match_page = false;
 
-    if (likely(QTAILQ_EMPTY(&cpu->breakpoints))) {
-        return false;
-    }
-
     /*
      * Singlestep overrides breakpoints.
      * This requirement is visible in the record-replay tests, where
@@ -374,6 +370,13 @@  static bool check_for_breakpoints(CPUState *cpu, target_ulong pc,
     return false;
 }
 
+static inline bool check_for_breakpoints(CPUState *cpu, target_ulong pc,
+                                         uint32_t *cflags)
+{
+    return unlikely(!QTAILQ_EMPTY(&cpu->breakpoints)) &&
+        check_for_breakpoints_slow(cpu, pc, cflags);
+}
+
 /**
  * helper_lookup_tb_ptr: quick check for next tb
  * @env: current cpu state