diff mbox series

[ovs-dev] utilities: Add TASK_STOPPED accounting to the kernel_delay.py script.

Message ID 09c84d5f6c39f61f697c7f09e12f00e4fb410440.1706187082.git.echaudro@redhat.com
State Changes Requested
Delegated to: Simon Horman
Headers show
Series [ovs-dev] utilities: Add TASK_STOPPED accounting to the kernel_delay.py script. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Eelco Chaudron Jan. 25, 2024, 12:51 p.m. UTC
This changes add statistics for when a thread is put into stop state.
For example with the following:

kill -STOP $(pidof ovs-vswitchd); sleep 1; kill -CONT $(pidof ovs-vswitchd);

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 utilities/usdt-scripts/kernel_delay.py  | 101 +++++++++++++++++++-----
 utilities/usdt-scripts/kernel_delay.rst |  24 ++++++
 2 files changed, 105 insertions(+), 20 deletions(-)

Comments

Simon Horman Jan. 26, 2024, 10:48 a.m. UTC | #1
On Thu, Jan 25, 2024 at 01:51:22PM +0100, Eelco Chaudron wrote:
> This changes add statistics for when a thread is put into stop state.
> For example with the following:
> 
> kill -STOP $(pidof ovs-vswitchd); sleep 1; kill -CONT $(pidof ovs-vswitchd);
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

Thanks Eelco,

I have some minor feedback below, that doesn't need to block
forward progress of this patch.

Acked-by: Simon Horman <horms@ovn.org>

> ---
>  utilities/usdt-scripts/kernel_delay.py  | 101 +++++++++++++++++++-----
>  utilities/usdt-scripts/kernel_delay.rst |  24 ++++++
>  2 files changed, 105 insertions(+), 20 deletions(-)
> 
> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
> index b2012fdf2..7570d93c1 100755
> --- a/utilities/usdt-scripts/kernel_delay.py
> +++ b/utilities/usdt-scripts/kernel_delay.py
> @@ -81,7 +81,6 @@ struct event_t {
>  
>      u32 syscall;
>      u64 entry_ts;
> -
>  };
>  
>  BPF_RINGBUF_OUTPUT(events, <BUFFER_PAGE_CNT>);
> @@ -241,15 +240,13 @@ TRACEPOINT_PROBE(raw_syscalls, sys_exit) {
>      return 0;
>  }
>  
> -

super nit: for consistency I think an extra blank line should go here.

>  /*
> - * For measuring the thread run time, we need the following.
> + * For measuring the thread stopped time, we need the following.
>   */
> -struct run_time_data_t {
> +struct stop_time_data_t {
>      u64 count;
>      u64 total_ns;
> -    u64 max_ns;
> -    u64 min_ns;
> +    u64 worst_ns;
>  };
>  
>  struct pid_tid_key_t {
> @@ -257,6 +254,43 @@ struct pid_tid_key_t {
>      u32  tid;
>  };
>  
> +BPF_HASH(stop_start, u64, u64);
> +BPF_HASH(stop_data, struct pid_tid_key_t, struct stop_time_data_t);
> +
> +static inline void thread_handle_stopped_run(u32 pid, u32 tgid, u64 ktime)
> +{
> +    u64 pid_tgid = (u64) tgid << 32 | pid;
> +    u64 *start_ns = stop_start.lookup(&pid_tgid);
> +
> +    if (!start_ns || *start_ns == 0)
> +        return;
> +
> +    struct stop_time_data_t *val, zero = {};
> +    struct pid_tid_key_t key = { .pid = tgid,
> +                                 .tid = pid };
> +
> +    val = stop_data.lookup_or_try_init(&key, &zero);
> +    if (val) {
> +        u64 delta = ktime - *start_ns;
> +        val->count++;
> +        val->total_ns += delta;
> +        if (val->worst_ns == 0 || delta > val->worst_ns)
> +            val->worst_ns = delta;

I am curious to know why the 'worst_ns == 0' part of the condition is needed.
My my reading, it seems that if used it would set worst_ns to it's current
value of 0.

> +    }
> +    *start_ns = 0;
> +}
> +
> +
> +/*
> + * For measuring the thread run time, we need the following.
> + */
> +struct run_time_data_t {
> +    u64 count;
> +    u64 total_ns;
> +    u64 max_ns;
> +    u64 min_ns;
> +};
> +
>  BPF_HASH(run_start, u64, u64);
>  BPF_HASH(run_data, struct pid_tid_key_t, struct run_time_data_t);
>  

...
Eelco Chaudron Jan. 26, 2024, 10:55 a.m. UTC | #2
On 26 Jan 2024, at 11:48, Simon Horman wrote:

> On Thu, Jan 25, 2024 at 01:51:22PM +0100, Eelco Chaudron wrote:
>> This changes add statistics for when a thread is put into stop state.
>> For example with the following:
>>
>> kill -STOP $(pidof ovs-vswitchd); sleep 1; kill -CONT $(pidof ovs-vswitchd);
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>
> Thanks Eelco,
>
> I have some minor feedback below, that doesn't need to block
> forward progress of this patch.

Thanks, see below, will send a v2.

//Eelco


> Acked-by: Simon Horman <horms@ovn.org>
>
>> ---
>>  utilities/usdt-scripts/kernel_delay.py  | 101 +++++++++++++++++++-----
>>  utilities/usdt-scripts/kernel_delay.rst |  24 ++++++
>>  2 files changed, 105 insertions(+), 20 deletions(-)
>>
>> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
>> index b2012fdf2..7570d93c1 100755
>> --- a/utilities/usdt-scripts/kernel_delay.py
>> +++ b/utilities/usdt-scripts/kernel_delay.py
>> @@ -81,7 +81,6 @@ struct event_t {
>>
>>      u32 syscall;
>>      u64 entry_ts;
>> -
>>  };
>>
>>  BPF_RINGBUF_OUTPUT(events, <BUFFER_PAGE_CNT>);
>> @@ -241,15 +240,13 @@ TRACEPOINT_PROBE(raw_syscalls, sys_exit) {
>>      return 0;
>>  }
>>
>> -
>
> super nit: for consistency I think an extra blank line should go here.

Good catch, I thought I got all of these ;)

>>  /*
>> - * For measuring the thread run time, we need the following.
>> + * For measuring the thread stopped time, we need the following.
>>   */
>> -struct run_time_data_t {
>> +struct stop_time_data_t {
>>      u64 count;
>>      u64 total_ns;
>> -    u64 max_ns;
>> -    u64 min_ns;
>> +    u64 worst_ns;
>>  };
>>
>>  struct pid_tid_key_t {
>> @@ -257,6 +254,43 @@ struct pid_tid_key_t {
>>      u32  tid;
>>  };
>>
>> +BPF_HASH(stop_start, u64, u64);
>> +BPF_HASH(stop_data, struct pid_tid_key_t, struct stop_time_data_t);
>> +
>> +static inline void thread_handle_stopped_run(u32 pid, u32 tgid, u64 ktime)
>> +{
>> +    u64 pid_tgid = (u64) tgid << 32 | pid;
>> +    u64 *start_ns = stop_start.lookup(&pid_tgid);
>> +
>> +    if (!start_ns || *start_ns == 0)
>> +        return;
>> +
>> +    struct stop_time_data_t *val, zero = {};
>> +    struct pid_tid_key_t key = { .pid = tgid,
>> +                                 .tid = pid };
>> +
>> +    val = stop_data.lookup_or_try_init(&key, &zero);
>> +    if (val) {
>> +        u64 delta = ktime - *start_ns;
>> +        val->count++;
>> +        val->total_ns += delta;
>> +        if (val->worst_ns == 0 || delta > val->worst_ns)
>> +            val->worst_ns = delta;
>
> I am curious to know why the 'worst_ns == 0' part of the condition is needed.
> My my reading, it seems that if used it would set worst_ns to it's current
> value of 0.

Good catch, guess this was a quick cut/paste from the min_ns case. Will fix this in all existing code.

>> +    }
>> +    *start_ns = 0;
>> +}
>> +
>> +
>> +/*
>> + * For measuring the thread run time, we need the following.
>> + */
>> +struct run_time_data_t {
>> +    u64 count;
>> +    u64 total_ns;
>> +    u64 max_ns;
>> +    u64 min_ns;
>> +};
>> +
>>  BPF_HASH(run_start, u64, u64);
>>  BPF_HASH(run_data, struct pid_tid_key_t, struct run_time_data_t);
>>
>
> ...
diff mbox series

Patch

diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
index b2012fdf2..7570d93c1 100755
--- a/utilities/usdt-scripts/kernel_delay.py
+++ b/utilities/usdt-scripts/kernel_delay.py
@@ -81,7 +81,6 @@  struct event_t {
 
     u32 syscall;
     u64 entry_ts;
-
 };
 
 BPF_RINGBUF_OUTPUT(events, <BUFFER_PAGE_CNT>);
@@ -241,15 +240,13 @@  TRACEPOINT_PROBE(raw_syscalls, sys_exit) {
     return 0;
 }
 
-
 /*
- * For measuring the thread run time, we need the following.
+ * For measuring the thread stopped time, we need the following.
  */
-struct run_time_data_t {
+struct stop_time_data_t {
     u64 count;
     u64 total_ns;
-    u64 max_ns;
-    u64 min_ns;
+    u64 worst_ns;
 };
 
 struct pid_tid_key_t {
@@ -257,6 +254,43 @@  struct pid_tid_key_t {
     u32  tid;
 };
 
+BPF_HASH(stop_start, u64, u64);
+BPF_HASH(stop_data, struct pid_tid_key_t, struct stop_time_data_t);
+
+static inline void thread_handle_stopped_run(u32 pid, u32 tgid, u64 ktime)
+{
+    u64 pid_tgid = (u64) tgid << 32 | pid;
+    u64 *start_ns = stop_start.lookup(&pid_tgid);
+
+    if (!start_ns || *start_ns == 0)
+        return;
+
+    struct stop_time_data_t *val, zero = {};
+    struct pid_tid_key_t key = { .pid = tgid,
+                                 .tid = pid };
+
+    val = stop_data.lookup_or_try_init(&key, &zero);
+    if (val) {
+        u64 delta = ktime - *start_ns;
+        val->count++;
+        val->total_ns += delta;
+        if (val->worst_ns == 0 || delta > val->worst_ns)
+            val->worst_ns = delta;
+    }
+    *start_ns = 0;
+}
+
+
+/*
+ * For measuring the thread run time, we need the following.
+ */
+struct run_time_data_t {
+    u64 count;
+    u64 total_ns;
+    u64 max_ns;
+    u64 min_ns;
+};
+
 BPF_HASH(run_start, u64, u64);
 BPF_HASH(run_data, struct pid_tid_key_t, struct run_time_data_t);
 
@@ -312,6 +346,8 @@  static inline int sched_wakeup__(u32 pid, u32 tgid)
 
     u64 t = bpf_ktime_get_ns();
     ready_start.update(&pid_tgid, &t);
+
+    thread_handle_stopped_run(pid, tgid, t);
     return 0;
 }
 
@@ -336,22 +372,26 @@  RAW_TRACEPOINT_PROBE(sched_switch)
     if (!capture_enabled__())
         return 0;
 
-    if (prev-><STATE_FIELD> == TASK_RUNNING && prev->tgid == MONITOR_PID)
-        sched_wakeup__(prev->pid, prev->tgid);
-
     if (prev->tgid == MONITOR_PID) {
+        u64 prev_pid_tgid = (u64)next->tgid << 32 | next->pid;
         ktime = bpf_ktime_get_ns();
+
+        if (prev-><STATE_FIELD> == TASK_RUNNING)
+            ready_start.update(&prev_pid_tgid, &ktime);
+
+        if (prev-><STATE_FIELD> & __TASK_STOPPED)
+            stop_start.update(&prev_pid_tgid, &ktime);
+
         thread_stop_run(prev->pid, prev->tgid, ktime);
     }
 
-    u64 pid_tgid = (u64)next->tgid << 32 | next->pid;
-
     if (next->tgid != MONITOR_PID)
         return 0;
 
     if (ktime == 0)
         ktime = bpf_ktime_get_ns();
 
+    u64 pid_tgid = (u64)next->tgid << 32 | next->pid;
     u64 *start_ns = ready_start.lookup(&pid_tgid);
 
     if (start_ns && *start_ns != 0) {
@@ -844,6 +884,8 @@  def reset_capture():
     bpf["softirq_start"].clear()
     bpf["softirq_data"].clear()
     bpf["stack_traces"].clear()
+    bpf["stop_start"].clear()
+    bpf["stop_data"].clear()
 
 
 #
@@ -879,6 +921,9 @@  def process_results(syscall_events=None, trigger_delta=None):
     threads_ready = {k.tid for k, _ in bpf["ready_data"].items()
                      if k.pid != 0xffffffff}
 
+    threads_stopped = {k.tid for k, _ in bpf["stop_data"].items()
+                       if k.pid != 0xffffffff}
+
     threads_hardirq = {k.tid for k, _ in bpf["hardirq_data"].items()
                        if k.pid != 0xffffffff}
 
@@ -886,7 +931,7 @@  def process_results(syscall_events=None, trigger_delta=None):
                        if k.pid != 0xffffffff}
 
     threads = sorted(threads_syscall | threads_run | threads_ready |
-                     threads_hardirq | threads_softirq,
+                     threads_stopped | threads_hardirq | threads_softirq,
                      key=lambda x: get_thread_name(options.pid, x))
 
     #
@@ -933,28 +978,44 @@  def process_results(syscall_events=None, trigger_delta=None):
         #
         # THREAD RUN STATISTICS
         #
-        print("\n{:10} {:16} {}\n{}{:10}  {:>16}  {:>16}  {:>16}".format(
-            "", "", "[THREAD RUN STATISTICS]", indent,
-            "SCHED_CNT", "TOTAL ns", "MIN ns", "MAX ns"))
-
         for k, v in filter(lambda t: t[0].tid == thread,
                            bpf["run_data"].items()):
 
+            print("\n{:10} {:16} {}\n{}{:10}  {:>16}  {:>16}  {:>16}".format(
+                "", "", "[THREAD RUN STATISTICS]", indent,
+                "SCHED_CNT", "TOTAL ns", "MIN ns", "MAX ns"))
+
             print("{}{:10}  {:16,}  {:16,}  {:16,}".format(
                 indent, v.count, v.total_ns, v.min_ns, v.max_ns))
+            break
 
         #
         # THREAD READY STATISTICS
         #
-        print("\n{:10} {:16} {}\n{}{:10}  {:>16}  {:>16}".format(
-            "", "", "[THREAD READY STATISTICS]", indent,
-            "SCHED_CNT", "TOTAL ns", "MAX ns"))
-
         for k, v in filter(lambda t: t[0].tid == thread,
                            bpf["ready_data"].items()):
 
+            print("\n{:10} {:16} {}\n{}{:10}  {:>16}  {:>16}".format(
+                "", "", "[THREAD READY STATISTICS]", indent,
+                "SCHED_CNT", "TOTAL ns", "MAX ns"))
+
             print("{}{:10}  {:16,}  {:16,}".format(
                 indent, v.count, v.total_ns, v.worst_ns))
+            break
+
+        #
+        # THREAD STOPPED STATISTICS
+        #
+        for k, v in filter(lambda t: t[0].tid == thread,
+                           bpf["stop_data"].items()):
+
+            print("\n{:10} {:16} {}\n{}{:10}  {:>16}  {:>16}".format(
+                "", "", "[THREAD STOPPED STATISTICS]", indent,
+                "STOP_CNT", "TOTAL ns", "MAX ns"))
+
+            print("{}{:10}  {:16,}  {:16,}".format(
+                indent, v.count, v.total_ns, v.worst_ns))
+            break
 
         #
         # HARD IRQ STATISTICS
diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst
index 0ebd30afb..e2e43752d 100644
--- a/utilities/usdt-scripts/kernel_delay.rst
+++ b/utilities/usdt-scripts/kernel_delay.rst
@@ -75,6 +75,10 @@  with the ``--pid`` option.
                   SCHED_CNT           TOTAL ns            MAX ns
                        7            11,334             6,636
 
+                  [THREAD STOPPED STATISTICS]
+                  STOP_CNT            TOTAL ns            MAX ns
+                       3         3,045,728,323     1,015,739,474
+
                   [HARD IRQ STATISTICS]
                   NAME                       COUNT          TOTAL ns            MAX ns
                   eno8303-rx-1                   1             3,586             3,586
@@ -102,6 +106,7 @@  followed by resource-specific data. Which are:
 - ``SYSCALL STATISTICS``
 - ``THREAD RUN STATISTICS``
 - ``THREAD READY STATISTICS``
+- ``THREAD STOPPED STATISTICS``
 - ``HARD IRQ STATISTICS``
 - ``SOFT IRQ STATISTICS``
 
@@ -143,6 +148,25 @@  Note that these statistics only count events where the thread was getting
 ready to run and started running during the measurement interval.
 
 
+``THREAD STOPPED STATISTICS``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+``THREAD STOPPED STATISTICS`` reveal the number of instances where the thread
+has been scheduled out while in the running state due to its transition to
+the TASK_STOPPED state.
+
+This behavior can be replicated by manually placing the thread in the stopped
+state and subsequently resuming it. For instance:
+
+.. code-block:: console
+
+  # kill -STOP $(pidof ovs-vswitchd); \
+    sleep 1; \
+    kill -CONT $(pidof ovs-vswitchd);
+
+Note that these statistics only count events where the thread was running at
+the time it was put to stopped state.
+
+
 ``HARD IRQ STATISTICS``
 ~~~~~~~~~~~~~~~~~~~~~~~
 ``HARD IRQ STATISTICS`` tell you how much time was spent servicing hard