
[ovs-dev] utilities: Add upcall statistics to the kernel_delay.py script.

Message ID 173254404327.545539.5146903122430077833.stgit@wsfd-advnetlab224.anl.eng.rdu2.dc.redhat.com
State Changes Requested
Delegated to: aaron conole
Headers show
Series [ovs-dev] utilities: Add upcall statistics to the kernel_delay.py script. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Eelco Chaudron Nov. 25, 2024, 2:14 p.m. UTC
This patch installs a kernel return probe on ovs_dp_upcall() to record
all successful and failed calls per CPU. The statistics are reported
when the script exits, providing insights into the upcall behavior.

This is an example output:

  # UPCALL STATISTICS:
    Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
    Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
    Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
       11, EAGAIN       : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 utilities/automake.mk                   |    1 
 utilities/usdt-scripts/kernel_delay.py  |  127 +++++++++++++++++++++++++++++++
 utilities/usdt-scripts/kernel_delay.rst |    9 ++
 3 files changed, 136 insertions(+), 1 deletion(-)

Comments

Eelco Chaudron Nov. 27, 2024, 7:40 a.m. UTC | #1
On 25 Nov 2024, at 15:14, Eelco Chaudron wrote:

> This patch installs a kernel return probe on ovs_dp_upcall() to record
> all successful and failed calls per CPU. The statistics are reported
> when the script exits, providing insights into the upcall behavior.
>
> This is an example output:
>
>   # UPCALL STATISTICS:
>     Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
>     Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
>     Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>        11, EAGAIN       : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

Recheck-request: github-robot
Adrián Moreno Dec. 13, 2024, 12:56 p.m. UTC | #2
On Mon, Nov 25, 2024 at 03:14:20PM +0100, Eelco Chaudron wrote:
> This patch installs a kernel return probe on ovs_dp_upcall() to record
> all successful and failed calls per CPU. The statistics are reported
> when the script exits, providing insights into the upcall behavior.
>
> This is an example output:
>
>   # UPCALL STATISTICS:
>     Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
>     Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
>     Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>        11, EAGAIN       : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

Thanks for this patch. Correlating upcall failures to kernel delays is
extremely useful. Some comments below.

> ---
>  utilities/automake.mk                   |    1
>  utilities/usdt-scripts/kernel_delay.py  |  127 +++++++++++++++++++++++++++++++
>  utilities/usdt-scripts/kernel_delay.rst |    9 ++
>  3 files changed, 136 insertions(+), 1 deletion(-)
>
> diff --git a/utilities/automake.mk b/utilities/automake.mk
> index acc1af4e0..22260b254 100644
> --- a/utilities/automake.mk
> +++ b/utilities/automake.mk
> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
>  	utilities/ovs-pipegen.py \
>  	utilities/usdt-scripts/dpif_op_nl_monitor.py \
>  	utilities/usdt-scripts/flow_reval_monitor.py \
> +	utilities/usdt-scripts/kernel_delay.py \
>  	utilities/usdt-scripts/upcall_monitor.py \
>  	utilities/usdt-scripts/upcall_cost.py
>
> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
> index de6b0c9de..a5aa1a3d4 100755
> --- a/utilities/usdt-scripts/kernel_delay.py
> +++ b/utilities/usdt-scripts/kernel_delay.py
> @@ -39,6 +39,7 @@
>  #
>  import argparse
>  import datetime
> +import errno
>  import os
>  import pytz
>  import psutil
> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit)
>      data->start_ns = 0;
>      return 0;
>  }
> +
> +
> +/*
> + * For measuring upcall statistics (per CPU).
> + */
> +#define MAX_ERROR 255

Why this limit? The kernel uses 4095 in the PTR_ERR infrastructure.
Although they are not exposed to user applications, the kernel has some
error values > 255. They don't seem networking-related for now, but
still.

To remove this arbitrary limit and avoid needless iteration over an
almost empty array, I'd make it a percpu_hash. It supports
"lookup_or_try_init" and "increment", so it's just as easy to use from
eBPF, and almost identical to use from Python.
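
For illustration, the counting semantics of such a percpu_hash could be
sketched in plain Python as follows (NUM_CPUS and record_upcall() are
made-up names for this sketch, not part of the patch or of bcc):

```python
# Hypothetical illustration of percpu_hash semantics: a dict keyed by
# errno, holding one counter per CPU, created only for errors actually
# seen (unlike a fixed-size percpu_array that is mostly empty).
NUM_CPUS = 4

upcall_count = {}  # errno -> per-CPU counters


def record_upcall(cpu, ret):
    """Mimic lookup_or_try_init() + increment() for one upcall result."""
    err = -ret if ret < 0 else 0  # successes are counted under key 0
    counters = upcall_count.setdefault(err, [0] * NUM_CPUS)  # lookup_or_try_init
    counters[cpu] += 1                                       # increment


# Simulate a few upcall results: 0 means success, -11 is -EAGAIN.
for cpu, ret in [(0, 0), (1, -11), (1, -11), (3, 0)]:
    record_upcall(cpu, ret)

print(upcall_count)  # only the keys that were hit exist in the map
```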

> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
> +
> +#if <INSTALL_OVS_DP_UPCALL_PROBE>
> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
> +{

When trying the script I was a bit confused that numbers didn't add up,
e.g:

# UPCALL STATISTICS:
  Total upcalls       : 776234 [0: 204345, 1: 38773, 2: 38206, 3: 38700, 4: 104040, 5: 77400, 6: 70290, 8: 69705, 9: 67410, 10: 14220, 11: 53145]
  Successfull upcalls : 268640 [0: 72829, 1: 14556, 2: 12876, 3: 13192, 4: 34227, 5: 26480, 6: 24130, 8: 23791, 9: 23372, 10: 6164, 11: 17023]
  Failed upcalls      : 508564 [0: 131516, 1: 25187, 2: 25330, 3: 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: 36122]
     11, EAGAIN       : 509485 [0: 131516, 1: 26112, 2: 25330, 3: 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: 36122]

Total of EAGAIN errors != total Failed, and
Total Failed + Total Successfull != Total.

I believe the reason is that events of this kind are still being
triggered while results are being processed. The problem seems to go
away if I add the following lines at the top of this probe:

if (!capture_enabled__())
    return 0;


> +    int ret = PT_REGS_RC(ctx);
> +
> +    if (ret > 0)
> +        ret = 0;
> +
> +    ret = -ret;
> +    if (ret > MAX_ERROR)
> +        ret = MAX_ERROR;
> +
> +    upcall_count.increment(ret);
> +    return 0;
> +}
> +#endif /* For measuring upcall statistics/errors. */
>  """
>
>
> @@ -900,6 +925,95 @@ def print_timestamp(msg):
>      print(time_string)
>
>
> +#
> +# Get errno short string
> +#
> +def get_errno_short(err):
> +    try:
> +        return errno.errorcode[err]
> +    except KeyError:
> +        return "_unknown_"
> +
> +
> +#
> +# Format a bfd per cpu array entry per cpu (if the count is > 0)
> +#

s/bfd/bpf?
Also, having "per cpu" twice makes the sentence a bit difficult to read.

> +def format_per_cpu_array(cpu_array, key=None, skip_key=None):
> +    per_cpu = ""
> +
> +    if key is not None:

"is not None" is redundant: "if key:" is equivalent.

> +        total = cpu_array.sum(key).value
> +        if total > 0:
> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
> +                if value == 0:
> +                    continue
> +
> +                per_cpu += " {}: {},".format(cpu, value)
> +    else:
> +        total = 0
> +        total_cpu = None
> +
> +        for key in cpu_array.keys():

See below.

> +            # I've seen extrem odd hight values, to avoid those, exclude any

typo? "extremely odd high values"?

> +            # index higher thant 1024 for now.

What values were you getting? I'm not seeing any weird values.
Besides, isn't the maximum set on the eBPF side to 255?
Plus, it's an array, so its size is limited and no key can exceed 255,
right?

> +            if (not (0 <= key.value <= 1024)) or (skip_key is not None and
> +                                                  skip_key.value == key.value):

ditto: key.value in range(0, 1024)
s/skip_key is not None/skip_key/?

OTOH, from include/linux/err.h:
#define MAX_ERRNO	4095

Maybe that value makes more sense although I really don't think we need
a check here.
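
For reference, the clamping the probe applies to the return value can be
mirrored in plain Python; here using MAX_ERRNO from include/linux/err.h
instead of 255 (classify_return() is an illustrative name, not part of
the patch):

```python
import errno

# Kernel error pointers encode errno values up to 4095 (MAX_ERRNO in
# include/linux/err.h); using it as the clamp covers every possible
# kernel errno, unlike the current MAX_ERROR of 255.
MAX_ERRNO = 4095


def classify_return(ret):
    """Map a kernel return value to an errno bucket: positive returns
    count as success (0), negative returns become positive errno values,
    and anything beyond the clamp collapses into the last bucket."""
    if ret > 0:
        ret = 0
    ret = -ret
    if ret > MAX_ERRNO:
        ret = MAX_ERRNO
    return ret


# -11 (-EAGAIN) maps to errno 11; its short name comes from the errno module.
print(classify_return(-11), errno.errorcode.get(11, "_unknown_"))
```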


> +                continue

> +
> +            if total_cpu is None:

if not total_cpu

> +                total_cpu = [0] * len(cpu_array.getvalue(key))
> +
> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
> +                total_cpu[cpu] += value
> +                total += value
> +
> +        if total >= 0:
> +            for cpu, value in enumerate(total_cpu):
> +                if value == 0:
> +                    continue
> +
> +                per_cpu += " {}: {},".format(cpu, value)
> +
> +    return str(total), per_cpu.strip(", ")
> +
> +
> +#
> +# Display kernel upcall statistics
> +#
> +def display_upcall_results():
> +    upcalls = bpf.get_table("upcall_count")

From bcc's reference: get_table()
Returns a table object. This is no longer used, as tables can now be
read as items from BPF. Eg: ```BPF[name]```.

Not sure if this means the API will be deprecated...

> +    have_upcalls = False
> +
> +    for k in upcalls.keys():

Interestingly, "keys()" is not documented as an API. Nevertheless, it's
not necessary, since "table" iterates over keys (same as list or dict in
python). So it could be written as "for k in upcalls:".
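
A quick illustration with a plain dict standing in for the BPF table
(the errno -> count mapping below is made up):

```python
# A bcc table iterates over its keys just like a dict, so .keys() is
# not needed to walk the entries.
upcalls = {0: 120195, 11: 63052}

keys_via_method = [k for k in upcalls.keys()]
keys_direct = [k for k in upcalls]  # equivalent, and a bit more pythonic

print(keys_via_method == keys_direct, keys_direct)
```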

> +        if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0:

I think the pythonic way is "if k.value not in range(0, 255)". Still,
how would it be possible to read keys greater than 255 if they are
being limited on the eBPF side?

> +            continue
> +        have_upcalls = True
> +        break
> +
> +    if not have_upcalls:
> +        return
> +
> +    print("\n\n# UPCALL STATISTICS:\n  Total upcalls       : {} [{}]".format(
> +        *format_per_cpu_array(upcalls)))
> +

When looking at these statistics, users may take a while to figure out
what each number means, e.g.:

# UPCALL STATISTICS:
  Total upcalls       : 240933 [0: 33165, 2: 27495, 3: 18270, 4: 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345]
  Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475]
  Failed upcalls      : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870]
     11, EAGAIN       : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387

An option could be to describe the format in the header, e.g:

# UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...])
  Total upcalls       : 240933 [0: 33165, 2: 27495, 3: 18270, 4: 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345]
  Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475]
  Failed upcalls      : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870]
     11, EAGAIN       : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387

Another option could be to print them in a table:

     Result        TOTAL   CPU 0   CPU 2   CPU 3   CPU 4   CPU 5   CPU 6   CPU 8   CPU 9   CPU 10
 -------------- -------- ------- ------- ------- ------- ------- ------- ------- ------- --------
  TOTAL           240933   33165   27495   18270   15300   33885    4050   51348   33075    24345
   Successful      80538   11804    8382    5660    5716   11606    2845   13717   10333    10475
   Failed         161587   21361   19113   12610    9584   22279    2397   37631   22742    13870
    11, EAGAIN    162528   21361   19113   12610    9584   22279    3345   37631   22742     1387
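
A rough sketch of how such a table could be produced, assuming the
per-errno counts were already aggregated into per-CPU lists
(format_stats_table() and its input layout are illustrative only, not
part of the patch):

```python
def format_stats_table(rows, cpus):
    """rows: list of (label, per-CPU counts); cpus: the CPU ids shown."""
    header = ["Result", "TOTAL"] + ["CPU {}".format(c) for c in cpus]
    table = [header] + [
        [label, str(sum(counts))] + [str(v) for v in counts]
        for label, counts in rows
    ]
    # Right-align every column to the widest cell it contains.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    return ["  ".join(cell.rjust(w) for cell, w in zip(row, widths))
            for row in table]


for line in format_stats_table([("TOTAL", [33165, 27495]),
                                ("Successful", [11804, 8382])],
                               cpus=[0, 2]):
    print(line)
```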




> +    for k in upcalls.keys():

nit: Same as above, "for k in upcalls:"

> +        error = k.value
> +        total, per_cpu = format_per_cpu_array(upcalls, key=k)
> +
> +        if error != 0 and total == "0":

Maybe return the total as integer to avoid this strange looking '== "0"'?

> +            continue
> +
> +        if error == 0:
> +            print("  Successfull upcalls : {} [{}]".format(total, per_cpu))
> +
> +            total, per_cpu = format_per_cpu_array(upcalls, skip_key=k)
> +            print("  Failed upcalls      : {} [{}]".format(total, per_cpu))
> +        else:
> +            print("    {:3}, {:13}: {} [{}]".format(error,
> +                                                    get_errno_short(error),
> +                                                    total, per_cpu))
> +
> +
>  #
>  # process_results()
>  #
> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None):
>                  indent, "TOTAL:", "", total_count, total_ns))
>
>      #
> -    # Print events
> +    # Print upcall statistics
> +    #
> +    display_upcall_results()
> +
> +    #
> +    # Print syscall events
>      #
>      lost_stack_traces = 0
>      if syscall_events:
> @@ -1194,6 +1313,9 @@ def main():
>      parser.add_argument("--skip-syscall-poll-events",
>                          help="Skip poll() syscalls with --syscall-events",
>                          action="store_true")
> +    parser.add_argument("--skip-upcall-stats",
> +                        help="Skip the collection of upcall statistics",
> +                        action="store_true")
>      parser.add_argument("--stack-trace-size",
>                          help="Number of unique stack traces that can be "
>                          "recorded, default 4096. 0 to disable",
> @@ -1298,6 +1420,9 @@ def main():
>      source = source.replace("<STACK_TRACE_ENABLED>", "true"
>                              if options.stack_trace_size > 0 else "false")
>
> +    source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0"
> +                            if options.skip_upcall_stats else "1")
> +
>      #
>      # Handle start/stop probes
>      #
> diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst
> index 0f6f916a7..41620d760 100644
> --- a/utilities/usdt-scripts/kernel_delay.rst
> +++ b/utilities/usdt-scripts/kernel_delay.rst
> @@ -109,6 +109,7 @@ followed by resource-specific data. Which are:
>  - ``THREAD STOPPED STATISTICS``
>  - ``HARD IRQ STATISTICS``
>  - ``SOFT IRQ STATISTICS``
> +- ``UPCALL STATISTICS``
>
>  The following sections will describe in detail what statistics they report.
>
> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt
>  handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).
>
>
> +``UPCALL STATISTICS``
> +~~~~~~~~~~~~~~~~~~~~~
> +The ``UPCALL STATISTICS`` section provides information on the number of
> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
> +the specific return codes are recorded. Statistics are presented both as
> +a total and on a per-CPU basis.
> +
> +
>  The ``--syscall-events`` option
>  -------------------------------
>  In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``,
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Aaron Conole Dec. 16, 2024, 2:21 p.m. UTC | #3
Eelco Chaudron <echaudro@redhat.com> writes:

> This patch installs a kernel return probe on ovs_dp_upcall() to record
> all successful and failed calls per CPU. The statistics are reported
> when the script exits, providing insights into the upcall behavior.
>
> This is an example output:
>
>   # UPCALL STATISTICS:
>     Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
>     Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
>     Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>        11, EAGAIN       : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> ---

Some nits below, but I don't think anything that would prevent it from
being accepted.

>  utilities/automake.mk                   |    1 
>  utilities/usdt-scripts/kernel_delay.py  |  127 +++++++++++++++++++++++++++++++
>  utilities/usdt-scripts/kernel_delay.rst |    9 ++
>  3 files changed, 136 insertions(+), 1 deletion(-)
>
> diff --git a/utilities/automake.mk b/utilities/automake.mk
> index acc1af4e0..22260b254 100644
> --- a/utilities/automake.mk
> +++ b/utilities/automake.mk
> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
>  	utilities/ovs-pipegen.py \
>  	utilities/usdt-scripts/dpif_op_nl_monitor.py \
>  	utilities/usdt-scripts/flow_reval_monitor.py \
> +	utilities/usdt-scripts/kernel_delay.py \

Should this be a separate patch so we can backport it?

>  	utilities/usdt-scripts/upcall_monitor.py \
>  	utilities/usdt-scripts/upcall_cost.py
>  
> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
> index de6b0c9de..a5aa1a3d4 100755
> --- a/utilities/usdt-scripts/kernel_delay.py
> +++ b/utilities/usdt-scripts/kernel_delay.py
> @@ -39,6 +39,7 @@
>  #
>  import argparse
>  import datetime
> +import errno
>  import os
>  import pytz
>  import psutil
> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit)
>      data->start_ns = 0;
>      return 0;
>  }
> +
> +
> +/*
> + * For measuring upcall statistics (per CPU).
> + */
> +#define MAX_ERROR 255
> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
> +
> +#if <INSTALL_OVS_DP_UPCALL_PROBE>
> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
> +{
> +    int ret = PT_REGS_RC(ctx);
> +
> +    if (ret > 0)
> +        ret = 0;
> +
> +    ret = -ret;
> +    if (ret > MAX_ERROR)
> +        ret = MAX_ERROR;
> +
> +    upcall_count.increment(ret);
> +    return 0;
> +}
> +#endif /* For measuring upcall statistics/errors. */
>  """
>  
>  
> @@ -900,6 +925,95 @@ def print_timestamp(msg):
>      print(time_string)
>  
>  
> +#
> +# Get errno short string
> +#
> +def get_errno_short(err):
> +    try:
> +        return errno.errorcode[err]
> +    except KeyError:
> +        return "_unknown_"
> +
> +
> +#
> +# Format a bfd per cpu array entry per cpu (if the count is > 0)
> +#
> +def format_per_cpu_array(cpu_array, key=None, skip_key=None):
> +    per_cpu = ""
> +
> +    if key is not None:
> +        total = cpu_array.sum(key).value
> +        if total > 0:
> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
> +                if value == 0:
> +                    continue
> +
> +                per_cpu += " {}: {},".format(cpu, value)
> +    else:
> +        total = 0
> +        total_cpu = None
> +
> +        for key in cpu_array.keys():
> +            # I've seen extrem odd hight values, to avoid those, exclude any
> +            # index higher thant 1024 for now.
> +            if (not (0 <= key.value <= 1024)) or (skip_key is not None and
> +                                                  skip_key.value == key.value):
> +                continue
> +
> +            if total_cpu is None:
> +                total_cpu = [0] * len(cpu_array.getvalue(key))
> +
> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
> +                total_cpu[cpu] += value
> +                total += value
> +
> +        if total >= 0:
> +            for cpu, value in enumerate(total_cpu):
> +                if value == 0:
> +                    continue
> +
> +                per_cpu += " {}: {},".format(cpu, value)
> +
> +    return str(total), per_cpu.strip(", ")
> +
> +
> +#
> +# Display kernel upcall statistics
> +#
> +def display_upcall_results():
> +    upcalls = bpf.get_table("upcall_count")
> +    have_upcalls = False
> +
> +    for k in upcalls.keys():
> +        if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0:
> +            continue
> +        have_upcalls = True
> +        break
> +
> +    if not have_upcalls:
> +        return
> +
> +    print("\n\n# UPCALL STATISTICS:\n  Total upcalls       : {} [{}]".format(
> +        *format_per_cpu_array(upcalls)))
> +
> +    for k in upcalls.keys():
> +        error = k.value
> +        total, per_cpu = format_per_cpu_array(upcalls, key=k)
> +
> +        if error != 0 and total == "0":
> +            continue
> +
> +        if error == 0:
> +            print("  Successfull upcalls : {} [{}]".format(total, per_cpu))
> +
> +            total, per_cpu = format_per_cpu_array(upcalls, skip_key=k)
> +            print("  Failed upcalls      : {} [{}]".format(total, per_cpu))
> +        else:
> +            print("    {:3}, {:13}: {} [{}]".format(error,
> +                                                    get_errno_short(error),
> +                                                    total, per_cpu))
> +
> +
>  #
>  # process_results()
>  #
> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None):
>                  indent, "TOTAL:", "", total_count, total_ns))
>  
>      #
> -    # Print events
> +    # Print upcall statistics
> +    #
> +    display_upcall_results()
> +
> +    #
> +    # Print syscall events

This text changed from the original.  It's okay to not separate it here
I think, but that's why the diff looks the way it does.

>      #
>      lost_stack_traces = 0
>      if syscall_events:
> @@ -1194,6 +1313,9 @@ def main():
>      parser.add_argument("--skip-syscall-poll-events",
>                          help="Skip poll() syscalls with --syscall-events",
>                          action="store_true")
> +    parser.add_argument("--skip-upcall-stats",
> +                        help="Skip the collection of upcall statistics",
> +                        action="store_true")
>      parser.add_argument("--stack-trace-size",
>                          help="Number of unique stack traces that can be "
>                          "recorded, default 4096. 0 to disable",
> @@ -1298,6 +1420,9 @@ def main():
>      source = source.replace("<STACK_TRACE_ENABLED>", "true"
>                              if options.stack_trace_size > 0 else "false")
>  
> +    source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0"
> +                            if options.skip_upcall_stats else "1")
> +
>      #
>      # Handle start/stop probes
>      #
> diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst
> index 0f6f916a7..41620d760 100644
> --- a/utilities/usdt-scripts/kernel_delay.rst
> +++ b/utilities/usdt-scripts/kernel_delay.rst
> @@ -109,6 +109,7 @@ followed by resource-specific data. Which are:
>  - ``THREAD STOPPED STATISTICS``
>  - ``HARD IRQ STATISTICS``
>  - ``SOFT IRQ STATISTICS``
> +- ``UPCALL STATISTICS``
>  
>  The following sections will describe in detail what statistics they report.
>  
> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt
>  handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).
>  
>  
> +``UPCALL STATISTICS``
> +~~~~~~~~~~~~~~~~~~~~~
> +The ``UPCALL STATISTICS`` section provides information on the number of
> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
> +the specific return codes are recorded. Statistics are presented both as
> +a total and on a per-CPU basis.
> +
> +
>  The ``--syscall-events`` option
>  -------------------------------
>  In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``,
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Eelco Chaudron Dec. 17, 2024, 2:30 p.m. UTC | #4
On 16 Dec 2024, at 15:21, Aaron Conole wrote:

> Eelco Chaudron <echaudro@redhat.com> writes:
>
>> This patch installs a kernel return probe on ovs_dp_upcall() to record
>> all successful and failed calls per CPU. The statistics are reported
>> when the script exits, providing insights into the upcall behavior.
>>
>> This is an example output:
>>
>>   # UPCALL STATISTICS:
>>     Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
>>     Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
>>     Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>>        11, EAGAIN       : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>
> Some nits below, but I don't think anything that would prevent it from
> being accepted.
>
>>  utilities/automake.mk                   |    1
>>  utilities/usdt-scripts/kernel_delay.py  |  127 +++++++++++++++++++++++++++++++
>>  utilities/usdt-scripts/kernel_delay.rst |    9 ++
>>  3 files changed, 136 insertions(+), 1 deletion(-)
>>
>> diff --git a/utilities/automake.mk b/utilities/automake.mk
>> index acc1af4e0..22260b254 100644
>> --- a/utilities/automake.mk
>> +++ b/utilities/automake.mk
>> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
>>  	utilities/ovs-pipegen.py \
>>  	utilities/usdt-scripts/dpif_op_nl_monitor.py \
>>  	utilities/usdt-scripts/flow_reval_monitor.py \
>> +	utilities/usdt-scripts/kernel_delay.py \
>
> Should this be a separate patch so we can backport it?

I sneaked this in, as I don’t see this as a must for backporting. Changes to the script will probably only happen on the main branch, and if backported, they should have passed flake8 :)

>>  	utilities/usdt-scripts/upcall_monitor.py \
>>  	utilities/usdt-scripts/upcall_cost.py
>>
>> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
>> index de6b0c9de..a5aa1a3d4 100755
>> --- a/utilities/usdt-scripts/kernel_delay.py
>> +++ b/utilities/usdt-scripts/kernel_delay.py
>> @@ -39,6 +39,7 @@
>>  #
>>  import argparse
>>  import datetime
>> +import errno
>>  import os
>>  import pytz
>>  import psutil
>> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit)
>>      data->start_ns = 0;
>>      return 0;
>>  }
>> +
>> +
>> +/*
>> + * For measuring upcall statistics (per CPU).
>> + */
>> +#define MAX_ERROR 255
>> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
>> +
>> +#if <INSTALL_OVS_DP_UPCALL_PROBE>
>> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
>> +{
>> +    int ret = PT_REGS_RC(ctx);
>> +
>> +    if (ret > 0)
>> +        ret = 0;
>> +
>> +    ret = -ret;
>> +    if (ret > MAX_ERROR)
>> +        ret = MAX_ERROR;
>> +
>> +    upcall_count.increment(ret);
>> +    return 0;
>> +}
>> +#endif /* For measuring upcall statistics/errors. */
>>  """
>>
>>
>> @@ -900,6 +925,95 @@ def print_timestamp(msg):
>>      print(time_string)
>>
>>
>> +#
>> +# Get errno short string
>> +#
>> +def get_errno_short(err):
>> +    try:
>> +        return errno.errorcode[err]
>> +    except KeyError:
>> +        return "_unknown_"
>> +
>> +
>> +#
>> +# Format a bfd per cpu array entry per cpu (if the count is > 0)
>> +#
>> +def format_per_cpu_array(cpu_array, key=None, skip_key=None):
>> +    per_cpu = ""
>> +
>> +    if key is not None:
>> +        total = cpu_array.sum(key).value
>> +        if total > 0:
>> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
>> +                if value == 0:
>> +                    continue
>> +
>> +                per_cpu += " {}: {},".format(cpu, value)
>> +    else:
>> +        total = 0
>> +        total_cpu = None
>> +
>> +        for key in cpu_array.keys():
>> +            # I've seen extrem odd hight values, to avoid those, exclude any
>> +            # index higher thant 1024 for now.
>> +            if (not (0 <= key.value <= 1024)) or (skip_key is not None and
>> +                                                  skip_key.value == key.value):
>> +                continue
>> +
>> +            if total_cpu is None:
>> +                total_cpu = [0] * len(cpu_array.getvalue(key))
>> +
>> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
>> +                total_cpu[cpu] += value
>> +                total += value
>> +
>> +        if total >= 0:
>> +            for cpu, value in enumerate(total_cpu):
>> +                if value == 0:
>> +                    continue
>> +
>> +                per_cpu += " {}: {},".format(cpu, value)
>> +
>> +    return str(total), per_cpu.strip(", ")
>> +
>> +
>> +#
>> +# Display kernel upcall statistics
>> +#
>> +def display_upcall_results():
>> +    upcalls = bpf.get_table("upcall_count")
>> +    have_upcalls = False
>> +
>> +    for k in upcalls.keys():
>> +        if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0:
>> +            continue
>> +        have_upcalls = True
>> +        break
>> +
>> +    if not have_upcalls:
>> +        return
>> +
>> +    print("\n\n# UPCALL STATISTICS:\n  Total upcalls       : {} [{}]".format(
>> +        *format_per_cpu_array(upcalls)))
>> +
>> +    for k in upcalls.keys():
>> +        error = k.value
>> +        total, per_cpu = format_per_cpu_array(upcalls, key=k)
>> +
>> +        if error != 0 and total == "0":
>> +            continue
>> +
>> +        if error == 0:
>> +            print("  Successfull upcalls : {} [{}]".format(total, per_cpu))
>> +
>> +            total, per_cpu = format_per_cpu_array(upcalls, skip_key=k)
>> +            print("  Failed upcalls      : {} [{}]".format(total, per_cpu))
>> +        else:
>> +            print("    {:3}, {:13}: {} [{}]".format(error,
>> +                                                    get_errno_short(error),
>> +                                                    total, per_cpu))
>> +
>> +
>>  #
>>  # process_results()
>>  #
>> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None):
>>                  indent, "TOTAL:", "", total_count, total_ns))
>>
>>      #
>> -    # Print events
>> +    # Print upcall statistics
>> +    #
>> +    display_upcall_results()
>> +
>> +    #
>> +    # Print syscall events
>
> This text changed from the original.  It's okay to not separate it here
> I think, but that's why the diff looks the way it does.

Yes, this looked odd; I could not convince diff to show it differently.

>>      #
>>      lost_stack_traces = 0
>>      if syscall_events:
>> @@ -1194,6 +1313,9 @@ def main():
>>      parser.add_argument("--skip-syscall-poll-events",
>>                          help="Skip poll() syscalls with --syscall-events",
>>                          action="store_true")
>> +    parser.add_argument("--skip-upcall-stats",
>> +                        help="Skip the collection of upcall statistics",
>> +                        action="store_true")
>>      parser.add_argument("--stack-trace-size",
>>                          help="Number of unique stack traces that can be "
>>                          "recorded, default 4096. 0 to disable",
>> @@ -1298,6 +1420,9 @@ def main():
>>      source = source.replace("<STACK_TRACE_ENABLED>", "true"
>>                              if options.stack_trace_size > 0 else "false")
>>
>> +    source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0"
>> +                            if options.skip_upcall_stats else "1")
>> +
>>      #
>>      # Handle start/stop probes
>>      #
>> diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst
>> index 0f6f916a7..41620d760 100644
>> --- a/utilities/usdt-scripts/kernel_delay.rst
>> +++ b/utilities/usdt-scripts/kernel_delay.rst
>> @@ -109,6 +109,7 @@ followed by resource-specific data. Which are:
>>  - ``THREAD STOPPED STATISTICS``
>>  - ``HARD IRQ STATISTICS``
>>  - ``SOFT IRQ STATISTICS``
>> +- ``UPCALL STATISTICS``
>>
>>  The following sections will describe in detail what statistics they report.
>>
>> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt
>>  handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).
>>
>>
>> +``UPCALL STATISTICS``
>> +~~~~~~~~~~~~~~~~~~~~~
>> +The ``UPCALL STATISTICS`` section provides information on the number of
>> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
>> +the specific return codes are recorded. Statistics are presented both as
>> +a total and on a per-CPU basis.
>> +
>> +
>>  The ``--syscall-events`` option
>>  -------------------------------
>>  In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``,
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Eelco Chaudron Dec. 17, 2024, 2:30 p.m. UTC | #5
On 13 Dec 2024, at 13:56, Adrián Moreno wrote:

> On Mon, Nov 25, 2024 at 03:14:20PM +0100, Eelco Chaudron wrote:
>> This patch installs a kernel return probe on ovs_dp_upcall() to record
>> all successful and failed calls per CPU. The statistics are reported
>> when the script exits, providing insights into the upcall behavior.
>>
>> This is an example output:
>>
>>   # UPCALL STATISTICS:
>>     Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
>>     Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
>>     Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>>        11, EAGAIN       : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>
> Thanks for this patch. Correlating upcall failures to kernel delays is
> extremely useful. Some comments below.

Thanks Adrian for the review. I’ll send out a v2 in a minute, but see my responses below.

//Eelco

>> ---
>>  utilities/automake.mk                   |    1
>>  utilities/usdt-scripts/kernel_delay.py  |  127 +++++++++++++++++++++++++++++++
>>  utilities/usdt-scripts/kernel_delay.rst |    9 ++
>>  3 files changed, 136 insertions(+), 1 deletion(-)
>>
>> diff --git a/utilities/automake.mk b/utilities/automake.mk
>> index acc1af4e0..22260b254 100644
>> --- a/utilities/automake.mk
>> +++ b/utilities/automake.mk
>> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
>>  	utilities/ovs-pipegen.py \
>>  	utilities/usdt-scripts/dpif_op_nl_monitor.py \
>>  	utilities/usdt-scripts/flow_reval_monitor.py \
>> +	utilities/usdt-scripts/kernel_delay.py \
>>  	utilities/usdt-scripts/upcall_monitor.py \
>>  	utilities/usdt-scripts/upcall_cost.py
>>
>> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
>> index de6b0c9de..a5aa1a3d4 100755
>> --- a/utilities/usdt-scripts/kernel_delay.py
>> +++ b/utilities/usdt-scripts/kernel_delay.py
>> @@ -39,6 +39,7 @@
>>  #
>>  import argparse
>>  import datetime
>> +import errno
>>  import os
>>  import pytz
>>  import psutil
>> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit)
>>      data->start_ns = 0;
>>      return 0;
>>  }
>> +
>> +
>> +/*
>> + * For measuring upcall statistics (per CPU).
>> + */
>> +#define MAX_ERROR 255
>
> Why this limit? The kernel seems to use 4095 in the PTR_ERR
> infrastructure. Although not exposed to user applications, the
> kernel has some values > 255. They don't seem networking-related
> for now, but still.
>
> In order to reduce this arbitrary limit and avoid needless iteration
> over an almost empty array, I'd make it a percpu_hash. It supports
> "lookup_or_try_init" and "increment" so it's as easy to use from ebpf
> and almost identically from python.

Good point, I will use a per-CPU hash table instead.
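To illustrate what the script ultimately does with such a map, the per-CPU aggregation could be sketched in plain Python, using a dict of `{errno: [per-CPU counts]}` as a stand-in for the BCC per-CPU map (the dict shape and the sample numbers below are hypothetical, not taken from the script):

```python
# Stand-in for a BCC per-CPU map: {errno: [count_on_cpu0, count_on_cpu1, ...]}.
def aggregate(percpu_counts):
    """Return (total, per_cpu_totals) summed over all errno keys."""
    num_cpus = len(next(iter(percpu_counts.values())))
    per_cpu = [0] * num_cpus
    total = 0
    for counts in percpu_counts.values():
        for cpu, value in enumerate(counts):
            per_cpu[cpu] += value
            total += value
    return total, per_cpu

counts = {0: [10, 0, 5], 11: [2, 3, 0]}  # hypothetical sample data
print(aggregate(counts))  # -> (20, [12, 3, 5])
```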

>> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
>> +
>> +#if <INSTALL_OVS_DP_UPCALL_PROBE>
>> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
>> +{
>
> When trying the script I was a bit confused that numbers didn't add up,
> e.g:
>
> # UPCALL STATISTICS:
>   Total upcalls       : 776234 [0: 204345, 1: 38773, 2: 38206, 3: 38700, 4: 104040, 5: 77400, 6: 70290, 8: 69705, 9: 67410, 10: 14220, 11: 53145]
>   Successfull upcalls : 268640 [0: 72829, 1: 14556, 2: 12876, 3: 13192, 4: 34227, 5: 26480, 6: 24130, 8: 23791, 9: 23372, 10: 6164, 11: 17023]
>   Failed upcalls      : 508564 [0: 131516, 1: 25187, 2: 25330, 3: 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: 36122]
>      11, EAGAIN       : 509485 [0: 131516, 1: 26112, 2: 25330, 3: 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: 36122]
>
> Total of EAGAIN errors != total Failed and
> Total Failed + Total Successfull != Total.
>
> I believe the reason is that events of this kind are still being
> triggered while results are being processed. The problem seems to go
> away if I add the following lines at the top of this probe:
>
> if (!capture_enabled__())
>     return 0;

ACK, I will fix this, including resetting the values between runs…

>> +    int ret = PT_REGS_RC(ctx);
>> +
>> +    if (ret > 0)
>> +        ret = 0;
>> +
>> +    ret = -ret;
>> +    if (ret > MAX_ERROR)
>> +        ret = MAX_ERROR;
>> +
>> +    upcall_count.increment(ret);
>> +    return 0;
>> +}
>> +#endif /* For measuring upcall statistics/errors. */
>>  """
>>
>>
>> @@ -900,6 +925,95 @@ def print_timestamp(msg):
>>      print(time_string)
>>
>>
>> +#
>> +# Get errno short string
>> +#
>> +def get_errno_short(err):
>> +    try:
>> +        return errno.errorcode[err]
>> +    except KeyError:
>> +        return "_unknown_"
>> +
>> +
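For reference, the helper above leans on Python's `errno.errorcode` dict, which maps numeric errno values to their symbolic names; a standalone sketch of the lookup:

```python
import errno

# errno.errorcode maps an integer errno to its symbolic name,
# e.g. 11 -> 'EAGAIN' on Linux; unknown values raise KeyError.
def get_errno_short(err):
    try:
        return errno.errorcode[err]
    except KeyError:
        return "_unknown_"

print(get_errno_short(errno.ENOENT))  # -> 'ENOENT'
print(get_errno_short(999999))        # -> '_unknown_'
```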
>> +#
>> +# Format a bfd per cpu array entry per cpu (if the count is > 0)
>> +#
>
> s/bfd/bpf?
> Also, "per cpu" twice makes the sentence a bit difficult to read.

Ack

>
>> +def format_per_cpu_array(cpu_array, key=None, skip_key=None):
>> +    per_cpu = ""
>> +
>> +    if key is not None:
>
> "is not None" is redundant: "if key:" is equivalent.

This and the one below are not the same for my code; ‘if key:’ behaves incorrectly:

# UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]):
  Total upcalls       : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
  Successfull upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
  Failed upcalls      : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
     11, EAGAIN       : 59526 [0: 7701, 2: 7580, 4: 4443, 6: 2745, 8: 3013,...]

vs ‘if key is not None’:

# UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]):
  Total upcalls       : 209023 [0: 16271, 2: 14830, 4: 11787, 6: 10025, 8: 4875,...]
  Successfull upcalls : 127677 [0: 6342, 2: 6250, 4: 5547, 6: 5464, 8: 4875,...]
  Failed upcalls      : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...]
     11, EAGAIN       : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...]
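The difference matters because BCC map keys are ctypes integers, and a ctypes integer holding 0 is falsy, so `if key:` silently skips the key for errno 0 (the successful upcalls). A minimal illustration, independent of BCC:

```python
import ctypes

key = ctypes.c_uint(0)  # e.g. the map key for "success" (errno 0)

# 'if key:' evaluates the wrapped *value*, so key 0 is skipped...
assert not bool(key)

# ...while 'is not None' only checks whether a key was passed at all.
assert key is not None
```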


>> +        total = cpu_array.sum(key).value
>> +        if total > 0:
>> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
>> +                if value == 0:
>> +                    continue
>> +
>> +                per_cpu += " {}: {},".format(cpu, value)
>> +    else:
>> +        total = 0
>> +        total_cpu = None
>> +
>> +        for key in cpu_array.keys():
>
> See below.
>
>> +            # I've seen extrem odd hight values, to avoid those, exclude any
>
> typo? "extremely odd high values"?
>
>> +            # index higher thant 1024 for now.
>
> What values were you getting? I'm not seeing any weird value.
> Besides, isn't the maximum set on the ebpf side to 255?
> Plus, it's an array, so its size is limited and no key can exceed 256,
> right?

This was a while back; I can't remember if it was a specific kernel or BCC version.
I would get indexes that looked like integer overflows. However, with the hash implementation this is no longer required.

>> +            if (not (0 <= key.value <= 1024)) or (skip_key is not None and
>> +                                                  skip_key.value == key.value):
>
> ditto: key.value in range(0, 1024)
> s/skip_key is not None/skip_key/?
>
> OTOH, from include/linux/err.h:
> #define MAX_ERRNO	4095
>
> Maybe that value makes more sense although I really don't think we need
> a check here.

See above
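As a side note on the suggested `range()` form: membership tests against a range of ints are constant-time, but the bounds need care, since `range()` excludes its stop value, so `0 <= v <= 1024` corresponds to `range(0, 1025)`. A small sketch, independent of the script:

```python
# range() excludes the stop value, so the inclusive check
# '0 <= v <= 1024' must be written as 'v in range(0, 1025)'.
for v in (-1, 0, 512, 1024, 1025):
    assert (0 <= v <= 1024) == (v in range(0, 1025))
```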
>> +                continue
>
>> +
>> +            if total_cpu is None:
>
> if not total_cpu
>> +                total_cpu = [0] * len(cpu_array.getvalue(key))
>> +
>> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
>> +                total_cpu[cpu] += value
>> +                total += value
>> +
>> +        if total >= 0:
>> +            for cpu, value in enumerate(total_cpu):
>> +                if value == 0:
>> +                    continue
>> +
>> +                per_cpu += " {}: {},".format(cpu, value)
>> +
>> +    return str(total), per_cpu.strip(", ")
>> +
>> +
>> +#
>> +# Display kernel upcall statistics
>> +#
>> +def display_upcall_results():
>> +    upcalls = bpf.get_table("upcall_count")
>
> From bcc's reference: get_table()
> Returns a table object. This is no longer used, as tables can now be
> read as items from BPF. Eg: ```BPF[name]```.
>
> Not sure if this means the API will be deprecated...

Changed it, let not wait for the deprecation notice ;)

>> +    have_upcalls = False
>> +
>> +    for k in upcalls.keys():
>
> Interestingly, "keys()" is not documented as an API. Nevertheless, it's
> not necessary, since "table" iterates over keys (same as list or dict in
> python). So it could be written as "for k in upcalls:".

Good catch; changed here and below.
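A plain dict shows the same behavior BCC tables rely on here: iterating a mapping directly yields its keys, so `.keys()` is redundant (the dict below is just illustrative data, not the real table):

```python
# Stand-in for a BCC table: iterating a mapping yields its keys.
upcalls = {0: 120195, 11: 63052}  # hypothetical {errno: count}

direct = [k for k in upcalls]
via_keys = [k for k in upcalls.keys()]
assert direct == via_keys  # both iterate over the keys
```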

>> +        if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0:
>
> I think the pythonic way is "if k.value not in range(0, 255)". Still, how
> would it be possible to read keys greater than 255 if it's being limited
> on the ebpf side?

See above.

>> +            continue
>> +        have_upcalls = True
>> +        break
>> +
>> +    if not have_upcalls:
>> +        return
>> +
>> +    print("\n\n# UPCALL STATISTICS:\n  Total upcalls       : {} [{}]".format(
>> +        *format_per_cpu_array(upcalls)))
>> +
>
> When looking at these statistics, users may take a while to figure out
> what each number means, e.g:
>
> # UPCALL STATISTICS:
>   Total upcalls       : 240933 [0: 33165, 2: 27495, 3: 18270, 4: 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345]
>   Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475]
>   Failed upcalls      : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870]
>      11, EAGAIN       : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387
>
> An option could be to describe the format in the header, e.g:
>
> # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...])
>   Total upcalls       : 240933 [0: 33165, 2: 27495, 3: 18270, 4: 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345]
>   Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475]
>   Failed upcalls      : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870]
>      11, EAGAIN       : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387

I went for the above, as the CPU set might not be the same in all lines, and I would like to condense this as much as possible (I was testing on 64 CPUs ;).

> Another option could be to print them in a table:
>
>      Result       TOTAL    CPU 0   CPU 2   CPU 3   CPU 4   CPU 5   CPU 6   CPU 8   CPU 9   CPU 10
>  -------------- --------- ------- ------- ------- ------- ------- ------- ------- ------- --------
>   TOTAL           240933    33165   27495   18270   15300   33885    4050   51348   33075    24345
>    Successful      80538    11804    8382    5660    5716   11606    2845   13717   10333    10475
>    Failed         161587    21361   19113   12610    9584   22279    2397   37631   22742    13870
>     11, EAGAIN    162528    21361   19113   12610    9584   22279    3345   37631   22742     1387
>
>
>
>
>> +    for k in upcalls.keys():
>
> nit: Same as above, "for k in upcalls:"

Ack

>> +        error = k.value
>> +        total, per_cpu = format_per_cpu_array(upcalls, key=k)
>> +
>> +        if error != 0 and total == "0":
>
> Maybe return the total as integer to avoid this strange looking '== "0"'?

Ack, yes, I was too C-minded here, always returning a string…
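A minimal sketch of that change (function and variable names here are hypothetical): returning the count as an int lets callers compare numerically and format only at print time, avoiding the `== "0"` comparison:

```python
def upcall_total(counts):
    # Return the total as an int, not a pre-formatted string.
    return sum(counts)

total = upcall_total([10, 0, 32])
if total != 0:            # natural numeric comparison, no '== "0"'
    print("{} upcalls".format(total))
```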

>> +            continue
>> +
>> +        if error == 0:
>> +            print("  Successfull upcalls : {} [{}]".format(total, per_cpu))
>> +
>> +            total, per_cpu = format_per_cpu_array(upcalls, skip_key=k)
>> +            print("  Failed upcalls      : {} [{}]".format(total, per_cpu))
>> +        else:
>> +            print("    {:3}, {:13}: {} [{}]".format(error,
>> +                                                    get_errno_short(error),
>> +                                                    total, per_cpu))
>> +
>> +
>>  #
>>  # process_results()
>>  #
>> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None):
>>                  indent, "TOTAL:", "", total_count, total_ns))
>>
>>      #
>> -    # Print events
>> +    # Print upcall statistics
>> +    #
>> +    display_upcall_results()
>> +
>> +    #
>> +    # Print syscall events
>>      #
>>      lost_stack_traces = 0
>>      if syscall_events:
>> @@ -1194,6 +1313,9 @@ def main():
>>      parser.add_argument("--skip-syscall-poll-events",
>>                          help="Skip poll() syscalls with --syscall-events",
>>                          action="store_true")
>> +    parser.add_argument("--skip-upcall-stats",
>> +                        help="Skip the collection of upcall statistics",
>> +                        action="store_true")
>>      parser.add_argument("--stack-trace-size",
>>                          help="Number of unique stack traces that can be "
>>                          "recorded, default 4096. 0 to disable",
>> @@ -1298,6 +1420,9 @@ def main():
>>      source = source.replace("<STACK_TRACE_ENABLED>", "true"
>>                              if options.stack_trace_size > 0 else "false")
>>
>> +    source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0"
>> +                            if options.skip_upcall_stats else "1")
>> +
>>      #
>>      # Handle start/stop probes
>>      #
>> diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst
>> index 0f6f916a7..41620d760 100644
>> --- a/utilities/usdt-scripts/kernel_delay.rst
>> +++ b/utilities/usdt-scripts/kernel_delay.rst
>> @@ -109,6 +109,7 @@ followed by resource-specific data. Which are:
>>  - ``THREAD STOPPED STATISTICS``
>>  - ``HARD IRQ STATISTICS``
>>  - ``SOFT IRQ STATISTICS``
>> +- ``UPCALL STATISTICS``
>>
>>  The following sections will describe in detail what statistics they report.
>>
>> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt
>>  handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).
>>
>>
>> +``UPCALL STATISTICS``
>> +~~~~~~~~~~~~~~~~~~~~~
>> +The ``UPCALL STATISTICS`` section provides information on the number of
>> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
>> +the specific return codes are recorded. Statistics are presented both as
>> +a total and on a per-CPU basis.
>> +
>> +
>>  The ``--syscall-events`` option
>>  -------------------------------
>>  In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``,
>>
Adrián Moreno Dec. 17, 2024, 9:22 p.m. UTC | #6
On Tue, Dec 17, 2024 at 03:30:51PM +0100, Eelco Chaudron wrote:
>
>
> On 13 Dec 2024, at 13:56, Adrián Moreno wrote:
>
> > On Mon, Nov 25, 2024 at 03:14:20PM +0100, Eelco Chaudron wrote:
> >> This patch installs a kernel return probe on ovs_dp_upcall() to record
> >> all successful and failed calls per CPU. The statistics are reported
> >> when the script exits, providing insights into the upcall behavior.
> >>
> >> This is an example output:
> >>
> >>   # UPCALL STATISTICS:
> >>     Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
> >>     Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
> >>     Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
> >>        11, EAGAIN       : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
> >>
> >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> >
> > Thanks for this patch. Correlating upcall failures to kernel delays is
> > extremely useful. Some comments below.
>
> Thanks Adrian for the review. I’ll send out a v2 in a minute, but see my responses below.
>
> //Eelco
>
> >> ---
> >>  utilities/automake.mk                   |    1
> >>  utilities/usdt-scripts/kernel_delay.py  |  127 +++++++++++++++++++++++++++++++
> >>  utilities/usdt-scripts/kernel_delay.rst |    9 ++
> >>  3 files changed, 136 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/utilities/automake.mk b/utilities/automake.mk
> >> index acc1af4e0..22260b254 100644
> >> --- a/utilities/automake.mk
> >> +++ b/utilities/automake.mk
> >> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
> >>  	utilities/ovs-pipegen.py \
> >>  	utilities/usdt-scripts/dpif_op_nl_monitor.py \
> >>  	utilities/usdt-scripts/flow_reval_monitor.py \
> >> +	utilities/usdt-scripts/kernel_delay.py \
> >>  	utilities/usdt-scripts/upcall_monitor.py \
> >>  	utilities/usdt-scripts/upcall_cost.py
> >>
> >> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
> >> index de6b0c9de..a5aa1a3d4 100755
> >> --- a/utilities/usdt-scripts/kernel_delay.py
> >> +++ b/utilities/usdt-scripts/kernel_delay.py
> >> @@ -39,6 +39,7 @@
> >>  #
> >>  import argparse
> >>  import datetime
> >> +import errno
> >>  import os
> >>  import pytz
> >>  import psutil
> >> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit)
> >>      data->start_ns = 0;
> >>      return 0;
> >>  }
> >> +
> >> +
> >> +/*
> >> + * For measuring upcall statistics (per CPU).
> >> + */
> >> +#define MAX_ERROR 255
> >
> > Why this limit? The kernel seems to use 4095 in the PTR_ERR
> > infrastructure. Although not exposed to user applications,
> > kernel has some > 255 values. They don't seem networking related for
> > now but still.
> >
> > In order to reduce this arbitrary limit and avoid needless iteration
> > over an almost empty array, I'd make it a percpu_hash. It supports
> > "lookup_or_try_init" and "increment" so it's as easy to use from ebpf
> > and almost identically from python.
>
> Good point, I will use a per-CPU hash table instead.
>
> >> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
> >> +
> >> +#if <INSTALL_OVS_DP_UPCALL_PROBE>
> >> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
> >> +{
> >
> > When trying the script I was a bit confused that numbers didn't add up,
> > e.g:
> >
> > # UPCALL STATISTICS:
> >   Total upcalls       : 776234 [0: 204345, 1: 38773, 2: 38206, 3:
> > 38700, 4: 104040, 5: 77400, 6: 70290, 8: 69705, 9: 67410, 10: 14220,
> > 11: 53145]
> >   Successfull upcalls : 268640 [0: 72829, 1: 14556, 2: 12876, 3:
> > 13192, 4: 34227, 5: 26480, 6: 24130, 8: 23791, 9: 23372, 10: 6164, 11:
> > 17023]
> >   Failed upcalls      : 508564 [0: 131516, 1: 25187, 2: 25330, 3:
> > 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11:
> > 36122]
> >      11, EAGAIN       : 509485 [0: 131516, 1: 26112, 2: 25330, 3:
> > 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11:
> > 36122]
> >
> > Total of EAGAIN erros != total Failed and
> > Total Failed + Total Successfull != Total.
> >
> > I believe the reason is that events of this kind are still being
> > triggered while results are being processed. The problem seems to go
> > away if I add the following lines at the top of this probe:
> >
> > if (!capture_enabled__())
> >     return 0;
>
> ACK, I will fix this, including resetting the values between runs…
>
> >> +    int ret = PT_REGS_RC(ctx);
> >> +
> >> +    if (ret > 0)
> >> +        ret = 0;
> >> +
> >> +    ret = -ret;
> >> +    if (ret > MAX_ERROR)
> >> +        ret = MAX_ERROR;
> >> +
> >> +    upcall_count.increment(ret);
> >> +    return 0;
> >> +}
> >> +#endif /* For measuring upcall statistics/errors. */
> >>  """
> >>
> >>
> >> @@ -900,6 +925,95 @@ def print_timestamp(msg):
> >>      print(time_string)
> >>
> >>
> >> +#
> >> +# Get errno short string
> >> +#
> >> +def get_errno_short(err):
> >> +    try:
> >> +        return errno.errorcode[err]
> >> +    except KeyError:
> >> +        return "_unknown_"
> >> +
> >> +
> >> +#
> >> +# Format a bfd per cpu array entry per cpu (if the count is > 0)
> >> +#
> >
> > s/bfd/bpf?
> > Also, "per cpu" twice makes the sentense a bit difficult to read.
>
> Ack
>
> >
> >> +def format_per_cpu_array(cpu_array, key=None, skip_key=None):
> >> +    per_cpu = ""
> >> +
> >> +    if key is not None:
> >
> > "is not None" is redundant: "if key:" is equivalent.
>
> This and the one below are not the same for my code; ‘if key:’ behaves incorrectly:
>

Ugh, you're right. If key is zero the check fails.

> # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]):
>   Total upcalls       : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
>   Successfull upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
>   Failed upcalls      : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
>      11, EAGAIN       : 59526 [0: 7701, 2: 7580, 4: 4443, 6: 2745, 8: 3013,...]
>
> vs ‘if key is not None’:
>
> # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]):
>   Total upcalls       : 209023 [0: 16271, 2: 14830, 4: 11787, 6: 10025, 8: 4875,...]
>   Successfull upcalls : 127677 [0: 6342, 2: 6250, 4: 5547, 6: 5464, 8: 4875,...]
>   Failed upcalls      : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...]
>      11, EAGAIN       : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...]
diff mbox series

Patch

diff --git a/utilities/automake.mk b/utilities/automake.mk
index acc1af4e0..22260b254 100644
--- a/utilities/automake.mk
+++ b/utilities/automake.mk
@@ -149,6 +149,7 @@  FLAKE8_PYFILES += utilities/ovs-pcap.in \
 	utilities/ovs-pipegen.py \
 	utilities/usdt-scripts/dpif_op_nl_monitor.py \
 	utilities/usdt-scripts/flow_reval_monitor.py \
+	utilities/usdt-scripts/kernel_delay.py \
 	utilities/usdt-scripts/upcall_monitor.py \
 	utilities/usdt-scripts/upcall_cost.py
 
diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
index de6b0c9de..a5aa1a3d4 100755
--- a/utilities/usdt-scripts/kernel_delay.py
+++ b/utilities/usdt-scripts/kernel_delay.py
@@ -39,6 +39,7 @@ 
 #
 import argparse
 import datetime
+import errno
 import os
 import pytz
 import psutil
@@ -556,6 +557,30 @@  TRACEPOINT_PROBE(irq, softirq_exit)
     data->start_ns = 0;
     return 0;
 }
+
+
+/*
+ * For measuring upcall statistics (per CPU).
+ */
+#define MAX_ERROR 255
+BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
+
+#if <INSTALL_OVS_DP_UPCALL_PROBE>
+int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
+{
+    int ret = PT_REGS_RC(ctx);
+
+    if (ret > 0)
+        ret = 0;
+
+    ret = -ret;
+    if (ret > MAX_ERROR)
+        ret = MAX_ERROR;
+
+    upcall_count.increment(ret);
+    return 0;
+}
+#endif /* For measuring upcall statistics/errors. */
 """
 
 
@@ -900,6 +925,95 @@  def print_timestamp(msg):
     print(time_string)
 
 
+#
+# Get errno short string
+#
+def get_errno_short(err):
+    try:
+        return errno.errorcode[err]
+    except KeyError:
+        return "_unknown_"
+
+
+#
+# Format a BPF per-CPU array entry per CPU (if the count is > 0)
+#
+def format_per_cpu_array(cpu_array, key=None, skip_key=None):
+    per_cpu = ""
+
+    if key is not None:
+        total = cpu_array.sum(key).value
+        if total > 0:
+            for cpu, value in enumerate(cpu_array.getvalue(key)):
+                if value == 0:
+                    continue
+
+                per_cpu += " {}: {},".format(cpu, value)
+    else:
+        total = 0
+        total_cpu = None
+
+        for key in cpu_array.keys():
+            # Extremely odd, high key values have been observed; to avoid
+            # them, exclude any index higher than 1024 for now.
+            if (not (0 <= key.value <= 1024)) or (skip_key is not None and
+                                                  skip_key.value == key.value):
+                continue
+
+            if total_cpu is None:
+                total_cpu = [0] * len(cpu_array.getvalue(key))
+
+            for cpu, value in enumerate(cpu_array.getvalue(key)):
+                total_cpu[cpu] += value
+                total += value
+
+        if total > 0:
+            for cpu, value in enumerate(total_cpu):
+                if value == 0:
+                    continue
+
+                per_cpu += " {}: {},".format(cpu, value)
+
+    return str(total), per_cpu.strip(", ")
+
+
+#
+# Display kernel upcall statistics
+#
+def display_upcall_results():
+    upcalls = bpf.get_table("upcall_count")
+    have_upcalls = False
+
+    for k in upcalls.keys():
+        if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0:
+            continue
+        have_upcalls = True
+        break
+
+    if not have_upcalls:
+        return
+
+    print("\n\n# UPCALL STATISTICS:\n  Total upcalls       : {} [{}]".format(
+        *format_per_cpu_array(upcalls)))
+
+    for k in upcalls.keys():
+        error = k.value
+        total, per_cpu = format_per_cpu_array(upcalls, key=k)
+
+        if error != 0 and total == "0":
+            continue
+
+        if error == 0:
+            print("  Successful upcalls  : {} [{}]".format(total, per_cpu))
+
+            total, per_cpu = format_per_cpu_array(upcalls, skip_key=k)
+            print("  Failed upcalls      : {} [{}]".format(total, per_cpu))
+        else:
+            print("    {:3}, {:13}: {} [{}]".format(error,
+                                                    get_errno_short(error),
+                                                    total, per_cpu))
+
+
 #
 # process_results()
 #
@@ -1074,7 +1188,12 @@  def process_results(syscall_events=None, trigger_delta=None):
                 indent, "TOTAL:", "", total_count, total_ns))
 
     #
-    # Print events
+    # Print upcall statistics
+    #
+    display_upcall_results()
+
+    #
+    # Print syscall events
     #
     lost_stack_traces = 0
     if syscall_events:
@@ -1194,6 +1313,9 @@  def main():
     parser.add_argument("--skip-syscall-poll-events",
                         help="Skip poll() syscalls with --syscall-events",
                         action="store_true")
+    parser.add_argument("--skip-upcall-stats",
+                        help="Skip the collection of upcall statistics",
+                        action="store_true")
     parser.add_argument("--stack-trace-size",
                         help="Number of unique stack traces that can be "
                         "recorded, default 4096. 0 to disable",
@@ -1298,6 +1420,9 @@  def main():
     source = source.replace("<STACK_TRACE_ENABLED>", "true"
                             if options.stack_trace_size > 0 else "false")
 
+    source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0"
+                            if options.skip_upcall_stats else "1")
+
     #
     # Handle start/stop probes
     #
diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst
index 0f6f916a7..41620d760 100644
--- a/utilities/usdt-scripts/kernel_delay.rst
+++ b/utilities/usdt-scripts/kernel_delay.rst
@@ -109,6 +109,7 @@  followed by resource-specific data. Which are:
 - ``THREAD STOPPED STATISTICS``
 - ``HARD IRQ STATISTICS``
 - ``SOFT IRQ STATISTICS``
+- ``UPCALL STATISTICS``
 
 The following sections will describe in detail what statistics they report.
 
@@ -187,6 +188,14 @@  number of interrupts (``COUNT``), the total time spent in the interrupt
 handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).
 
 
+``UPCALL STATISTICS``
+~~~~~~~~~~~~~~~~~~~~~
+The ``UPCALL STATISTICS`` section provides information on the number of
+upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
+the specific return codes are recorded. Statistics are presented both as
+a total and on a per-CPU basis.
+
+
 The ``--syscall-events`` option
 -------------------------------
 In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``,
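For readers unfamiliar with BCC per-CPU array tables, the aggregation performed by the patch's `format_per_cpu_array()` can be sketched in plain Python. The snippet below is an illustration only, not part of the patch: it uses an ordinary dict of per-CPU count lists as a stand-in for the BPF table, and shows the three modes used by the script (sum everything, sum a single key, and sum everything except a skipped key).

```python
# Stand-in illustration of the aggregation in format_per_cpu_array():
# a per-CPU counter table is summed either for one error code (key),
# or across all keys (optionally skipping one), and only CPUs with a
# non-zero count are listed.

def format_per_cpu(table, key=None, skip_key=None):
    """table maps error-code -> list of per-CPU counts."""
    if key is not None:
        counts = table[key]
    else:
        ncpus = len(next(iter(table.values())))
        counts = [0] * ncpus
        for k, vals in table.items():
            if skip_key is not None and k == skip_key:
                continue
            for cpu, value in enumerate(vals):
                counts[cpu] += value

    total = sum(counts)
    per_cpu = ", ".join("{}: {}".format(cpu, value)
                        for cpu, value in enumerate(counts) if value)
    return total, per_cpu


# Example: key 0 = successful upcalls, key 11 = EAGAIN failures.
upcalls = {0: [5, 0, 3], 11: [2, 1, 0]}
print(format_per_cpu(upcalls))              # -> (11, '0: 7, 1: 1, 2: 3')
print(format_per_cpu(upcalls, key=11))      # -> (3, '0: 2, 1: 1')
print(format_per_cpu(upcalls, skip_key=0))  # -> (3, '0: 2, 1: 1')
```

In the real script the table is a BCC ``percpu_array``, so the equivalent of ``table[key]`` is ``cpu_array.getvalue(key)`` and totals come from ``cpu_array.sum(key)``; the arithmetic above is otherwise the same.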