| Message ID | 173254404327.545539.5146903122430077833.stgit@wsfd-advnetlab224.anl.eng.rdu2.dc.redhat.com |
|---|---|
| State | Changes Requested |
| Delegated to: | aaron conole |
| Series | [ovs-dev] utilities: Add upcall statistics to the kernel_delay.py script. |
| Context | Check | Description |
|---|---|---|
| ovsrobot/apply-robot | success | apply and check: success |
| ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
| ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
On 25 Nov 2024, at 15:14, Eelco Chaudron wrote: > This patch installs a kernel return probe on ovs_dp_upcall() to record > all successful and failed calls per CPU. The statistics are reported > when the script exits, providing insights into the upcall behavior. > > This is an example output: > > # UPCALL STATISTICS: > Total upcalls : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...] > Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...] > Failed upcalls : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] > 11, EAGAIN : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] > > Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Recheck-request: github-robot
On Mon, Nov 25, 2024 at 03:14:20PM +0100, Eelco Chaudron wrote: > This patch installs a kernel return probe on ovs_dp_upcall() to record > all successful and failed calls per CPU. The statistics are reported > when the script exits, providing insights into the upcall behavior. > > This is an example output: > > # UPCALL STATISTICS: > Total upcalls : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...] > Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...] > Failed upcalls : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] > 11, EAGAIN : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] > > Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Thanks for this patch. Correlating upcall failures to kernel delays is extremely useful. Some comments below. > --- > utilities/automake.mk | 1 > utilities/usdt-scripts/kernel_delay.py | 127 +++++++++++++++++++++++++++++++ > utilities/usdt-scripts/kernel_delay.rst | 9 ++ > 3 files changed, 136 insertions(+), 1 deletion(-) > > diff --git a/utilities/automake.mk b/utilities/automake.mk > index acc1af4e0..22260b254 100644 > --- a/utilities/automake.mk > +++ b/utilities/automake.mk > @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ > utilities/ovs-pipegen.py \ > utilities/usdt-scripts/dpif_op_nl_monitor.py \ > utilities/usdt-scripts/flow_reval_monitor.py \ > + utilities/usdt-scripts/kernel_delay.py \ > utilities/usdt-scripts/upcall_monitor.py \ > utilities/usdt-scripts/upcall_cost.py > > diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py > index de6b0c9de..a5aa1a3d4 100755 > --- a/utilities/usdt-scripts/kernel_delay.py > +++ b/utilities/usdt-scripts/kernel_delay.py > @@ -39,6 +39,7 @@ > # > import argparse > import datetime > +import errno > import os > import pytz > import psutil > @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit) > data->start_ns = 0; > return 0; > } > + > + > +/* > + * For measuring upcall statistics (per CPU). 
> + */ > +#define MAX_ERROR 255 Why this limit? The kernel seems to use 4095 in the PTR_ERR infrastructure. Although not exposed to user applications, kernel has some > 255 values. They don't seem networking related for now but still. In order to reduce this arbitrary limit and avoid needless iteration over an almost empty array, I'd make it a percpu_hash. It supports "lookup_or_try_init" and "increment" so it's as easy to use from ebpf and almost identically from python. > +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1); > + > +#if <INSTALL_OVS_DP_UPCALL_PROBE> > +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx) > +{ When trying the script I was a bit confused that numbers didn't add up, e.g: # UPCALL STATISTICS: Total upcalls : 776234 [0: 204345, 1: 38773, 2: 38206, 3: 38700, 4: 104040, 5: 77400, 6: 70290, 8: 69705, 9: 67410, 10: 14220, 11: 53145] Successfull upcalls : 268640 [0: 72829, 1: 14556, 2: 12876, 3: 13192, 4: 34227, 5: 26480, 6: 24130, 8: 23791, 9: 23372, 10: 6164, 11: 17023] Failed upcalls : 508564 [0: 131516, 1: 25187, 2: 25330, 3: 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: 36122] 11, EAGAIN : 509485 [0: 131516, 1: 26112, 2: 25330, 3: 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: 36122] Total of EAGAIN erros != total Failed and Total Failed + Total Successfull != Total. I believe the reason is that events of this kind are still being triggered while results are being processed. The problem seems to go away if I add the following lines at the top of this probe: if (!capture_enabled__()) return 0; > + int ret = PT_REGS_RC(ctx); > + > + if (ret > 0) > + ret = 0; > + > + ret = -ret; > + if (ret > MAX_ERROR) > + ret = MAX_ERROR; > + > + upcall_count.increment(ret); > + return 0; > +} > +#endif /* For measuring upcall statistics/errors. 
*/ > """ > > > @@ -900,6 +925,95 @@ def print_timestamp(msg): > print(time_string) > > > +# > +# Get errno short string > +# > +def get_errno_short(err): > + try: > + return errno.errorcode[err] > + except KeyError: > + return "_unknown_" > + > + > +# > +# Format a bfd per cpu array entry per cpu (if the count is > 0) > +# s/bfd/bpf? Also, "per cpu" twice makes the sentence a bit difficult to read. > +def format_per_cpu_array(cpu_array, key=None, skip_key=None): > + per_cpu = "" > + > + if key is not None: "is not None" is redundant: "if key:" is equivalent. > + total = cpu_array.sum(key).value > + if total > 0: > + for cpu, value in enumerate(cpu_array.getvalue(key)): > + if value == 0: > + continue > + > + per_cpu += " {}: {},".format(cpu, value) > + else: > + total = 0 > + total_cpu = None > + > + for key in cpu_array.keys(): See below. > + # I've seen extrem odd hight values, to avoid those, exclude any typo? "extremely odd high values"? > + # index higher thant 1024 for now. What values were you getting? I'm not seeing any weird value. Besides, isn't the maximum set in the ebpf side to 255? Plus, it's an array so its size is limited and no key can exceed 256 right? > + if (not (0 <= key.value <= 1024)) or (skip_key is not None and > + skip_key.value == key.value): ditto: key.value in range(0, 1024) s/skip_key is not None/skip_key/? OTOH, from include/linux/err.h: #define MAX_ERRNO 4095 Maybe that value makes more sense although I really don't think we need a check here. 
> + continue > > + > > + if total_cpu is None: if not total_cpu > + total_cpu = [0] * len(cpu_array.getvalue(key)) > + > + for cpu, value in enumerate(cpu_array.getvalue(key)): > + total_cpu[cpu] += value > + total += value > + > + if total >= 0: > + for cpu, value in enumerate(total_cpu): > + if value == 0: > + continue > + > + per_cpu += " {}: {},".format(cpu, value) > + > + return str(total), per_cpu.strip(", ") > + > + > +# > +# Display kernel upcall statistics > +# > +def display_upcall_results(): > + upcalls = bpf.get_table("upcall_count") From bcc's reference: get_table() Returns a table object. This is no longer used, as tables can now be read as items from BPF. Eg: ```BPF[name]```. Not sure if this means the API will be deprecated... > + have_upcalls = False > + > + for k in upcalls.keys(): Interestingly, "keys()" is not documented as an API. Nevertheless, it's not necessary, since "table" iterates over keys (same as list or dict in python). So it could be written as "for k in upcalls:". > + if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0: I think the pythonic way is "if k.value not in range(0, 255)". Still, how would it be possible to read keys greater than 255 if it's being limited at the ebpf side? 
> + continue > + have_upcalls = True > + break > + > + if not have_upcalls: > + return > + > + print("\n\n# UPCALL STATISTICS:\n Total upcalls : {} [{}]".format( > + *format_per_cpu_array(upcalls))) > + When looking at theese statistics, users may take a while to figure out what each number means, e.g: # UPCALL STATISTICS: Total upcalls : 240933 [0: 33165, 2: 27495, 3: 18270, 4: 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345] Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475] Failed upcalls : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870] 11, EAGAIN : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387 An option could be to describe the format in the header, e.g: # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]) Total upcalls : 240933 [0: 33165, 2: 27495, 3: 18270, 4: 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345] Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475] Failed upcalls : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870] 11, EAGAIN : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387 Another option could be to print them in a table: Result TOTAL CPU 0 CPU 2 CPU 3 CPU 4 CPU 5 CPU 6 CPU 8 CPU 9 CPU 10 -------------- --------- ------- ------- -------- ------- ------- ------- ------- -------- -------- TOTAL 240933 33165 27495 18270 15300 33885 4050 51348 33075 24345 Successful 80538 11804 8382 5660 5716 11606 2845 13717 10333 10475 Failed 161587 21361 19113 12610 9584 22279 2397 37631 22742 13870 11, EAGAIN 162528 21361 19113 12610 9584 22279 3345 37631 22742 1387 > + for k in upcalls.keys(): nit: Same as above, "for k in upcalls:" > + error = k.value > + total, per_cpu = format_per_cpu_array(upcalls, key=k) > + > + if error 
!= 0 and total == "0": Maybe return the total as integer to avoid this strange looking '== "0"'? > + continue > + > + if error == 0: > + print(" Successfull upcalls : {} [{}]".format(total, per_cpu)) > + > + total, per_cpu = format_per_cpu_array(upcalls, skip_key=k) > + print(" Failed upcalls : {} [{}]".format(total, per_cpu)) > + else: > + print(" {:3}, {:13}: {} [{}]".format(error, > + get_errno_short(error), > + total, per_cpu)) > + > + > # > # process_results() > # > @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None): > indent, "TOTAL:", "", total_count, total_ns)) > > # > - # Print events > + # Print upcall statistics > + # > + display_upcall_results() > + > + # > + # Print syscall events > # > lost_stack_traces = 0 > if syscall_events: > @@ -1194,6 +1313,9 @@ def main(): > parser.add_argument("--skip-syscall-poll-events", > help="Skip poll() syscalls with --syscall-events", > action="store_true") > + parser.add_argument("--skip-upcall-stats", > + help="Skip the collection of upcall statistics", > + action="store_true") > parser.add_argument("--stack-trace-size", > help="Number of unique stack traces that can be " > "recorded, default 4096. 0 to disable", > @@ -1298,6 +1420,9 @@ def main(): > source = source.replace("<STACK_TRACE_ENABLED>", "true" > if options.stack_trace_size > 0 else "false") > > + source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0" > + if options.skip_upcall_stats else "1") > + > # > # Handle start/stop probes > # > diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst > index 0f6f916a7..41620d760 100644 > --- a/utilities/usdt-scripts/kernel_delay.rst > +++ b/utilities/usdt-scripts/kernel_delay.rst > @@ -109,6 +109,7 @@ followed by resource-specific data. Which are: > - ``THREAD STOPPED STATISTICS`` > - ``HARD IRQ STATISTICS`` > - ``SOFT IRQ STATISTICS`` > +- ``UPCALL STATISTICS`` > > The following sections will describe in detail what statistics they report. 
> > @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt > handler (``TOTAL ns``), and the worst-case duration (``MAX ns``). > > > +``UPCALL STATISTICS`` > +~~~~~~~~~~~~~~~~~~~~~ > +The ``UPCALL STATISTICS`` section provides information on the number of > +upcalls sent by the kernel to userspace. If any upcalls fail to be sent, > +the specific return codes are recorded. Statistics are presented both as > +a total and on a per-CPU basis. > + > + > The ``--syscall-events`` option > ------------------------------- > In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``, > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
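The per-CPU aggregation that `format_per_cpu_array()` and `display_upcall_results()` perform in the patch above can be illustrated in plain Python. This is a simplified sketch with made-up per-CPU counts, using an ordinary dict in place of the BCC per-CPU table, not the script's actual code:

```python
import errno

# Hypothetical snapshot of the per-CPU upcall counters, shaped like the
# data the script reads back from the kernel: a map from error code to a
# list of per-CPU counts. Key 0 counts successful upcalls; other keys
# hold -(return value) of failed ovs_dp_upcall() calls.
upcall_counts = {
    0: [5945, 6379],             # successful upcalls on CPU 0 and CPU 1
    errno.EAGAIN: [2992, 8085],  # upcalls that failed with -EAGAIN
}

def summarize(counts):
    """Sum the per-CPU counters into totals, naming each errno."""
    successful = sum(counts.get(0, []))
    failures = {
        errno.errorcode.get(err, "_unknown_"): sum(per_cpu)
        for err, per_cpu in counts.items() if err != 0
    }
    total = successful + sum(failures.values())
    return total, successful, failures

total, successful, failures = summarize(upcall_counts)
print(total, successful, failures)  # 23401 12324 {'EAGAIN': 11077}
```

As in the script, totals computed this way only add up if the counters stop changing while they are read, which is the race the `capture_enabled__()` suggestion above addresses.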
Eelco Chaudron <echaudro@redhat.com> writes: > This patch installs a kernel return probe on ovs_dp_upcall() to record > all successful and failed calls per CPU. The statistics are reported > when the script exits, providing insights into the upcall behavior. > > This is an example output: > > # UPCALL STATISTICS: > Total upcalls : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...] > Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...] > Failed upcalls : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] > 11, EAGAIN : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] > > Signed-off-by: Eelco Chaudron <echaudro@redhat.com> > --- Some nits below, but I don't think anything that would prevent it from being accepted. > utilities/automake.mk | 1 > utilities/usdt-scripts/kernel_delay.py | 127 +++++++++++++++++++++++++++++++ > utilities/usdt-scripts/kernel_delay.rst | 9 ++ > 3 files changed, 136 insertions(+), 1 deletion(-) > > diff --git a/utilities/automake.mk b/utilities/automake.mk > index acc1af4e0..22260b254 100644 > --- a/utilities/automake.mk > +++ b/utilities/automake.mk > @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ > utilities/ovs-pipegen.py \ > utilities/usdt-scripts/dpif_op_nl_monitor.py \ > utilities/usdt-scripts/flow_reval_monitor.py \ > + utilities/usdt-scripts/kernel_delay.py \ Should this be a separate patch so we can backport it? > utilities/usdt-scripts/upcall_monitor.py \ > utilities/usdt-scripts/upcall_cost.py > > diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py > index de6b0c9de..a5aa1a3d4 100755 > --- a/utilities/usdt-scripts/kernel_delay.py > +++ b/utilities/usdt-scripts/kernel_delay.py > @@ -39,6 +39,7 @@ > # > import argparse > import datetime > +import errno > import os > import pytz > import psutil > @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit) > data->start_ns = 0; > return 0; > } > + > + > +/* > + * For measuring upcall statistics (per CPU). 
> + */ > +#define MAX_ERROR 255 > +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1); > + > +#if <INSTALL_OVS_DP_UPCALL_PROBE> > +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx) > +{ > + int ret = PT_REGS_RC(ctx); > + > + if (ret > 0) > + ret = 0; > + > + ret = -ret; > + if (ret > MAX_ERROR) > + ret = MAX_ERROR; > + > + upcall_count.increment(ret); > + return 0; > +} > +#endif /* For measuring upcall statistics/errors. */ > """ > > > @@ -900,6 +925,95 @@ def print_timestamp(msg): > print(time_string) > > > +# > +# Get errno short string > +# > +def get_errno_short(err): > + try: > + return errno.errorcode[err] > + except KeyError: > + return "_unknown_" > + > + > +# > +# Format a bfd per cpu array entry per cpu (if the count is > 0) > +# > +def format_per_cpu_array(cpu_array, key=None, skip_key=None): > + per_cpu = "" > + > + if key is not None: > + total = cpu_array.sum(key).value > + if total > 0: > + for cpu, value in enumerate(cpu_array.getvalue(key)): > + if value == 0: > + continue > + > + per_cpu += " {}: {},".format(cpu, value) > + else: > + total = 0 > + total_cpu = None > + > + for key in cpu_array.keys(): > + # I've seen extrem odd hight values, to avoid those, exclude any > + # index higher thant 1024 for now. 
> + if (not (0 <= key.value <= 1024)) or (skip_key is not None and > + skip_key.value == key.value): > + continue > + > + if total_cpu is None: > + total_cpu = [0] * len(cpu_array.getvalue(key)) > + > + for cpu, value in enumerate(cpu_array.getvalue(key)): > + total_cpu[cpu] += value > + total += value > + > + if total >= 0: > + for cpu, value in enumerate(total_cpu): > + if value == 0: > + continue > + > + per_cpu += " {}: {},".format(cpu, value) > + > + return str(total), per_cpu.strip(", ") > + > + > +# > +# Display kernel upcall statistics > +# > +def display_upcall_results(): > + upcalls = bpf.get_table("upcall_count") > + have_upcalls = False > + > + for k in upcalls.keys(): > + if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0: > + continue > + have_upcalls = True > + break > + > + if not have_upcalls: > + return > + > + print("\n\n# UPCALL STATISTICS:\n Total upcalls : {} [{}]".format( > + *format_per_cpu_array(upcalls))) > + > + for k in upcalls.keys(): > + error = k.value > + total, per_cpu = format_per_cpu_array(upcalls, key=k) > + > + if error != 0 and total == "0": > + continue > + > + if error == 0: > + print(" Successfull upcalls : {} [{}]".format(total, per_cpu)) > + > + total, per_cpu = format_per_cpu_array(upcalls, skip_key=k) > + print(" Failed upcalls : {} [{}]".format(total, per_cpu)) > + else: > + print(" {:3}, {:13}: {} [{}]".format(error, > + get_errno_short(error), > + total, per_cpu)) > + > + > # > # process_results() > # > @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None): > indent, "TOTAL:", "", total_count, total_ns)) > > # > - # Print events > + # Print upcall statistics > + # > + display_upcall_results() > + > + # > + # Print syscall events This text changed from the original. It's okay to not separate it here I think, but that's why the diff looks the way it does. 
> # > lost_stack_traces = 0 > if syscall_events: > @@ -1194,6 +1313,9 @@ def main(): > parser.add_argument("--skip-syscall-poll-events", > help="Skip poll() syscalls with --syscall-events", > action="store_true") > + parser.add_argument("--skip-upcall-stats", > + help="Skip the collection of upcall statistics", > + action="store_true") > parser.add_argument("--stack-trace-size", > help="Number of unique stack traces that can be " > "recorded, default 4096. 0 to disable", > @@ -1298,6 +1420,9 @@ def main(): > source = source.replace("<STACK_TRACE_ENABLED>", "true" > if options.stack_trace_size > 0 else "false") > > + source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0" > + if options.skip_upcall_stats else "1") > + > # > # Handle start/stop probes > # > diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst > index 0f6f916a7..41620d760 100644 > --- a/utilities/usdt-scripts/kernel_delay.rst > +++ b/utilities/usdt-scripts/kernel_delay.rst > @@ -109,6 +109,7 @@ followed by resource-specific data. Which are: > - ``THREAD STOPPED STATISTICS`` > - ``HARD IRQ STATISTICS`` > - ``SOFT IRQ STATISTICS`` > +- ``UPCALL STATISTICS`` > > The following sections will describe in detail what statistics they report. > > @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt > handler (``TOTAL ns``), and the worst-case duration (``MAX ns``). > > > +``UPCALL STATISTICS`` > +~~~~~~~~~~~~~~~~~~~~~ > +The ``UPCALL STATISTICS`` section provides information on the number of > +upcalls sent by the kernel to userspace. If any upcalls fail to be sent, > +the specific return codes are recorded. Statistics are presented both as > +a total and on a per-CPU basis. 
> + > + > The ``--syscall-events`` option > ------------------------------- > In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``, > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
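One subtlety behind the `if key is not None:` pattern discussed in this thread (Eelco defends it later): BCC hands table keys back as ctypes integers, and a ctypes integer holding 0, the very key used for successful upcalls, is falsy. A minimal demonstration, independent of BCC:

```python
from ctypes import c_uint

# A table key for error code 0 (successful upcalls) looks like this:
key = c_uint(0)

# `if key:` treats the valid key 0 like a missing key...
print(bool(key))        # False

# ...while `if key is not None:` keeps it.
print(key is not None)  # True
```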
On 16 Dec 2024, at 15:21, Aaron Conole wrote: > Eelco Chaudron <echaudro@redhat.com> writes: > >> This patch installs a kernel return probe on ovs_dp_upcall() to record >> all successful and failed calls per CPU. The statistics are reported >> when the script exits, providing insights into the upcall behavior. >> >> This is an example output: >> >> # UPCALL STATISTICS: >> Total upcalls : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...] >> Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...] >> Failed upcalls : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] >> 11, EAGAIN : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] >> >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> >> --- > > Some nits below, but I don't think anything that would prevent it from > being accepted. > >> utilities/automake.mk | 1 >> utilities/usdt-scripts/kernel_delay.py | 127 +++++++++++++++++++++++++++++++ >> utilities/usdt-scripts/kernel_delay.rst | 9 ++ >> 3 files changed, 136 insertions(+), 1 deletion(-) >> >> diff --git a/utilities/automake.mk b/utilities/automake.mk >> index acc1af4e0..22260b254 100644 >> --- a/utilities/automake.mk >> +++ b/utilities/automake.mk >> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ >> utilities/ovs-pipegen.py \ >> utilities/usdt-scripts/dpif_op_nl_monitor.py \ >> utilities/usdt-scripts/flow_reval_monitor.py \ >> + utilities/usdt-scripts/kernel_delay.py \ > > Should this be a separate patch so we can backport it? I sneaked this in, as I don’t see this as a must for backporting. 
Changes to the script will probably only happen on the main branch, and if backported, they should have passed flake8 :) >> utilities/usdt-scripts/upcall_monitor.py \ >> utilities/usdt-scripts/upcall_cost.py >> >> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py >> index de6b0c9de..a5aa1a3d4 100755 >> --- a/utilities/usdt-scripts/kernel_delay.py >> +++ b/utilities/usdt-scripts/kernel_delay.py >> @@ -39,6 +39,7 @@ >> # >> import argparse >> import datetime >> +import errno >> import os >> import pytz >> import psutil >> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit) >> data->start_ns = 0; >> return 0; >> } >> + >> + >> +/* >> + * For measuring upcall statistics (per CPU). >> + */ >> +#define MAX_ERROR 255 >> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1); >> + >> +#if <INSTALL_OVS_DP_UPCALL_PROBE> >> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx) >> +{ >> + int ret = PT_REGS_RC(ctx); >> + >> + if (ret > 0) >> + ret = 0; >> + >> + ret = -ret; >> + if (ret > MAX_ERROR) >> + ret = MAX_ERROR; >> + >> + upcall_count.increment(ret); >> + return 0; >> +} >> +#endif /* For measuring upcall statistics/errors. 
*/ >> """ >> >> >> @@ -900,6 +925,95 @@ def print_timestamp(msg): >> print(time_string) >> >> >> +# >> +# Get errno short string >> +# >> +def get_errno_short(err): >> + try: >> + return errno.errorcode[err] >> + except KeyError: >> + return "_unknown_" >> + >> + >> +# >> +# Format a bfd per cpu array entry per cpu (if the count is > 0) >> +# >> +def format_per_cpu_array(cpu_array, key=None, skip_key=None): >> + per_cpu = "" >> + >> + if key is not None: >> + total = cpu_array.sum(key).value >> + if total > 0: >> + for cpu, value in enumerate(cpu_array.getvalue(key)): >> + if value == 0: >> + continue >> + >> + per_cpu += " {}: {},".format(cpu, value) >> + else: >> + total = 0 >> + total_cpu = None >> + >> + for key in cpu_array.keys(): >> + # I've seen extrem odd hight values, to avoid those, exclude any >> + # index higher thant 1024 for now. >> + if (not (0 <= key.value <= 1024)) or (skip_key is not None and >> + skip_key.value == key.value): >> + continue >> + >> + if total_cpu is None: >> + total_cpu = [0] * len(cpu_array.getvalue(key)) >> + >> + for cpu, value in enumerate(cpu_array.getvalue(key)): >> + total_cpu[cpu] += value >> + total += value >> + >> + if total >= 0: >> + for cpu, value in enumerate(total_cpu): >> + if value == 0: >> + continue >> + >> + per_cpu += " {}: {},".format(cpu, value) >> + >> + return str(total), per_cpu.strip(", ") >> + >> + >> +# >> +# Display kernel upcall statistics >> +# >> +def display_upcall_results(): >> + upcalls = bpf.get_table("upcall_count") >> + have_upcalls = False >> + >> + for k in upcalls.keys(): >> + if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0: >> + continue >> + have_upcalls = True >> + break >> + >> + if not have_upcalls: >> + return >> + >> + print("\n\n# UPCALL STATISTICS:\n Total upcalls : {} [{}]".format( >> + *format_per_cpu_array(upcalls))) >> + >> + for k in upcalls.keys(): >> + error = k.value >> + total, per_cpu = format_per_cpu_array(upcalls, key=k) >> + >> + if error != 0 and total == 
"0": >> + continue >> + >> + if error == 0: >> + print(" Successfull upcalls : {} [{}]".format(total, per_cpu)) >> + >> + total, per_cpu = format_per_cpu_array(upcalls, skip_key=k) >> + print(" Failed upcalls : {} [{}]".format(total, per_cpu)) >> + else: >> + print(" {:3}, {:13}: {} [{}]".format(error, >> + get_errno_short(error), >> + total, per_cpu)) >> + >> + >> # >> # process_results() >> # >> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None): >> indent, "TOTAL:", "", total_count, total_ns)) >> >> # >> - # Print events >> + # Print upcall statistics >> + # >> + display_upcall_results() >> + >> + # >> + # Print syscall events > > This text changed from the original. It's okay to not separate it here > I think, but that's why the diff looks the way it does. Yes this looked odd, could not convince diff to show if differently. >> # >> lost_stack_traces = 0 >> if syscall_events: >> @@ -1194,6 +1313,9 @@ def main(): >> parser.add_argument("--skip-syscall-poll-events", >> help="Skip poll() syscalls with --syscall-events", >> action="store_true") >> + parser.add_argument("--skip-upcall-stats", >> + help="Skip the collection of upcall statistics", >> + action="store_true") >> parser.add_argument("--stack-trace-size", >> help="Number of unique stack traces that can be " >> "recorded, default 4096. 0 to disable", >> @@ -1298,6 +1420,9 @@ def main(): >> source = source.replace("<STACK_TRACE_ENABLED>", "true" >> if options.stack_trace_size > 0 else "false") >> >> + source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0" >> + if options.skip_upcall_stats else "1") >> + >> # >> # Handle start/stop probes >> # >> diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst >> index 0f6f916a7..41620d760 100644 >> --- a/utilities/usdt-scripts/kernel_delay.rst >> +++ b/utilities/usdt-scripts/kernel_delay.rst >> @@ -109,6 +109,7 @@ followed by resource-specific data. 
Which are: >> - ``THREAD STOPPED STATISTICS`` >> - ``HARD IRQ STATISTICS`` >> - ``SOFT IRQ STATISTICS`` >> +- ``UPCALL STATISTICS`` >> >> The following sections will describe in detail what statistics they report. >> >> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt >> handler (``TOTAL ns``), and the worst-case duration (``MAX ns``). >> >> >> +``UPCALL STATISTICS`` >> +~~~~~~~~~~~~~~~~~~~~~ >> +The ``UPCALL STATISTICS`` section provides information on the number of >> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent, >> +the specific return codes are recorded. Statistics are presented both as >> +a total and on a per-CPU basis. >> + >> + >> The ``--syscall-events`` option >> ------------------------------- >> In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``, >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
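The `--skip-upcall-stats` option added by this patch relies on the placeholder substitution the script already uses for its other probes: a token in the embedded BPF C source is replaced with "0" or "1" before compilation, and `#if` then keeps or drops the kretprobe. A minimal sketch of the mechanism, with the BPF source heavily truncated for illustration:

```python
# Simplified stand-in for the script's embedded BPF program text.
bpf_source = """
#if <INSTALL_OVS_DP_UPCALL_PROBE>
int kretprobe__ovs_dp_upcall(struct pt_regs *ctx) { /* ... */ return 0; }
#endif
"""

def render(source, skip_upcall_stats):
    # Mirrors the patch: substitute the placeholder before handing the
    # source to BCC for compilation.
    return source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>",
                          "0" if skip_upcall_stats else "1")

enabled = render(bpf_source, skip_upcall_stats=False)
disabled = render(bpf_source, skip_upcall_stats=True)
print("#if 1" in enabled, "#if 0" in disabled)  # True True
```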
On 13 Dec 2024, at 13:56, Adrián Moreno wrote: > On Mon, Nov 25, 2024 at 03:14:20PM +0100, Eelco Chaudron wrote: >> This patch installs a kernel return probe on ovs_dp_upcall() to record >> all successful and failed calls per CPU. The statistics are reported >> when the script exits, providing insights into the upcall behavior. >> >> This is an example output: >> >> # UPCALL STATISTICS: >> Total upcalls : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...] >> Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...] >> Failed upcalls : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] >> 11, EAGAIN : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] >> >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> > > Thanks for this patch. Correlating upcall failures to kernel delays is > extremely useful. Some comments below. Thanks Adrian for the review. I’ll send out a v2 in a minute, but see my responses below. //Eelco >> --- >> utilities/automake.mk | 1 >> utilities/usdt-scripts/kernel_delay.py | 127 +++++++++++++++++++++++++++++++ >> utilities/usdt-scripts/kernel_delay.rst | 9 ++ >> 3 files changed, 136 insertions(+), 1 deletion(-) >> >> diff --git a/utilities/automake.mk b/utilities/automake.mk >> index acc1af4e0..22260b254 100644 >> --- a/utilities/automake.mk >> +++ b/utilities/automake.mk >> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ >> utilities/ovs-pipegen.py \ >> utilities/usdt-scripts/dpif_op_nl_monitor.py \ >> utilities/usdt-scripts/flow_reval_monitor.py \ >> + utilities/usdt-scripts/kernel_delay.py \ >> utilities/usdt-scripts/upcall_monitor.py \ >> utilities/usdt-scripts/upcall_cost.py >> >> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py >> index de6b0c9de..a5aa1a3d4 100755 >> --- a/utilities/usdt-scripts/kernel_delay.py >> +++ b/utilities/usdt-scripts/kernel_delay.py >> @@ -39,6 +39,7 @@ >> # >> import argparse >> import datetime >> +import errno >> import os >> import pytz >> import 
psutil >> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit) >> data->start_ns = 0; >> return 0; >> } >> + >> + >> +/* >> + * For measuring upcall statistics (per CPU). >> + */ >> +#define MAX_ERROR 255 > > Why this limit? The kernel seems to use 4095 in the PTR_ERR > infrastructure. Although not exposed to user applications, > kernel has some > 255 values. They don't seem networking related for > now but still. > > In order to reduce this arbitrary limit and avoid needless iteration > over an almost empty array, I'd make it a percpu_hash. It supports > "lookup_or_try_init" and "increment" so it's as easy to use from ebpf > and almost identically from python. Good point, I will use a per-CPU hash table instead. >> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1); >> + >> +#if <INSTALL_OVS_DP_UPCALL_PROBE> >> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx) >> +{ > > When trying the script I was a bit confused that numbers didn't add up, > e.g: > > # UPCALL STATISTICS: > Total upcalls : 776234 [0: 204345, 1: 38773, 2: 38206, 3: > 38700, 4: 104040, 5: 77400, 6: 70290, 8: 69705, 9: 67410, 10: 14220, > 11: 53145] > Successfull upcalls : 268640 [0: 72829, 1: 14556, 2: 12876, 3: > 13192, 4: 34227, 5: 26480, 6: 24130, 8: 23791, 9: 23372, 10: 6164, 11: > 17023] > Failed upcalls : 508564 [0: 131516, 1: 25187, 2: 25330, 3: > 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: > 36122] > 11, EAGAIN : 509485 [0: 131516, 1: 26112, 2: 25330, 3: > 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: > 36122] > > Total of EAGAIN erros != total Failed and > Total Failed + Total Successfull != Total. > > I believe the reason is that events of this kind are still being > triggered while results are being processed. 
The problem seems to go > away if I add the following lines at the top of this probe: > > if (!capture_enabled__()) > return 0; ACK, I will fix this, including resetting the values between runs… >> + int ret = PT_REGS_RC(ctx); >> + >> + if (ret > 0) >> + ret = 0; >> + >> + ret = -ret; >> + if (ret > MAX_ERROR) >> + ret = MAX_ERROR; >> + >> + upcall_count.increment(ret); >> + return 0; >> +} >> +#endif /* For measuring upcall statistics/errors. */ >> """ >> >> >> @@ -900,6 +925,95 @@ def print_timestamp(msg): >> print(time_string) >> >> >> +# >> +# Get errno short string >> +# >> +def get_errno_short(err): >> + try: >> + return errno.errorcode[err] >> + except KeyError: >> + return "_unknown_" >> + >> + >> +# >> +# Format a bfd per cpu array entry per cpu (if the count is > 0) >> +# > > s/bfd/bpf? > Also, "per cpu" twice makes the sentense a bit difficult to read. Ack > >> +def format_per_cpu_array(cpu_array, key=None, skip_key=None): >> + per_cpu = "" >> + >> + if key is not None: > > "is not None" is redundant: "if key:" is equivalent. This and the one below are not the same for my code; ‘if key:’ behaves incorrectly: # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]): Total upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...] Successfull upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...] Failed upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...] 11, EAGAIN : 59526 [0: 7701, 2: 7580, 4: 4443, 6: 2745, 8: 3013,...] vs ‘if key is not None’: # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]): Total upcalls : 209023 [0: 16271, 2: 14830, 4: 11787, 6: 10025, 8: 4875,...] Successfull upcalls : 127677 [0: 6342, 2: 6250, 4: 5547, 6: 5464, 8: 4875,...] Failed upcalls : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...] 11, EAGAIN : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...] 
>> + total = cpu_array.sum(key).value >> + if total > 0: >> + for cpu, value in enumerate(cpu_array.getvalue(key)): >> + if value == 0: >> + continue >> + >> + per_cpu += " {}: {},".format(cpu, value) >> + else: >> + total = 0 >> + total_cpu = None >> + >> + for key in cpu_array.keys(): > > See below. > >> + # I've seen extrem odd hight values, to avoid those, exclude any > > typo? "extremely odd high values"? > >> + # index higher thant 1024 for now. > > What values were you getting? I'm not seeing any weird value. > Besides, isn't the maximum set in the ebpf side to 255? > Plus, it's an array so its size is limited and no key can exceed 256 > right? This was a while back, I can't remember if it was a specific kernel or BCC version. I would get indexes that look like integer overflows. However, with the hash implementation this is no longer required. >> + if (not (0 <= key.value <= 1024)) or (skip_key is not None and >> + skip_key.value == key.value): > > ditto: key.value in range(0, 1024) > s/skip_key is not None/skip_key/? > > OTOH, from include/linux/err.h: > #define MAX_ERRNO 4095 > > Maybe that value makes more sense although I really don't think we need > a check here. See above
Changed it, let's not wait for the deprecation notice ;) >> + have_upcalls = False >> + >> + for k in upcalls.keys(): > > Interestingly, "keys()" is not documented as an API. Nevertheless, it's > not necessary, since "table" iterates over keys (same as list or dict in > python). So it could be written as "for k in upcalls:". Good catch; changed here and below. >> + if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0: > > I think the pythonic way is "if k.value not in range(0, 255)". Still how > would it be possible to read keys greater than 255 if it's being limited > at the ebpf side? See above. >> + continue >> + have_upcalls = True >> + break >> + >> + if not have_upcalls: >> + return >> + >> + print("\n\n# UPCALL STATISTICS:\n Total upcalls : {} [{}]".format( >> + *format_per_cpu_array(upcalls))) >> + > > When looking at these statistics, users may take a while to figure out > what each number means, e.g: > > # UPCALL STATISTICS: > Total upcalls : 240933 [0: 33165, 2: 27495, 3: 18270, 4: > 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345] > Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: > 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475] > Failed upcalls : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, > 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870] > 11, EAGAIN : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, > 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387 > > An option could be to describe the format in the header, e.g: > > # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]) > Total upcalls : 240933 [0: 33165, 2: 27495, 3: 18270, 4: > 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345] > Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: > 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475] > Failed upcalls : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, > 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870] > 11, EAGAIN : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, > 5: 22279, 6: 3345, 8: 37631, 9: 
22742, 10: 1387 I went for the above, as the CPU set might not be the same in all lines, and I would like to condense this as much as possible (I was testing on 64 CPUs ;). > Another option could be to print them in a table: > > Result TOTAL CPU 0 CPU 2 CPU 3 CPU 4 CPU 5 > CPU 6 CPU 8 CPU 9 CPU 10 > -------------- --------- ------- ------- -------- ------- ------- > ------- ------- -------- -------- > TOTAL 240933 33165 27495 18270 15300 33885 > 4050 51348 33075 24345 > Successful 80538 11804 8382 5660 5716 11606 > 2845 13717 10333 10475 > Failed 161587 21361 19113 12610 9584 22279 > 2397 37631 22742 13870 > 11, EAGAIN 162528 21361 19113 12610 9584 22279 > 3345 37631 22742 1387 > > > > >> + for k in upcalls.keys(): > > nit: Same as above, "for k in upcalls:" Ack >> + error = k.value >> + total, per_cpu = format_per_cpu_array(upcalls, key=k) >> + >> + if error != 0 and total == "0": > > Maybe return the total as integer to avoid this strange looking '== "0"'? Ack, yes was too much C-minded here always returning a string… >> + continue >> + >> + if error == 0: >> + print(" Successfull upcalls : {} [{}]".format(total, per_cpu)) >> + >> + total, per_cpu = format_per_cpu_array(upcalls, skip_key=k) >> + print(" Failed upcalls : {} [{}]".format(total, per_cpu)) >> + else: >> + print(" {:3}, {:13}: {} [{}]".format(error, >> + get_errno_short(error), >> + total, per_cpu)) >> + >> + >> # >> # process_results() >> # >> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None): >> indent, "TOTAL:", "", total_count, total_ns)) >> >> # >> - # Print events >> + # Print upcall statistics >> + # >> + display_upcall_results() >> + >> + # >> + # Print syscall events >> # >> lost_stack_traces = 0 >> if syscall_events: >> @@ -1194,6 +1313,9 @@ def main(): >> parser.add_argument("--skip-syscall-poll-events", >> help="Skip poll() syscalls with --syscall-events", >> action="store_true") >> + parser.add_argument("--skip-upcall-stats", >> + help="Skip the 
collection of upcall statistics", >> + action="store_true") >> parser.add_argument("--stack-trace-size", >> help="Number of unique stack traces that can be " >> "recorded, default 4096. 0 to disable", >> @@ -1298,6 +1420,9 @@ def main(): >> source = source.replace("<STACK_TRACE_ENABLED>", "true" >> if options.stack_trace_size > 0 else "false") >> >> + source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0" >> + if options.skip_upcall_stats else "1") >> + >> # >> # Handle start/stop probes >> # >> diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst >> index 0f6f916a7..41620d760 100644 >> --- a/utilities/usdt-scripts/kernel_delay.rst >> +++ b/utilities/usdt-scripts/kernel_delay.rst >> @@ -109,6 +109,7 @@ followed by resource-specific data. Which are: >> - ``THREAD STOPPED STATISTICS`` >> - ``HARD IRQ STATISTICS`` >> - ``SOFT IRQ STATISTICS`` >> +- ``UPCALL STATISTICS`` >> >> The following sections will describe in detail what statistics they report. >> >> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt >> handler (``TOTAL ns``), and the worst-case duration (``MAX ns``). >> >> >> +``UPCALL STATISTICS`` >> +~~~~~~~~~~~~~~~~~~~~~ >> +The ``UPCALL STATISTICS`` section provides information on the number of >> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent, >> +the specific return codes are recorded. Statistics are presented both as >> +a total and on a per-CPU basis. >> + >> + >> The ``--syscall-events`` option >> ------------------------------- >> In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``, >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>
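The v2 direction agreed above comes down to two concrete changes: iterate a per-CPU hash keyed by errno instead of a fixed-size array, and return the total as an integer rather than a string. Independent of BCC, the aggregation can be sketched in plain Python; `counts` and `format_per_cpu` below are illustrative stand-ins (not the script's actual API), with a dict mapping errno to a list of per-CPU counters emulating the per-CPU hash:

```python
def format_per_cpu(counts, key=None, skip_key=None):
    """Sum per-CPU counters; return (int total, "cpu: n, ..." string)."""
    ncpus = len(next(iter(counts.values()), []))
    total_cpu = [0] * ncpus

    for k, per_cpu in counts.items():
        if key is not None and k != key:        # only this errno bucket
            continue
        if skip_key is not None and k == skip_key:  # everything but this one
            continue
        for cpu, value in enumerate(per_cpu):
            total_cpu[cpu] += value

    parts = ["{}: {}".format(cpu, v) for cpu, v in enumerate(total_cpu) if v]
    return sum(total_cpu), ", ".join(parts)


counts = {0: [5, 0, 3], 11: [2, 1, 0]}           # errno -> per-CPU hit counts
total, per_cpu = format_per_cpu(counts)          # all upcalls
ok, _ = format_per_cpu(counts, key=0)            # successful only (errno 0)
failed, _ = format_per_cpu(counts, skip_key=0)   # everything but errno 0
```

Returning the total as an `int` also removes the odd-looking `total == "0"` comparison flagged later in the review.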
On Tue, Dec 17, 2024 at 03:30:51PM +0100, Eelco Chaudron wrote:
> On 13 Dec 2024, at 13:56, Adrián Moreno wrote:
>
> [snip]
>
> >> +def format_per_cpu_array(cpu_array, key=None, skip_key=None):
> >> +    per_cpu = ""
> >> +
> >> +    if key is not None:
> >
> > "is not None" is redundant: "if key:" is equivalent.
>
> This and the one below are not the same for my code; ‘if key:’ behaves incorrectly:
>
> [snip]

Ugh, you're right. If key is zero the check fails.
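The `if key:` pitfall conceded above is easy to reproduce without BCC: the keys a BCC table hands back are ctypes integers, and a ctypes integer wrapping 0 (the errno-0 "success" bucket) is falsy even though the object itself is not None. A minimal illustration:

```python
import ctypes

key = ctypes.c_uint32(0)  # the errno-0 "success" bucket

# The key object exists, so the identity test passes...
print(key is not None)            # True
# ...but truthiness follows the wrapped value, so "if key:" would skip it.
print(bool(key))                  # False
print(bool(ctypes.c_uint32(11)))  # True (the EAGAIN bucket)
```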
diff --git a/utilities/automake.mk b/utilities/automake.mk
index acc1af4e0..22260b254 100644
--- a/utilities/automake.mk
+++ b/utilities/automake.mk
@@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
     utilities/ovs-pipegen.py \
     utilities/usdt-scripts/dpif_op_nl_monitor.py \
     utilities/usdt-scripts/flow_reval_monitor.py \
+    utilities/usdt-scripts/kernel_delay.py \
     utilities/usdt-scripts/upcall_monitor.py \
     utilities/usdt-scripts/upcall_cost.py
 
diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
index de6b0c9de..a5aa1a3d4 100755
--- a/utilities/usdt-scripts/kernel_delay.py
+++ b/utilities/usdt-scripts/kernel_delay.py
@@ -39,6 +39,7 @@
 #
 import argparse
 import datetime
+import errno
 import os
 import pytz
 import psutil
@@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit)
     data->start_ns = 0;
     return 0;
 }
+
+
+/*
+ * For measuring upcall statistics (per CPU).
+ */
+#define MAX_ERROR 255
+BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
+
+#if <INSTALL_OVS_DP_UPCALL_PROBE>
+int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
+{
+    int ret = PT_REGS_RC(ctx);
+
+    if (ret > 0)
+        ret = 0;
+
+    ret = -ret;
+    if (ret > MAX_ERROR)
+        ret = MAX_ERROR;
+
+    upcall_count.increment(ret);
+    return 0;
+}
+#endif /* For measuring upcall statistics/errors. */
 """
 
 
@@ -900,6 +925,95 @@ def print_timestamp(msg):
     print(time_string)
 
 
+#
+# Get errno short string
+#
+def get_errno_short(err):
+    try:
+        return errno.errorcode[err]
+    except KeyError:
+        return "_unknown_"
+
+
+#
+# Format a bfd per cpu array entry per cpu (if the count is > 0)
+#
+def format_per_cpu_array(cpu_array, key=None, skip_key=None):
+    per_cpu = ""
+
+    if key is not None:
+        total = cpu_array.sum(key).value
+        if total > 0:
+            for cpu, value in enumerate(cpu_array.getvalue(key)):
+                if value == 0:
+                    continue
+
+                per_cpu += " {}: {},".format(cpu, value)
+    else:
+        total = 0
+        total_cpu = None
+
+        for key in cpu_array.keys():
+            # I've seen extrem odd hight values, to avoid those, exclude any
+            # index higher thant 1024 for now.
+            if (not (0 <= key.value <= 1024)) or (skip_key is not None and
+                                                  skip_key.value == key.value):
+                continue
+
+            if total_cpu is None:
+                total_cpu = [0] * len(cpu_array.getvalue(key))
+
+            for cpu, value in enumerate(cpu_array.getvalue(key)):
+                total_cpu[cpu] += value
+                total += value
+
+        if total >= 0:
+            for cpu, value in enumerate(total_cpu):
+                if value == 0:
+                    continue
+
+                per_cpu += " {}: {},".format(cpu, value)
+
+    return str(total), per_cpu.strip(", ")
+
+
+#
+# Display kernel upcall statistics
+#
+def display_upcall_results():
+    upcalls = bpf.get_table("upcall_count")
+    have_upcalls = False
+
+    for k in upcalls.keys():
+        if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0:
+            continue
+        have_upcalls = True
+        break
+
+    if not have_upcalls:
+        return
+
+    print("\n\n# UPCALL STATISTICS:\n  Total upcalls       : {} [{}]".format(
+        *format_per_cpu_array(upcalls)))
+
+    for k in upcalls.keys():
+        error = k.value
+        total, per_cpu = format_per_cpu_array(upcalls, key=k)
+
+        if error != 0 and total == "0":
+            continue
+
+        if error == 0:
+            print("  Successfull upcalls : {} [{}]".format(total, per_cpu))
+
+            total, per_cpu = format_per_cpu_array(upcalls, skip_key=k)
+            print("  Failed upcalls      : {} [{}]".format(total, per_cpu))
+        else:
+            print("  {:3}, {:13}: {} [{}]".format(error,
+                                                  get_errno_short(error),
+                                                  total, per_cpu))
+
+
 #
 # process_results()
 #
@@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None):
             indent, "TOTAL:", "", total_count, total_ns))
 
     #
-    # Print events
+    # Print upcall statistics
+    #
+    display_upcall_results()
+
+    #
+    # Print syscall events
     #
     lost_stack_traces = 0
     if syscall_events:
@@ -1194,6 +1313,9 @@ def main():
     parser.add_argument("--skip-syscall-poll-events",
                         help="Skip poll() syscalls with --syscall-events",
                         action="store_true")
+    parser.add_argument("--skip-upcall-stats",
+                        help="Skip the collection of upcall statistics",
+                        action="store_true")
     parser.add_argument("--stack-trace-size",
                         help="Number of unique stack traces that can be "
                              "recorded, default 4096. 0 to disable",
@@ -1298,6 +1420,9 @@ def main():
     source = source.replace("<STACK_TRACE_ENABLED>", "true"
                             if options.stack_trace_size > 0 else "false")
 
+    source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0"
+                            if options.skip_upcall_stats else "1")
+
     #
     # Handle start/stop probes
     #
diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst
index 0f6f916a7..41620d760 100644
--- a/utilities/usdt-scripts/kernel_delay.rst
+++ b/utilities/usdt-scripts/kernel_delay.rst
@@ -109,6 +109,7 @@ followed by resource-specific data. Which are:
 - ``THREAD STOPPED STATISTICS``
 - ``HARD IRQ STATISTICS``
 - ``SOFT IRQ STATISTICS``
+- ``UPCALL STATISTICS``
 
 The following sections will describe in detail what statistics they report.
 
@@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt
 handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).
 
 
+``UPCALL STATISTICS``
+~~~~~~~~~~~~~~~~~~~~~
+The ``UPCALL STATISTICS`` section provides information on the number of
+upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
+the specific return codes are recorded. Statistics are presented both as
+a total and on a per-CPU basis.
+
+
 The ``--syscall-events`` option
 -------------------------------
 In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``,
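The patch's `get_errno_short()` is a thin wrapper around Python's `errno.errorcode` table, which maps numeric errno values back to their symbolic names. A `dict.get` variant of the same lookup, functionally equivalent to the try/except form in the diff:

```python
import errno

def get_errno_short(err):
    # errno.errorcode maps numeric errno values to their symbolic names.
    return errno.errorcode.get(err, "_unknown_")

print(get_errno_short(errno.ENOENT))  # ENOENT
print(get_errno_short(9999))          # _unknown_
```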
This patch installs a kernel return probe on ovs_dp_upcall() to record
all successful and failed calls per CPU. The statistics are reported
when the script exits, providing insights into the upcall behavior.

This is an example output:

  # UPCALL STATISTICS:
  Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
  Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
  Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
  11, EAGAIN          : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 utilities/automake.mk                   |    1 
 utilities/usdt-scripts/kernel_delay.py  |  127 +++++++++++++++++++++++++++++++
 utilities/usdt-scripts/kernel_delay.rst |    9 ++
 3 files changed, 136 insertions(+), 1 deletion(-)
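The bucket logic of the kretprobe in the patch (positive return values count as success, negative errnos are negated, anything beyond MAX_ERROR is clamped into the last bucket) translates almost line for line into Python for a quick sanity check:

```python
MAX_ERROR = 255

def classify(ret):
    """Map an ovs_dp_upcall() return value to an upcall_count bucket."""
    if ret > 0:          # positive returns count as success
        ret = 0
    ret = -ret           # fold -EAGAIN et al. into positive errno values
    if ret > MAX_ERROR:
        ret = MAX_ERROR  # clamp unexpected codes into the last bucket
    return ret

print(classify(0))      # 0   (success)
print(classify(-11))    # 11  (EAGAIN)
print(classify(-4095))  # 255 (clamped to MAX_ERROR)
```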