Message ID | 1515656386-21036-2-git-send-email-jan.scheurich@ericsson.com
---|---
State | Superseded
Delegated to | Ian Stokes
Series | Refactor PMD stats and cycle counting
On 11.01.2018 10:39, Jan Scheurich wrote:
> Add module dpif-netdev-perf to host all PMD performance-related
> data structures and functions in dpif-netdev. Refactor the PMD
> stats handling in dpif-netdev and delegate whatever possible into
> the new module, using clean interfaces to shield dpif-netdev from
> the implementation details. Accordingly, all the PMD statistics
> members are moved from the main struct dp_netdev_pmd_thread into
> a dedicated member of type struct pmd_perf_stats.
>
> Include Darrell's prior refactoring of PMD stats contained in
> [PATCH v5,2/3] dpif-netdev: Refactor some pmd stats:
>
> 1. The cycles per packet counts are now based on packets
> received rather than packet passes through the datapath.
>
> 2. Packet counters are now kept for packets received and
> packets recirculated. These are kept as separate counters for
> maintainability reasons. The cost of incrementing these counters
> is negligible. These new counters are also displayed to the user.
>
> 3. A display statistic is added for the average number of
> datapath passes per packet. This should be useful for user
> debugging and understanding of packet processing.
>
> 4. The user-visible 'miss' counter is used for successful upcalls,
> rather than the sum of successful and unsuccessful upcalls. Hence,
> this becomes what the user historically understands by an OVS 'miss
> upcall'. The user display is annotated to make this clear as well.
>
> 5. The user-visible 'lost' counter remains as failed upcalls, but
> is annotated to make it clear what the meaning is.
>
> 6. The enum pmd_stat_type is annotated to make the usage of the
> stats counters clear.
>
> 7. The subtable lookup stat is renamed to make it clear that it
> relates to masked lookups.
>
> 8. The PMD stats test is updated to handle the new user stats of
> packets received, packets recirculated and average number of datapath
> passes per packet.
> > On top of that introduce a "-pmd <core>" option to the PMD info > commands to filter the output for a single PMD. > > Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> > Co-authored-by: Darrell Ball <dlu998@gmail.com> > Signed-off-by: Darrell Ball <dlu998@gmail.com> > --- > lib/automake.mk | 2 + > lib/dpif-netdev-perf.c | 67 +++++++++ > lib/dpif-netdev-perf.h | 151 +++++++++++++++++++++ > lib/dpif-netdev.c | 360 +++++++++++++++++++++---------------------------- > tests/pmd.at | 22 +-- > 5 files changed, 384 insertions(+), 218 deletions(-) > create mode 100644 lib/dpif-netdev-perf.c > create mode 100644 lib/dpif-netdev-perf.h > > diff --git a/lib/automake.mk b/lib/automake.mk > index 4b38a11..159319f 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \ > lib/dpdk.h \ > lib/dpif-netdev.c \ > lib/dpif-netdev.h \ > + lib/dpif-netdev-perf.c \ > + lib/dpif-netdev-perf.h \ > lib/dpif-provider.h \ > lib/dpif.c \ > lib/dpif.h \ > diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c > new file mode 100644 > index 0000000..f853fd8 > --- /dev/null > +++ b/lib/dpif-netdev-perf.c > @@ -0,0 +1,67 @@ > +/* > + * Copyright (c) 2017 Ericsson AB. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. 
> + */ > + > +#include <config.h> > + > +#ifdef DPDK_NETDEV > +#include <rte_cycles.h> > +#endif > + > +#include "openvswitch/dynamic-string.h" > +#include "openvswitch/vlog.h" > +#include "dpif-netdev-perf.h" > +#include "timeval.h" > + > +VLOG_DEFINE_THIS_MODULE(pmd_perf); > + > +void > +pmd_perf_stats_init(struct pmd_perf_stats *s) > +{ > + memset(s, 0 , sizeof(*s)); > + s->start_ms = time_msec(); > +} > + > +void > +pmd_perf_read_counters(struct pmd_perf_stats *s, > + uint64_t stats[PMD_N_STATS]) > +{ > + uint64_t val; > + > + /* These loops subtracts reference values (.zero[*]) from the counters. > + * Since loads and stores are relaxed, it might be possible for a .zero[*] > + * value to be more recent than the current value we're reading from the > + * counter. This is not a big problem, since these numbers are not > + * supposed to be 100% accurate, but we should at least make sure that > + * the result is not negative. */ > + for (int i = 0; i < PMD_N_STATS; i++) { > + atomic_read_relaxed(&s->counters.n[i], &val); > + if (val > s->counters.zero[i]) { > + stats[i] = val - s->counters.zero[i]; You're reading the 's->counters.zero[i]' twice. This makes checking useless, and the resulted value could be negative. Of course, compiler will likely read it only once, but we can never be sure. > + } else { > + stats[i] = 0; > + } > + } > +} > + > +void > +pmd_perf_stats_clear(struct pmd_perf_stats *s) > +{ > + s->start_ms = 0; /* Marks start of clearing. */ > + for (int i = 0; i < PMD_N_STATS; i++) { > + atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); > + } > + s->start_ms = time_msec(); /* Clearing finished. */ > +} > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h > new file mode 100644 > index 0000000..41f4e85 > --- /dev/null > +++ b/lib/dpif-netdev-perf.h > @@ -0,0 +1,151 @@ > +/* > + * Copyright (c) 2017 Ericsson AB. 
> + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +#ifndef DPIF_NETDEV_PERF_H > +#define DPIF_NETDEV_PERF_H 1 > + > +#include <stdbool.h> > +#include <stddef.h> > +#include <stdint.h> > +#include <string.h> > +#include <math.h> > + > +#include "openvswitch/vlog.h" > +#include "ovs-atomic.h" > +#include "timeval.h" > +#include "unixctl.h" > +#include "util.h" > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/* This module encapsulates data structures and functions to maintain PMD > + * performance metrics such as packet counters, execution cycles. It > + * provides a clean API for dpif-netdev to initialize, update and read and > + * reset these metrics. > + */ > + > +/* Set of counter types maintained in pmd_perf_stats. */ > + > +enum pmd_stat_type { > + PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ > + PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ > + PMD_STAT_MISS, /* Packets that did not match and upcall was ok. */ > + PMD_STAT_LOST, /* Packets that did not match and upcall failed. */ > + /* The above statistics account for the total > + * number of packet passes through the datapath > + * pipeline and should not be overlapping with each > + * other. */ > + PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table > + hits. Each MASKED_HIT hit will have >= 1 > + MASKED_LOOKUP(s). 
*/ > + PMD_STAT_RECV, /* Packets entering the datapath pipeline from an > + * interface. */ > + PMD_STAT_RECIRC, /* Packets reentering the datapath pipeline due to > + * recirculation. */ > + PMD_STAT_SENT_PKTS, /* Packets that have been sent. */ > + PMD_STAT_SENT_BATCHES, /* Number of batches sent. */ > + PMD_CYCLES_POLL_IDLE, /* Cycles spent unsuccessful polling. */ > + PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and > + * processing polled packets. */ > + PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */ > + PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ > + PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ > + PMD_N_STATS > +}; > + > +/* Array of PMD counters indexed by enum pmd_stat_type. > + * The n[] array contains the actual counter values since initialization > + * of the PMD. Counters are atomically updated from the PMD but are > + * read and cleared also from other processes. To clear the counters at > + * PMD run-time, the current counter values are copied over to the zero[] > + * array. To read counters we subtract zero[] value from n[]. */ > + > +struct pmd_counters { > + atomic_uint64_t n[PMD_N_STATS]; /* Value since _init(). */ > + uint64_t zero[PMD_N_STATS]; /* Value at last _clear(). */ > +}; > + > +/* Container for all performance metrics of a PMD. > + * Part of the struct dp_netdev_pmd_thread. */ > + > +struct pmd_perf_stats { > + /* Start of the current performance measurement period. > + * While a new measurement period is being started with > + * ovs-appctl dpif-netdev/pmd-stats-clear, start_ms is set > + * to zero to lock out operations from accessing inconsistent > + * data. */ > + uint64_t start_ms; See my comments for v6. > + /* Start of the current PMD iteration in TSC cycles.*/ > + uint64_t last_tsc; > + /* Set of PMD counters with their zero offsets. 
*/ > + struct pmd_counters counters; > +}; > + > +void pmd_perf_stats_init(struct pmd_perf_stats *s); > +void pmd_perf_stats_clear(struct pmd_perf_stats *s); > +void pmd_perf_read_counters(struct pmd_perf_stats *s, > + uint64_t stats[PMD_N_STATS]); > + > +/* PMD performance counters are updated lock-less. For real PMDs > + * they are only updated from the PMD thread itself. In the case of the > + * NON-PMD they might be updated from multiple threads, but we can live > + * with losing a rare update as 100% accuracy is not required. > + * However, as counters are read for display from outside the PMD thread > + * with e.g. pmd-stats-show, we make sure that the 64-bit read and store > + * operations are atomic also on 32-bit systems so that readers cannot > + * not read garbage. On 64-bit systems this incurs no overhead. */ > + > +static inline void > +pmd_perf_update_counter(struct pmd_perf_stats *s, > + enum pmd_stat_type counter, int delta) > +{ > + uint64_t tmp; > + atomic_read_relaxed(&s->counters.n[counter], &tmp); > + tmp += delta; > + atomic_store_relaxed(&s->counters.n[counter], tmp); > +} > + > +static inline void > +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) > +{ > + if (OVS_UNLIKELY(s->start_ms == 0)) { > + /* Stats not yet fully initialized. */ > + return; > + } > + s->last_tsc = now_tsc; > +} > + > +static inline void > +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, > + int packets) > +{ > + uint64_t cycles = now_tsc - s->last_tsc; > + > + /* No need for atomic updates as only invoked within PMD thread. 
*/ > + if (packets > 0) { > + s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles; > + } else { > + s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles; > + } > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* DPIF_NETDEV_PERF_H */ > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index c7d157a..538d5ce 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -44,6 +44,7 @@ > #include "csum.h" > #include "dp-packet.h" > #include "dpif.h" > +#include "dpif-netdev-perf.h" > #include "dpif-provider.h" > #include "dummy.h" > #include "fat-rwlock.h" > @@ -331,25 +332,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp, > odp_port_t) > OVS_REQUIRES(dp->port_mutex); > > -enum dp_stat_type { > - DP_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ > - DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ > - DP_STAT_MISS, /* Packets that did not match. */ > - DP_STAT_LOST, /* Packets not passed up to the client. */ > - DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow table > - hits */ > - DP_STAT_SENT_PKTS, /* Packets that has been sent. */ > - DP_STAT_SENT_BATCHES, /* Number of batches sent. */ > - DP_N_STATS > -}; > - > -enum pmd_cycles_counter_type { > - PMD_CYCLES_IDLE, /* Cycles spent idle or unsuccessful polling */ > - PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and > - * processing polled packets */ > - PMD_N_CYCLES > -}; > - > enum rxq_cycles_counter_type { > RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and > processing packets during the current > @@ -499,21 +481,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions( > const struct dp_netdev_flow *); > static void dp_netdev_actions_free(struct dp_netdev_actions *); > > -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */ > -struct dp_netdev_pmd_stats { > - /* Indexed by DP_STAT_*. */ > - atomic_ullong n[DP_N_STATS]; > -}; > - > -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. 
*/ > -struct dp_netdev_pmd_cycles { > - /* Indexed by PMD_CYCLES_*. */ > - atomic_ullong n[PMD_N_CYCLES]; > -}; > - > -static void dp_netdev_count_packet(struct dp_netdev_pmd_thread *, > - enum dp_stat_type type, int cnt); > - > struct polled_queue { > struct dp_netdev_rxq *rxq; > odp_port_t port_no; > @@ -595,12 +562,6 @@ struct dp_netdev_pmd_thread { > are stored for each polled rxq. */ > long long int rxq_next_cycle_store; > > - /* Statistics. */ > - struct dp_netdev_pmd_stats stats; > - > - /* Cycles counters */ > - struct dp_netdev_pmd_cycles cycles; > - > /* Current context of the PMD thread. */ > struct dp_netdev_pmd_thread_ctx ctx; > > @@ -638,12 +599,8 @@ struct dp_netdev_pmd_thread { > struct hmap tnl_port_cache; > struct hmap send_port_cache; > > - /* Only a pmd thread can write on its own 'cycles' and 'stats'. > - * The main thread keeps 'stats_zero' and 'cycles_zero' as base > - * values and subtracts them from 'stats' and 'cycles' before > - * reporting to the user */ > - unsigned long long stats_zero[DP_N_STATS]; > - uint64_t cycles_zero[PMD_N_CYCLES]; > + /* Keep track of detailed PMD performance statistics. */ > + struct pmd_perf_stats perf_stats; > > /* Set to true if the pmd thread needs to be reloaded. */ > bool need_reload; > @@ -833,47 +790,10 @@ enum pmd_info_type { > }; > > static void > -pmd_info_show_stats(struct ds *reply, > - struct dp_netdev_pmd_thread *pmd, > - unsigned long long stats[DP_N_STATS], > - uint64_t cycles[PMD_N_CYCLES]) > +format_pmd_thread(struct ds *reply, struct dp_netdev_pmd_thread *pmd) > { > - unsigned long long total_packets; > - uint64_t total_cycles = 0; > - double lookups_per_hit = 0, packets_per_batch = 0; > - int i; > - > - /* These loops subtracts reference values ('*_zero') from the counters. > - * Since loads and stores are relaxed, it might be possible for a '*_zero' > - * value to be more recent than the current value we're reading from the > - * counter. 
This is not a big problem, since these numbers are not > - * supposed to be too accurate, but we should at least make sure that > - * the result is not negative. */ > - for (i = 0; i < DP_N_STATS; i++) { > - if (stats[i] > pmd->stats_zero[i]) { > - stats[i] -= pmd->stats_zero[i]; > - } else { > - stats[i] = 0; > - } > - } > - > - /* Sum of all the matched and not matched packets gives the total. */ > - total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT] > - + stats[DP_STAT_MISS]; > - > - for (i = 0; i < PMD_N_CYCLES; i++) { > - if (cycles[i] > pmd->cycles_zero[i]) { > - cycles[i] -= pmd->cycles_zero[i]; > - } else { > - cycles[i] = 0; > - } > - > - total_cycles += cycles[i]; > - } > - > ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID) > ? "main thread" : "pmd thread"); > - > if (pmd->numa_id != OVS_NUMA_UNSPEC) { > ds_put_format(reply, " numa_id %d", pmd->numa_id); > } > @@ -881,23 +801,52 @@ pmd_info_show_stats(struct ds *reply, > ds_put_format(reply, " core_id %u", pmd->core_id); > } > ds_put_cstr(reply, ":\n"); > +} > + > +static void > +pmd_info_show_stats(struct ds *reply, > + struct dp_netdev_pmd_thread *pmd) > +{ > + uint64_t stats[PMD_N_STATS]; > + uint64_t total_cycles, total_packets; > + double passes_per_pkt = 0; > + double lookups_per_hit = 0; > + double packets_per_batch = 0; > + > + pmd_perf_read_counters(&pmd->perf_stats, stats); > + total_cycles = stats[PMD_CYCLES_ITER_IDLE] > + + stats[PMD_CYCLES_ITER_BUSY]; > + total_packets = stats[PMD_STAT_RECV]; > + > + format_pmd_thread(reply, pmd); > > - if (stats[DP_STAT_MASKED_HIT] > 0) { > - lookups_per_hit = stats[DP_STAT_LOOKUP_HIT] > - / (double) stats[DP_STAT_MASKED_HIT]; > + if (total_packets > 0) { > + passes_per_pkt = (total_packets + stats[PMD_STAT_RECIRC]) > + / (double) total_packets; > } > - if (stats[DP_STAT_SENT_BATCHES] > 0) { > - packets_per_batch = stats[DP_STAT_SENT_PKTS] > - / (double) stats[DP_STAT_SENT_BATCHES]; > + if (stats[PMD_STAT_MASKED_HIT] > 0) { > + 
lookups_per_hit = stats[PMD_STAT_MASKED_LOOKUP] > + / (double) stats[PMD_STAT_MASKED_HIT]; > + } > + if (stats[PMD_STAT_SENT_BATCHES] > 0) { > + packets_per_batch = stats[PMD_STAT_SENT_PKTS] > + / (double) stats[PMD_STAT_SENT_BATCHES]; > } > > ds_put_format(reply, > - "\temc hits:%llu\n\tmegaflow hits:%llu\n" > - "\tavg. subtable lookups per hit:%.2f\n" > - "\tmiss:%llu\n\tlost:%llu\n" > - "\tavg. packets per output batch: %.2f\n", > - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT], > - lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST], > + "\tpackets received:%"PRIu64"\n" > + "\tpacket recirculations:%"PRIu64"\n" > + "\tavg. datapath passes per packet:%.02f\n" > + "\temc hits:%"PRIu64"\n" > + "\tmegaflow hits:%"PRIu64"\n" > + "\tavg. subtable lookups per megaflow hit:%.02f\n" > + "\tmiss with success upcall:%"PRIu64"\n" > + "\tmiss with failed upcall:%"PRIu64"\n" > + "\tavg. packets per output batch:%.02f\n", > + total_packets, stats[PMD_STAT_RECIRC], > + passes_per_pkt, stats[PMD_STAT_EXACT_HIT], > + stats[PMD_STAT_MASKED_HIT], lookups_per_hit, > + stats[PMD_STAT_MISS], stats[PMD_STAT_LOST], > packets_per_batch); Commented for v6. 
> > if (total_cycles == 0) { > @@ -907,46 +856,25 @@ pmd_info_show_stats(struct ds *reply, > ds_put_format(reply, > "\tidle cycles:%"PRIu64" (%.02f%%)\n" > "\tprocessing cycles:%"PRIu64" (%.02f%%)\n", > - cycles[PMD_CYCLES_IDLE], > - cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, > - cycles[PMD_CYCLES_PROCESSING], > - cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100); > + stats[PMD_CYCLES_ITER_IDLE], > + stats[PMD_CYCLES_ITER_IDLE] / (double) total_cycles * 100, > + stats[PMD_CYCLES_ITER_BUSY], > + stats[PMD_CYCLES_ITER_BUSY] / (double) total_cycles * 100); > > if (total_packets == 0) { > return; > } > > ds_put_format(reply, > - "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n", > - total_cycles / (double)total_packets, > + "\tavg cycles per packet:%.02f (%"PRIu64"/%"PRIu64")\n", > + total_cycles / (double) total_packets, > total_cycles, total_packets); > > ds_put_format(reply, > - "\tavg processing cycles per packet: " > - "%.02f (%"PRIu64"/%llu)\n", > - cycles[PMD_CYCLES_PROCESSING] / (double)total_packets, > - cycles[PMD_CYCLES_PROCESSING], total_packets); > -} > - > -static void > -pmd_info_clear_stats(struct ds *reply OVS_UNUSED, > - struct dp_netdev_pmd_thread *pmd, > - unsigned long long stats[DP_N_STATS], > - uint64_t cycles[PMD_N_CYCLES]) > -{ > - int i; > - > - /* We cannot write 'stats' and 'cycles' (because they're written by other > - * threads) and we shouldn't change 'stats' (because they're used to count > - * datapath stats, which must not be cleared here). 
Instead, we save the > - * current values and subtract them from the values to be displayed in the > - * future */ > - for (i = 0; i < DP_N_STATS; i++) { > - pmd->stats_zero[i] = stats[i]; > - } > - for (i = 0; i < PMD_N_CYCLES; i++) { > - pmd->cycles_zero[i] = cycles[i]; > - } > + "\tavg processing cycles per packet:" > + "%.02f (%"PRIu64"/%"PRIu64")\n", > + stats[PMD_CYCLES_ITER_BUSY] / (double) total_packets, > + stats[PMD_CYCLES_ITER_BUSY], total_packets); > } > > static int > @@ -1106,23 +1034,37 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], > struct ds reply = DS_EMPTY_INITIALIZER; > struct dp_netdev_pmd_thread **pmd_list; > struct dp_netdev *dp = NULL; > - size_t n; > enum pmd_info_type type = *(enum pmd_info_type *) aux; > + unsigned int core_id; > + bool filter_on_pmd = false; > + size_t n; > > ovs_mutex_lock(&dp_netdev_mutex); > > - if (argc == 2) { > - dp = shash_find_data(&dp_netdevs, argv[1]); > - } else if (shash_count(&dp_netdevs) == 1) { > - /* There's only one datapath */ > - dp = shash_first(&dp_netdevs)->data; > + while (argc > 0) { > + if (!strcmp(argv[1], "-pmd") && argc >= 2) { > + if (str_to_uint(argv[2], 10, &core_id)) { > + filter_on_pmd = true; > + } > + argc -= 2; > + argv += 2; > + } else { > + dp = shash_find_data(&dp_netdevs, argv[1]); > + argc -= 1; > + argv += 1; > + } > } > > if (!dp) { > - ovs_mutex_unlock(&dp_netdev_mutex); > - unixctl_command_reply_error(conn, > - "please specify an existing datapath"); > - return; > + if (shash_count(&dp_netdevs) == 1) { > + /* There's only one datapath */ > + dp = shash_first(&dp_netdevs)->data; > + } else { > + ovs_mutex_unlock(&dp_netdev_mutex); > + unixctl_command_reply_error(conn, > + "please specify an existing datapath"); > + return; > + } > } > > sorted_poll_thread_list(dp, &pmd_list, &n); > @@ -1131,26 +1073,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], > if (!pmd) { > break; > } > - > + if (filter_on_pmd && 
pmd->core_id != core_id) { > + continue; > + } > if (type == PMD_INFO_SHOW_RXQ) { > pmd_info_show_rxq(&reply, pmd); > - } else { > - unsigned long long stats[DP_N_STATS]; > - uint64_t cycles[PMD_N_CYCLES]; > - > - /* Read current stats and cycle counters */ > - for (size_t j = 0; j < ARRAY_SIZE(stats); j++) { > - atomic_read_relaxed(&pmd->stats.n[j], &stats[j]); > - } > - for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) { > - atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]); > - } > - > - if (type == PMD_INFO_CLEAR_STATS) { > - pmd_info_clear_stats(&reply, pmd, stats, cycles); > - } else if (type == PMD_INFO_SHOW_STATS) { > - pmd_info_show_stats(&reply, pmd, stats, cycles); > - } > + } else if (type == PMD_INFO_CLEAR_STATS) { > + pmd_perf_stats_clear(&pmd->perf_stats); > + } else if (type == PMD_INFO_SHOW_STATS) { > + pmd_info_show_stats(&reply, pmd); > } > } > free(pmd_list); > @@ -1168,14 +1099,14 @@ dpif_netdev_init(void) > clear_aux = PMD_INFO_CLEAR_STATS, > poll_aux = PMD_INFO_SHOW_RXQ; > > - unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]", > - 0, 1, dpif_netdev_pmd_info, > + unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]", > + 0, 2, dpif_netdev_pmd_info, > (void *)&show_aux); > - unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]", > - 0, 1, dpif_netdev_pmd_info, > + unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]", > + 0, 2, dpif_netdev_pmd_info, > (void *)&clear_aux); > - unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]", > - 0, 1, dpif_netdev_pmd_info, > + unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]", > + 0, 2, dpif_netdev_pmd_info, > (void *)&poll_aux); > unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", > 0, 1, dpif_netdev_pmd_rebalance, > @@ -1511,23 +1442,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) > { > struct dp_netdev *dp = get_dp_netdev(dpif); > struct dp_netdev_pmd_thread *pmd; > + 
uint64_t pmd_stats[PMD_N_STATS]; > > - stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0; > + stats->n_flows = stats->n_hit = > + stats->n_mask_hit = stats->n_missed = stats->n_lost = 0; > CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { > - unsigned long long n; > stats->n_flows += cmap_count(&pmd->flow_table); > - > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n); > - stats->n_hit += n; > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n); > - stats->n_hit += n; > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n); > - stats->n_missed += n; > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n); > - stats->n_lost += n; > + pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); > + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; > + stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; > + stats->n_missed += pmd_stats[PMD_STAT_MISS]; > + stats->n_lost += pmd_stats[PMD_STAT_LOST]; > } > stats->n_masks = UINT32_MAX; > - stats->n_mask_hit = UINT64_MAX; > > return 0; > } > @@ -3209,28 +3136,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd) > /* Stop counting cycles and add them to the counter 'type' */ > static inline void > cycles_count_end(struct dp_netdev_pmd_thread *pmd, > - enum pmd_cycles_counter_type type) > + enum pmd_stat_type type) > OVS_RELEASES(&cycles_counter_fake_mutex) > OVS_NO_THREAD_SAFETY_ANALYSIS > { > unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles; > > - non_atomic_ullong_add(&pmd->cycles.n[type], interval); > + pmd_perf_update_counter(&pmd->perf_stats, type, interval); > } > > /* Calculate the intermediate cycle result and add to the counter 'type' */ > static inline void > cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, > struct dp_netdev_rxq *rxq, > - enum pmd_cycles_counter_type type) > + enum pmd_stat_type type) > OVS_NO_THREAD_SAFETY_ANALYSIS > { > unsigned long long new_cycles = cycles_counter(); > unsigned long long interval = new_cycles - 
pmd->ctx.last_cycles; > pmd->ctx.last_cycles = new_cycles; > > - non_atomic_ullong_add(&pmd->cycles.n[type], interval); > - if (rxq && (type == PMD_CYCLES_PROCESSING)) { > + pmd_perf_update_counter(&pmd->perf_stats, type, interval); > + if (rxq && (type == PMD_CYCLES_POLL_BUSY)) { > /* Add to the amount of current processing cycles. */ > non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); > } > @@ -3289,8 +3216,8 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, > netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs); > dp_packet_batch_init(&p->output_pkts); > > - dp_netdev_count_packet(pmd, DP_STAT_SENT_PKTS, output_cnt); > - dp_netdev_count_packet(pmd, DP_STAT_SENT_BATCHES, 1); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_PKTS, output_cnt); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_BATCHES, 1); > } > > static void > @@ -3971,12 +3898,12 @@ dpif_netdev_run(struct dpif *dpif) > port->port_no); > cycles_count_intermediate(non_pmd, NULL, > process_packets > - ? PMD_CYCLES_PROCESSING > - : PMD_CYCLES_IDLE); > + ? 
PMD_CYCLES_POLL_BUSY > + : PMD_CYCLES_POLL_IDLE); > } > } > } > - cycles_count_end(non_pmd, PMD_CYCLES_IDLE); > + cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE); > pmd_thread_ctx_time_update(non_pmd); > dpif_netdev_xps_revalidate_pmd(non_pmd, false); > ovs_mutex_unlock(&dp->non_pmd_mutex); > @@ -4121,6 +4048,7 @@ static void * > pmd_thread_main(void *f_) > { > struct dp_netdev_pmd_thread *pmd = f_; > + struct pmd_perf_stats *s = &pmd->perf_stats; > unsigned int lc = 0; > struct polled_queue *poll_list; > bool exiting; > @@ -4156,13 +4084,17 @@ reload: > > cycles_count_start(pmd); > for (;;) { > + uint64_t iter_packets = 0; > + pmd_perf_start_iteration(s, pmd->ctx.last_cycles); > for (i = 0; i < poll_cnt; i++) { > process_packets = > dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, > poll_list[i].port_no); > cycles_count_intermediate(pmd, poll_list[i].rxq, > - process_packets ? PMD_CYCLES_PROCESSING > - : PMD_CYCLES_IDLE); > + process_packets > + ? PMD_CYCLES_POLL_BUSY > + : PMD_CYCLES_POLL_IDLE); > + iter_packets += process_packets; > } > > if (lc++ > 1024) { > @@ -4183,10 +4115,12 @@ reload: > if (reload) { > break; > } > + cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); > } > + pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets); > } > > - cycles_count_end(pmd, PMD_CYCLES_IDLE); > + cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); > > poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); > exiting = latch_is_set(&pmd->exit_latch); > @@ -4638,6 +4572,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, > emc_cache_init(&pmd->flow_cache); > pmd_alloc_static_tx_qid(pmd); > } > + pmd_perf_stats_init(&pmd->perf_stats); > cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node), > hash_int(core_id, 0)); > } > @@ -4838,13 +4773,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size, > atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags); > } > > -static void > 
-dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd, > - enum dp_stat_type type, int cnt) > -{ > - non_atomic_ullong_add(&pmd->stats.n[type], cnt); > -} > - > static int > dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, > struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid, > @@ -5017,6 +4945,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, > int i; > > atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); > + pmd_perf_update_counter(&pmd->perf_stats, > + md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV, > + cnt); > > DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { > struct dp_netdev_flow *flow; > @@ -5065,18 +4996,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, > } > } > > - dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, > - cnt - n_dropped - n_missed); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, > + cnt - n_dropped - n_missed); > > return dp_packet_batch_size(packets_); > } > > -static inline void > +static inline int > handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, > struct dp_packet *packet, > const struct netdev_flow_key *key, > - struct ofpbuf *actions, struct ofpbuf *put_actions, > - int *lost_cnt) > + struct ofpbuf *actions, struct ofpbuf *put_actions) > { > struct ofpbuf *add_actions; > struct dp_packet_batch b; > @@ -5096,8 +5026,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, > put_actions); > if (OVS_UNLIKELY(error && error != ENOSPC)) { > dp_packet_delete(packet); > - (*lost_cnt)++; > - return; > + return error; > } > > /* The Netlink encoding of datapath flow keys cannot express > @@ -5137,6 +5066,9 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, > ovs_mutex_unlock(&pmd->flow_mutex); > emc_probabilistic_insert(pmd, key, netdev_flow); > } > + /* Only error ENOSPC can reach here. We process the packet but do not > + * install a datapath flow. Treat as successful. 
*/ > + return 0; > } > > static inline void > @@ -5158,7 +5090,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > struct dpcls *cls; > struct dpcls_rule *rules[PKT_ARRAY_SIZE]; > struct dp_netdev *dp = pmd->dp; > - int miss_cnt = 0, lost_cnt = 0; > + int upcall_ok_cnt = 0, upcall_fail_cnt = 0; > int lookup_cnt = 0, add_lookup_cnt; > bool any_miss; > size_t i; > @@ -5200,9 +5132,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > continue; > } > > - miss_cnt++; > - handle_packet_upcall(pmd, packet, &keys[i], &actions, > - &put_actions, &lost_cnt); > + int error = handle_packet_upcall(pmd, packet, &keys[i], > + &actions, &put_actions); > + > + if (OVS_UNLIKELY(error)) { > + upcall_fail_cnt++; > + } else { > + upcall_ok_cnt++; > + } > } > > ofpbuf_uninit(&actions); > @@ -5212,8 +5149,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > DP_PACKET_BATCH_FOR_EACH (packet, packets_) { > if (OVS_UNLIKELY(!rules[i])) { > dp_packet_delete(packet); > - lost_cnt++; > - miss_cnt++; > + upcall_fail_cnt++; > } > } > } > @@ -5231,10 +5167,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); > } > > - dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); > - dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); > - dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt); > - dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, > + cnt - upcall_ok_cnt - upcall_fail_cnt); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, > + lookup_cnt); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS, > + upcall_ok_cnt); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST, > + upcall_fail_cnt); > } > > /* Packets enter the datapath from a port (or from recirculation) here. 
> diff --git a/tests/pmd.at b/tests/pmd.at > index e39a23a..0356f87 100644 > --- a/tests/pmd.at > +++ b/tests/pmd.at > @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0 > p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>) > ]) > > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl > pmd thread numa_id <cleared> core_id <cleared>: > + packets received:0 > + packet recirculations:0 > + avg. datapath passes per packet:0.00 > emc hits:0 > megaflow hits:0 > - avg. subtable lookups per hit:0.00 > - miss:0 > - lost:0 > + avg. subtable lookups per megaflow hit:0.00 > + miss(i.e. lookup miss with success upcall):0 > + lost(i.e. lookup miss with failed upcall):0 > ]) > > ovs-appctl time/stop > @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl > recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del> > ]) > > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl > pmd thread numa_id <cleared> core_id <cleared>: > + packets received:20 > + packet recirculations:0 > + avg. datapath passes per packet:1.00 > emc hits:19 > megaflow hits:0 > - avg. subtable lookups per hit:0.00 > - miss:1 > - lost:0 > + avg. subtable lookups per megaflow hit:0.00 > + miss(i.e. lookup miss with success upcall):1 > + lost(i.e. lookup miss with failed upcall):0 > ]) > > OVS_VSWITCHD_STOP > Looks like tests was not updated.
I guess, this version was compile tested only. Comments inline. Best regards, Ilya Maximets. On 11.01.2018 10:39, Jan Scheurich wrote: > Add module dpif-netdev-perf to host all PMD performance-related > data structures and functions in dpif-netdev. Refactor the PMD > stats handling in dpif-netdev and delegate whatever possible into > the new module, using clean interfaces to shield dpif-netdev from > the implementation details. Accordingly, the all PMD statistics > members are moved from the main struct dp_netdev_pmd_thread into > a dedicated member of type struct pmd_perf_stats. > > Include Darrel's prior refactoring of PMD stats contained in > [PATCH v5,2/3] dpif-netdev: Refactor some pmd stats: > > 1. The cycles per packet counts are now based on packets > received rather than packet passes through the datapath. > > 2. Packet counters are now kept for packets received and > packets recirculated. These are kept as separate counters for > maintainability reasons. The cost of incrementing these counters > is negligible. These new counters are also displayed to the user. > > 3. A display statistic is added for the average number of > datapath passes per packet. This should be useful for user > debugging and understanding of packet processing. > > 4. The user visible 'miss' counter is used for successful upcalls, > rather than the sum of sucessful and unsuccessful upcalls. Hence, > this becomes what user historically understands by OVS 'miss upcall'. > The user display is annotated to make this clear as well. > > 5. The user visible 'lost' counter remains as failed upcalls, but > is annotated to make it clear what the meaning is. > > 6. The enum pmd_stat_type is annotated to make the usage of the > stats counters clear. > > 7. The subtable lookup stats is renamed to make it clear that it > relates to masked lookups. > > 8. 
The PMD stats test is updated to handle the new user stats of > packets received, packets recirculated and average number of datapath > passes per packet. > > On top of that introduce a "-pmd <core>" option to the PMD info > commands to filter the output for a single PMD. > > Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> > Co-authored-by: Darrell Ball <dlu998@gmail.com> > Signed-off-by: Darrell Ball <dlu998@gmail.com> > --- > lib/automake.mk | 2 + > lib/dpif-netdev-perf.c | 67 +++++++++ > lib/dpif-netdev-perf.h | 151 +++++++++++++++++++++ > lib/dpif-netdev.c | 360 +++++++++++++++++++++---------------------------- > tests/pmd.at | 22 +-- > 5 files changed, 384 insertions(+), 218 deletions(-) > create mode 100644 lib/dpif-netdev-perf.c > create mode 100644 lib/dpif-netdev-perf.h > > diff --git a/lib/automake.mk b/lib/automake.mk > index 4b38a11..159319f 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \ > lib/dpdk.h \ > lib/dpif-netdev.c \ > lib/dpif-netdev.h \ > + lib/dpif-netdev-perf.c \ > + lib/dpif-netdev-perf.h \ > lib/dpif-provider.h \ > lib/dpif.c \ > lib/dpif.h \ > diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c > new file mode 100644 > index 0000000..f853fd8 > --- /dev/null > +++ b/lib/dpif-netdev-perf.c > @@ -0,0 +1,67 @@ > +/* > + * Copyright (c) 2017 Ericsson AB. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. 
> + */ > + > +#include <config.h> > + > +#ifdef DPDK_NETDEV > +#include <rte_cycles.h> > +#endif > + > +#include "openvswitch/dynamic-string.h" > +#include "openvswitch/vlog.h" > +#include "dpif-netdev-perf.h" > +#include "timeval.h" > + > +VLOG_DEFINE_THIS_MODULE(pmd_perf); > + > +void > +pmd_perf_stats_init(struct pmd_perf_stats *s) > +{ > + memset(s, 0 , sizeof(*s)); > + s->start_ms = time_msec(); > +} > + > +void > +pmd_perf_read_counters(struct pmd_perf_stats *s, > + uint64_t stats[PMD_N_STATS]) > +{ > + uint64_t val; > + > + /* These loops subtracts reference values (.zero[*]) from the counters. > + * Since loads and stores are relaxed, it might be possible for a .zero[*] > + * value to be more recent than the current value we're reading from the > + * counter. This is not a big problem, since these numbers are not > + * supposed to be 100% accurate, but we should at least make sure that > + * the result is not negative. */ > + for (int i = 0; i < PMD_N_STATS; i++) { > + atomic_read_relaxed(&s->counters.n[i], &val); > + if (val > s->counters.zero[i]) { > + stats[i] = val - s->counters.zero[i]; > + } else { > + stats[i] = 0; > + } > + } > +} > + > +void > +pmd_perf_stats_clear(struct pmd_perf_stats *s) > +{ > + s->start_ms = 0; /* Marks start of clearing. */ > + for (int i = 0; i < PMD_N_STATS; i++) { > + atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); > + } > + s->start_ms = time_msec(); /* Clearing finished. */ > +} > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h > new file mode 100644 > index 0000000..41f4e85 > --- /dev/null > +++ b/lib/dpif-netdev-perf.h > @@ -0,0 +1,151 @@ > +/* > + * Copyright (c) 2017 Ericsson AB. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. 
> + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +#ifndef DPIF_NETDEV_PERF_H > +#define DPIF_NETDEV_PERF_H 1 > + > +#include <stdbool.h> > +#include <stddef.h> > +#include <stdint.h> > +#include <string.h> > +#include <math.h> > + > +#include "openvswitch/vlog.h" > +#include "ovs-atomic.h" > +#include "timeval.h" > +#include "unixctl.h" > +#include "util.h" > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/* This module encapsulates data structures and functions to maintain PMD > + * performance metrics such as packet counters, execution cycles. It > + * provides a clean API for dpif-netdev to initialize, update and read and > + * reset these metrics. > + */ > + > +/* Set of counter types maintained in pmd_perf_stats. */ > + > +enum pmd_stat_type { > + PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ > + PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ > + PMD_STAT_MISS, /* Packets that did not match and upcall was ok. */ > + PMD_STAT_LOST, /* Packets that did not match and upcall failed. */ > + /* The above statistics account for the total > + * number of packet passes through the datapath > + * pipeline and should not be overlapping with each > + * other. */ > + PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table > + hits. Each MASKED_HIT hit will have >= 1 > + MASKED_LOOKUP(s). */ > + PMD_STAT_RECV, /* Packets entering the datapath pipeline from an > + * interface. */ > + PMD_STAT_RECIRC, /* Packets reentering the datapath pipeline due to > + * recirculation. 
*/ > + PMD_STAT_SENT_PKTS, /* Packets that have been sent. */ > + PMD_STAT_SENT_BATCHES, /* Number of batches sent. */ > + PMD_CYCLES_POLL_IDLE, /* Cycles spent unsuccessful polling. */ > + PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and > + * processing polled packets. */ > + PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */ > + PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ > + PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ > + PMD_N_STATS > +}; > + > +/* Array of PMD counters indexed by enum pmd_stat_type. > + * The n[] array contains the actual counter values since initialization > + * of the PMD. Counters are atomically updated from the PMD but are > + * read and cleared also from other processes. To clear the counters at > + * PMD run-time, the current counter values are copied over to the zero[] > + * array. To read counters we subtract zero[] value from n[]. */ > + > +struct pmd_counters { > + atomic_uint64_t n[PMD_N_STATS]; /* Value since _init(). */ > + uint64_t zero[PMD_N_STATS]; /* Value at last _clear(). */ > +}; > + > +/* Container for all performance metrics of a PMD. > + * Part of the struct dp_netdev_pmd_thread. */ > + > +struct pmd_perf_stats { > + /* Start of the current performance measurement period. > + * While a new measurement period is being started with > + * ovs-appctl dpif-netdev/pmd-stats-clear, start_ms is set > + * to zero to lock out operations from accessing inconsistent > + * data. */ > + uint64_t start_ms; > + /* Start of the current PMD iteration in TSC cycles.*/ > + uint64_t last_tsc; > + /* Set of PMD counters with their zero offsets. */ > + struct pmd_counters counters; > +}; > + > +void pmd_perf_stats_init(struct pmd_perf_stats *s); > +void pmd_perf_stats_clear(struct pmd_perf_stats *s); > +void pmd_perf_read_counters(struct pmd_perf_stats *s, > + uint64_t stats[PMD_N_STATS]); > + > +/* PMD performance counters are updated lock-less. 
For real PMDs > + * they are only updated from the PMD thread itself. In the case of the > + * NON-PMD they might be updated from multiple threads, but we can live > + * with losing a rare update as 100% accuracy is not required. > + * However, as counters are read for display from outside the PMD thread > + * with e.g. pmd-stats-show, we make sure that the 64-bit read and store > + * operations are atomic also on 32-bit systems so that readers cannot > + * not read garbage. On 64-bit systems this incurs no overhead. */ > + > +static inline void > +pmd_perf_update_counter(struct pmd_perf_stats *s, > + enum pmd_stat_type counter, int delta) > +{ > + uint64_t tmp; > + atomic_read_relaxed(&s->counters.n[counter], &tmp); > + tmp += delta; > + atomic_store_relaxed(&s->counters.n[counter], tmp); > +} > + > +static inline void > +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) > +{ > + if (OVS_UNLIKELY(s->start_ms == 0)) { > + /* Stats not yet fully initialized. */ > + return; > + } > + s->last_tsc = now_tsc; > +} > + > +static inline void > +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, > + int packets) > +{ > + uint64_t cycles = now_tsc - s->last_tsc; > + > + /* No need for atomic updates as only invoked within PMD thread. 
*/ > + if (packets > 0) { > + s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles; > + } else { > + s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles; > + } > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* DPIF_NETDEV_PERF_H */ > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index c7d157a..538d5ce 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -44,6 +44,7 @@ > #include "csum.h" > #include "dp-packet.h" > #include "dpif.h" > +#include "dpif-netdev-perf.h" > #include "dpif-provider.h" > #include "dummy.h" > #include "fat-rwlock.h" > @@ -331,25 +332,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp, > odp_port_t) > OVS_REQUIRES(dp->port_mutex); > > -enum dp_stat_type { > - DP_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ > - DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ > - DP_STAT_MISS, /* Packets that did not match. */ > - DP_STAT_LOST, /* Packets not passed up to the client. */ > - DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow table > - hits */ > - DP_STAT_SENT_PKTS, /* Packets that has been sent. */ > - DP_STAT_SENT_BATCHES, /* Number of batches sent. */ > - DP_N_STATS > -}; > - > -enum pmd_cycles_counter_type { > - PMD_CYCLES_IDLE, /* Cycles spent idle or unsuccessful polling */ > - PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and > - * processing polled packets */ > - PMD_N_CYCLES > -}; > - > enum rxq_cycles_counter_type { > RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and > processing packets during the current > @@ -499,21 +481,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions( > const struct dp_netdev_flow *); > static void dp_netdev_actions_free(struct dp_netdev_actions *); > > -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */ > -struct dp_netdev_pmd_stats { > - /* Indexed by DP_STAT_*. */ > - atomic_ullong n[DP_N_STATS]; > -}; > - > -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. 
*/ > -struct dp_netdev_pmd_cycles { > - /* Indexed by PMD_CYCLES_*. */ > - atomic_ullong n[PMD_N_CYCLES]; > -}; > - > -static void dp_netdev_count_packet(struct dp_netdev_pmd_thread *, > - enum dp_stat_type type, int cnt); > - > struct polled_queue { > struct dp_netdev_rxq *rxq; > odp_port_t port_no; > @@ -595,12 +562,6 @@ struct dp_netdev_pmd_thread { > are stored for each polled rxq. */ > long long int rxq_next_cycle_store; > > - /* Statistics. */ > - struct dp_netdev_pmd_stats stats; > - > - /* Cycles counters */ > - struct dp_netdev_pmd_cycles cycles; > - > /* Current context of the PMD thread. */ > struct dp_netdev_pmd_thread_ctx ctx; > > @@ -638,12 +599,8 @@ struct dp_netdev_pmd_thread { > struct hmap tnl_port_cache; > struct hmap send_port_cache; > > - /* Only a pmd thread can write on its own 'cycles' and 'stats'. > - * The main thread keeps 'stats_zero' and 'cycles_zero' as base > - * values and subtracts them from 'stats' and 'cycles' before > - * reporting to the user */ > - unsigned long long stats_zero[DP_N_STATS]; > - uint64_t cycles_zero[PMD_N_CYCLES]; > + /* Keep track of detailed PMD performance statistics. */ > + struct pmd_perf_stats perf_stats; > > /* Set to true if the pmd thread needs to be reloaded. */ > bool need_reload; > @@ -833,47 +790,10 @@ enum pmd_info_type { > }; > > static void > -pmd_info_show_stats(struct ds *reply, > - struct dp_netdev_pmd_thread *pmd, > - unsigned long long stats[DP_N_STATS], > - uint64_t cycles[PMD_N_CYCLES]) > +format_pmd_thread(struct ds *reply, struct dp_netdev_pmd_thread *pmd) > { > - unsigned long long total_packets; > - uint64_t total_cycles = 0; > - double lookups_per_hit = 0, packets_per_batch = 0; > - int i; > - > - /* These loops subtracts reference values ('*_zero') from the counters. > - * Since loads and stores are relaxed, it might be possible for a '*_zero' > - * value to be more recent than the current value we're reading from the > - * counter. 
This is not a big problem, since these numbers are not > - * supposed to be too accurate, but we should at least make sure that > - * the result is not negative. */ > - for (i = 0; i < DP_N_STATS; i++) { > - if (stats[i] > pmd->stats_zero[i]) { > - stats[i] -= pmd->stats_zero[i]; > - } else { > - stats[i] = 0; > - } > - } > - > - /* Sum of all the matched and not matched packets gives the total. */ > - total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT] > - + stats[DP_STAT_MISS]; > - > - for (i = 0; i < PMD_N_CYCLES; i++) { > - if (cycles[i] > pmd->cycles_zero[i]) { > - cycles[i] -= pmd->cycles_zero[i]; > - } else { > - cycles[i] = 0; > - } > - > - total_cycles += cycles[i]; > - } > - > ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID) > ? "main thread" : "pmd thread"); > - > if (pmd->numa_id != OVS_NUMA_UNSPEC) { > ds_put_format(reply, " numa_id %d", pmd->numa_id); > } > @@ -881,23 +801,52 @@ pmd_info_show_stats(struct ds *reply, > ds_put_format(reply, " core_id %u", pmd->core_id); > } > ds_put_cstr(reply, ":\n"); > +} > + > +static void > +pmd_info_show_stats(struct ds *reply, > + struct dp_netdev_pmd_thread *pmd) > +{ > + uint64_t stats[PMD_N_STATS]; > + uint64_t total_cycles, total_packets; > + double passes_per_pkt = 0; > + double lookups_per_hit = 0; > + double packets_per_batch = 0; > + > + pmd_perf_read_counters(&pmd->perf_stats, stats); > + total_cycles = stats[PMD_CYCLES_ITER_IDLE] > + + stats[PMD_CYCLES_ITER_BUSY]; > + total_packets = stats[PMD_STAT_RECV]; > + > + format_pmd_thread(reply, pmd); > > - if (stats[DP_STAT_MASKED_HIT] > 0) { > - lookups_per_hit = stats[DP_STAT_LOOKUP_HIT] > - / (double) stats[DP_STAT_MASKED_HIT]; > + if (total_packets > 0) { > + passes_per_pkt = (total_packets + stats[PMD_STAT_RECIRC]) > + / (double) total_packets; > } > - if (stats[DP_STAT_SENT_BATCHES] > 0) { > - packets_per_batch = stats[DP_STAT_SENT_PKTS] > - / (double) stats[DP_STAT_SENT_BATCHES]; > + if (stats[PMD_STAT_MASKED_HIT] > 0) { > + 
lookups_per_hit = stats[PMD_STAT_MASKED_LOOKUP] > + / (double) stats[PMD_STAT_MASKED_HIT]; > + } > + if (stats[PMD_STAT_SENT_BATCHES] > 0) { > + packets_per_batch = stats[PMD_STAT_SENT_PKTS] > + / (double) stats[PMD_STAT_SENT_BATCHES]; > } > > ds_put_format(reply, > - "\temc hits:%llu\n\tmegaflow hits:%llu\n" > - "\tavg. subtable lookups per hit:%.2f\n" > - "\tmiss:%llu\n\tlost:%llu\n" > - "\tavg. packets per output batch: %.2f\n", > - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT], > - lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST], > + "\tpackets received:%"PRIu64"\n" > + "\tpacket recirculations:%"PRIu64"\n" > + "\tavg. datapath passes per packet:%.02f\n" > + "\temc hits:%"PRIu64"\n" > + "\tmegaflow hits:%"PRIu64"\n" > + "\tavg. subtable lookups per megaflow hit:%.02f\n" > + "\tmiss with success upcall:%"PRIu64"\n" > + "\tmiss with failed upcall:%"PRIu64"\n" > + "\tavg. packets per output batch:%.02f\n", > + total_packets, stats[PMD_STAT_RECIRC], > + passes_per_pkt, stats[PMD_STAT_EXACT_HIT], > + stats[PMD_STAT_MASKED_HIT], lookups_per_hit, > + stats[PMD_STAT_MISS], stats[PMD_STAT_LOST], > packets_per_batch); > > if (total_cycles == 0) { > @@ -907,46 +856,25 @@ pmd_info_show_stats(struct ds *reply, > ds_put_format(reply, > "\tidle cycles:%"PRIu64" (%.02f%%)\n" > "\tprocessing cycles:%"PRIu64" (%.02f%%)\n", > - cycles[PMD_CYCLES_IDLE], > - cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, > - cycles[PMD_CYCLES_PROCESSING], > - cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100); > + stats[PMD_CYCLES_ITER_IDLE], > + stats[PMD_CYCLES_ITER_IDLE] / (double) total_cycles * 100, > + stats[PMD_CYCLES_ITER_BUSY], > + stats[PMD_CYCLES_ITER_BUSY] / (double) total_cycles * 100); > > if (total_packets == 0) { > return; > } > > ds_put_format(reply, > - "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n", > - total_cycles / (double)total_packets, > + "\tavg cycles per packet:%.02f (%"PRIu64"/%"PRIu64")\n", > + total_cycles / (double) total_packets, 
> total_cycles, total_packets); > > ds_put_format(reply, > - "\tavg processing cycles per packet: " > - "%.02f (%"PRIu64"/%llu)\n", > - cycles[PMD_CYCLES_PROCESSING] / (double)total_packets, > - cycles[PMD_CYCLES_PROCESSING], total_packets); > -} > - > -static void > -pmd_info_clear_stats(struct ds *reply OVS_UNUSED, > - struct dp_netdev_pmd_thread *pmd, > - unsigned long long stats[DP_N_STATS], > - uint64_t cycles[PMD_N_CYCLES]) > -{ > - int i; > - > - /* We cannot write 'stats' and 'cycles' (because they're written by other > - * threads) and we shouldn't change 'stats' (because they're used to count > - * datapath stats, which must not be cleared here). Instead, we save the > - * current values and subtract them from the values to be displayed in the > - * future */ > - for (i = 0; i < DP_N_STATS; i++) { > - pmd->stats_zero[i] = stats[i]; > - } > - for (i = 0; i < PMD_N_CYCLES; i++) { > - pmd->cycles_zero[i] = cycles[i]; > - } > + "\tavg processing cycles per packet:" > + "%.02f (%"PRIu64"/%"PRIu64")\n", > + stats[PMD_CYCLES_ITER_BUSY] / (double) total_packets, > + stats[PMD_CYCLES_ITER_BUSY], total_packets); > } > > static int > @@ -1106,23 +1034,37 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], > struct ds reply = DS_EMPTY_INITIALIZER; > struct dp_netdev_pmd_thread **pmd_list; > struct dp_netdev *dp = NULL; > - size_t n; > enum pmd_info_type type = *(enum pmd_info_type *) aux; > + unsigned int core_id; > + bool filter_on_pmd = false; > + size_t n; > > ovs_mutex_lock(&dp_netdev_mutex); > > - if (argc == 2) { > - dp = shash_find_data(&dp_netdevs, argv[1]); > - } else if (shash_count(&dp_netdevs) == 1) { > - /* There's only one datapath */ > - dp = shash_first(&dp_netdevs)->data; > + while (argc > 0) { This causes OVS crash on any invoked appctl. 
Should be "argc > 1" > + if (!strcmp(argv[1], "-pmd") && argc >= 2) { > + if (str_to_uint(argv[2], 10, &core_id)) { > + filter_on_pmd = true; > + } > + argc -= 2; > + argv += 2; > + } else { > + dp = shash_find_data(&dp_netdevs, argv[1]); > + argc -= 1; > + argv += 1; > + } > } > > if (!dp) { > - ovs_mutex_unlock(&dp_netdev_mutex); > - unixctl_command_reply_error(conn, > - "please specify an existing datapath"); > - return; > + if (shash_count(&dp_netdevs) == 1) { > + /* There's only one datapath */ > + dp = shash_first(&dp_netdevs)->data; > + } else { > + ovs_mutex_unlock(&dp_netdev_mutex); > + unixctl_command_reply_error(conn, > + "please specify an existing datapath"); > + return; > + } > } > > sorted_poll_thread_list(dp, &pmd_list, &n); > @@ -1131,26 +1073,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], > if (!pmd) { > break; > } > - > + if (filter_on_pmd && pmd->core_id != core_id) { > + continue; > + } > if (type == PMD_INFO_SHOW_RXQ) { > pmd_info_show_rxq(&reply, pmd); > - } else { > - unsigned long long stats[DP_N_STATS]; > - uint64_t cycles[PMD_N_CYCLES]; > - > - /* Read current stats and cycle counters */ > - for (size_t j = 0; j < ARRAY_SIZE(stats); j++) { > - atomic_read_relaxed(&pmd->stats.n[j], &stats[j]); > - } > - for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) { > - atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]); > - } > - > - if (type == PMD_INFO_CLEAR_STATS) { > - pmd_info_clear_stats(&reply, pmd, stats, cycles); > - } else if (type == PMD_INFO_SHOW_STATS) { > - pmd_info_show_stats(&reply, pmd, stats, cycles); > - } > + } else if (type == PMD_INFO_CLEAR_STATS) { > + pmd_perf_stats_clear(&pmd->perf_stats); > + } else if (type == PMD_INFO_SHOW_STATS) { > + pmd_info_show_stats(&reply, pmd); > } > } > free(pmd_list); > @@ -1168,14 +1099,14 @@ dpif_netdev_init(void) > clear_aux = PMD_INFO_CLEAR_STATS, > poll_aux = PMD_INFO_SHOW_RXQ; > > - unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]", > - 0, 1, 
dpif_netdev_pmd_info, > + unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]", > + 0, 2, dpif_netdev_pmd_info, > (void *)&show_aux); > - unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]", > - 0, 1, dpif_netdev_pmd_info, > + unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]", > + 0, 2, dpif_netdev_pmd_info, > (void *)&clear_aux); > - unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]", > - 0, 1, dpif_netdev_pmd_info, > + unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]", > + 0, 2, dpif_netdev_pmd_info, > (void *)&poll_aux); > unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", > 0, 1, dpif_netdev_pmd_rebalance, > @@ -1511,23 +1442,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) > { > struct dp_netdev *dp = get_dp_netdev(dpif); > struct dp_netdev_pmd_thread *pmd; > + uint64_t pmd_stats[PMD_N_STATS]; > > - stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0; > + stats->n_flows = stats->n_hit = > + stats->n_mask_hit = stats->n_missed = stats->n_lost = 0; > CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { > - unsigned long long n; > stats->n_flows += cmap_count(&pmd->flow_table); > - > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n); > - stats->n_hit += n; > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n); > - stats->n_hit += n; > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n); > - stats->n_missed += n; > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n); > - stats->n_lost += n; > + pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); > + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; > + stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; > + stats->n_missed += pmd_stats[PMD_STAT_MISS]; > + stats->n_lost += pmd_stats[PMD_STAT_LOST]; > } > stats->n_masks = UINT32_MAX; > - stats->n_mask_hit = UINT64_MAX; > > return 0; > } > @@ -3209,28 +3136,28 @@ cycles_count_start(struct 
dp_netdev_pmd_thread *pmd) > /* Stop counting cycles and add them to the counter 'type' */ > static inline void > cycles_count_end(struct dp_netdev_pmd_thread *pmd, > - enum pmd_cycles_counter_type type) > + enum pmd_stat_type type) > OVS_RELEASES(&cycles_counter_fake_mutex) > OVS_NO_THREAD_SAFETY_ANALYSIS > { > unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles; > > - non_atomic_ullong_add(&pmd->cycles.n[type], interval); > + pmd_perf_update_counter(&pmd->perf_stats, type, interval); > } > > /* Calculate the intermediate cycle result and add to the counter 'type' */ > static inline void > cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, > struct dp_netdev_rxq *rxq, > - enum pmd_cycles_counter_type type) > + enum pmd_stat_type type) > OVS_NO_THREAD_SAFETY_ANALYSIS > { > unsigned long long new_cycles = cycles_counter(); > unsigned long long interval = new_cycles - pmd->ctx.last_cycles; > pmd->ctx.last_cycles = new_cycles; > > - non_atomic_ullong_add(&pmd->cycles.n[type], interval); > - if (rxq && (type == PMD_CYCLES_PROCESSING)) { > + pmd_perf_update_counter(&pmd->perf_stats, type, interval); > + if (rxq && (type == PMD_CYCLES_POLL_BUSY)) { > /* Add to the amount of current processing cycles. */ > non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); > } > @@ -3289,8 +3216,8 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, > netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs); > dp_packet_batch_init(&p->output_pkts); > > - dp_netdev_count_packet(pmd, DP_STAT_SENT_PKTS, output_cnt); > - dp_netdev_count_packet(pmd, DP_STAT_SENT_BATCHES, 1); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_PKTS, output_cnt); > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_BATCHES, 1); > } > > static void > @@ -3971,12 +3898,12 @@ dpif_netdev_run(struct dpif *dpif) > port->port_no); > cycles_count_intermediate(non_pmd, NULL, > process_packets > - ? 
PMD_CYCLES_PROCESSING > - : PMD_CYCLES_IDLE); > + ? PMD_CYCLES_POLL_BUSY > + : PMD_CYCLES_POLL_IDLE); > } > } > } > - cycles_count_end(non_pmd, PMD_CYCLES_IDLE); > + cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE); > pmd_thread_ctx_time_update(non_pmd); > dpif_netdev_xps_revalidate_pmd(non_pmd, false); > ovs_mutex_unlock(&dp->non_pmd_mutex); > @@ -4121,6 +4048,7 @@ static void * > pmd_thread_main(void *f_) > { > struct dp_netdev_pmd_thread *pmd = f_; > + struct pmd_perf_stats *s = &pmd->perf_stats; > unsigned int lc = 0; > struct polled_queue *poll_list; > bool exiting; > @@ -4156,13 +4084,17 @@ reload: > > cycles_count_start(pmd); > for (;;) { > + uint64_t iter_packets = 0; > + pmd_perf_start_iteration(s, pmd->ctx.last_cycles); > for (i = 0; i < poll_cnt; i++) { > process_packets = > dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, > poll_list[i].port_no); > cycles_count_intermediate(pmd, poll_list[i].rxq, > - process_packets ? PMD_CYCLES_PROCESSING > - : PMD_CYCLES_IDLE); > + process_packets > + ? 
PMD_CYCLES_POLL_BUSY
> +                                          : PMD_CYCLES_POLL_IDLE);
> +            iter_packets += process_packets;
>          }
> 
>          if (lc++ > 1024) {
> @@ -4183,10 +4115,12 @@ reload:
>              if (reload) {
>                  break;
>              }
> +            cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
>          }
> +        pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets);
>      }
> 
> -    cycles_count_end(pmd, PMD_CYCLES_IDLE);
> +    cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
> 
>      poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>      exiting = latch_is_set(&pmd->exit_latch);
> @@ -4638,6 +4572,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>          emc_cache_init(&pmd->flow_cache);
>          pmd_alloc_static_tx_qid(pmd);
>      }
> +    pmd_perf_stats_init(&pmd->perf_stats);
>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
>                  hash_int(core_id, 0));
>  }
> @@ -4838,13 +4773,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size,
>      atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);
>  }
> 
> -static void
> -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
> -                       enum dp_stat_type type, int cnt)
> -{
> -    non_atomic_ullong_add(&pmd->stats.n[type], cnt);
> -}
> -
>  static int
>  dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
>                   struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid,
> @@ -5017,6 +4945,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>      int i;
> 
>      atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
> +    pmd_perf_update_counter(&pmd->perf_stats,
> +                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
> +                            cnt);
> 
>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>          struct dp_netdev_flow *flow;
> @@ -5065,18 +4996,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>          }
>      }
> 
> -    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
> -                           cnt - n_dropped - n_missed);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
> +                            cnt - n_dropped - n_missed);
> 
>      return dp_packet_batch_size(packets_);
>  }
> 
> -static inline void
> +static inline int
>  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                       struct dp_packet *packet,
>                       const struct netdev_flow_key *key,
> -                     struct ofpbuf *actions, struct ofpbuf *put_actions,
> -                     int *lost_cnt)
> +                     struct ofpbuf *actions, struct ofpbuf *put_actions)
>  {
>      struct ofpbuf *add_actions;
>      struct dp_packet_batch b;
> @@ -5096,8 +5026,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                           put_actions);
>      if (OVS_UNLIKELY(error && error != ENOSPC)) {
>          dp_packet_delete(packet);
> -        (*lost_cnt)++;
> -        return;
> +        return error;
>      }
> 
>      /* The Netlink encoding of datapath flow keys cannot express
> @@ -5137,6 +5066,9 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>          ovs_mutex_unlock(&pmd->flow_mutex);
>          emc_probabilistic_insert(pmd, key, netdev_flow);
>      }
> +    /* Only error ENOSPC can reach here. We process the packet but do not
> +     * install a datapath flow. Treat as successful. */
> +    return 0;
>  }
> 
>  static inline void
> @@ -5158,7 +5090,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>      struct dpcls *cls;
>      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
>      struct dp_netdev *dp = pmd->dp;
> -    int miss_cnt = 0, lost_cnt = 0;
> +    int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
>      int lookup_cnt = 0, add_lookup_cnt;
>      bool any_miss;
>      size_t i;
> @@ -5200,9 +5132,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>              continue;
>          }
> 
> -        miss_cnt++;
> -        handle_packet_upcall(pmd, packet, &keys[i], &actions,
> -                             &put_actions, &lost_cnt);
> +        int error = handle_packet_upcall(pmd, packet, &keys[i],
> +                                         &actions, &put_actions);
> +
> +        if (OVS_UNLIKELY(error)) {
> +            upcall_fail_cnt++;
> +        } else {
> +            upcall_ok_cnt++;
> +        }
>      }
> 
>      ofpbuf_uninit(&actions);
> @@ -5212,8 +5149,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>          DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
>              if (OVS_UNLIKELY(!rules[i])) {
>                  dp_packet_delete(packet);
> -                lost_cnt++;
> -                miss_cnt++;
> +                upcall_fail_cnt++;
>              }
>          }
>      }
> @@ -5231,10 +5167,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
>      }
> 
> -    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
> +                            cnt - upcall_ok_cnt - upcall_fail_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP,
> +                            lookup_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS,
> +                            upcall_ok_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
> +                            upcall_fail_cnt);
>  }
> 
>  /* Packets enter the datapath from a port (or from recirculation) here.
> diff --git a/tests/pmd.at b/tests/pmd.at
> index e39a23a..0356f87 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0
>   p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
>  ])
> 
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
>  pmd thread numa_id <cleared> core_id <cleared>:
> +	packets received:0
> +	packet recirculations:0
> +	avg. datapath passes per packet:0.00
>  	emc hits:0
>  	megaflow hits:0
> -	avg. subtable lookups per hit:0.00
> -	miss:0
> -	lost:0
> +	avg. subtable lookups per megaflow hit:0.00
> +	miss(i.e. lookup miss with success upcall):0
> +	lost(i.e. lookup miss with failed upcall):0
>  ])
> 
>  ovs-appctl time/stop
> @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
>  recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
>  ])
> 
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
>  pmd thread numa_id <cleared> core_id <cleared>:
> +	packets received:20
> +	packet recirculations:0
> +	avg. datapath passes per packet:1.00
>  	emc hits:19
>  	megaflow hits:0
> -	avg. subtable lookups per hit:0.00
> -	miss:1
> -	lost:0
> +	avg. subtable lookups per megaflow hit:0.00
> +	miss(i.e. lookup miss with success upcall):1
> +	lost(i.e. lookup miss with failed upcall):0
>  ])
> 
>  OVS_VSWITCHD_STOP
> 

In addition to the tests above not being updated, I also see failures of test "1165. ofproto-dpif.at:7478: testing ofproto-dpif - patch ports":

./ofproto-dpif.at:7514: ovs-appctl dpif/show
--- -	2018-01-11 15:06:46.324839417 +0300
+++ /tests/testsuite.dir/at-groups/1165/stdout
@@ -1,4 +1,4 @@
-dummy@ovs-dummy: hit:13 missed:2
+dummy@ovs-dummy: hit:0 missed:2
 br0:
 	br0 65534/100: (dummy-internal)
 	p2 2/2: (dummy)
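For reference, the two new derived statistics that the updated pmd.at output above checks ("avg. datapath passes per packet" and "avg. subtable lookups per megaflow hit") are computed as in the following standalone sketch. The helper names are hypothetical, not part of the patch; the counter semantics mirror enum pmd_stat_type:

```c
#include <assert.h>
#include <stdint.h>

/* Each received packet makes one pass through the datapath plus one extra
 * pass per recirculation, so the average is (recv + recirc) / recv. */
static double
avg_passes_per_packet(uint64_t recv, uint64_t recirc)
{
    return recv > 0 ? (recv + recirc) / (double) recv : 0.0;
}

/* Each megaflow (masked) hit costs at least one subtable lookup. */
static double
avg_lookups_per_hit(uint64_t masked_lookup, uint64_t masked_hit)
{
    return masked_hit > 0 ? masked_lookup / (double) masked_hit : 0.0;
}
```

With 20 packets received and no recirculations, as in the second test case above, the average comes out as 1.00.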
On 11.01.2018 15:15, Ilya Maximets wrote:
> I guess this version was compile-tested only.
> 
> Comments inline.
> 
> Best regards, Ilya Maximets.
> 
> On 11.01.2018 10:39, Jan Scheurich wrote:
>> Add module dpif-netdev-perf to host all PMD performance-related
>> data structures and functions in dpif-netdev. Refactor the PMD
>> stats handling in dpif-netdev and delegate whatever possible into
>> the new module, using clean interfaces to shield dpif-netdev from
>> the implementation details. Accordingly, all the PMD statistics
>> members are moved from the main struct dp_netdev_pmd_thread into
>> a dedicated member of type struct pmd_perf_stats.
>>
>> Include Darrell's prior refactoring of PMD stats contained in
>> [PATCH v5,2/3] dpif-netdev: Refactor some pmd stats:
>>
>> 1. The cycles per packet counts are now based on packets
>> received rather than packet passes through the datapath.
>>
>> 2. Packet counters are now kept for packets received and
>> packets recirculated. These are kept as separate counters for
>> maintainability reasons. The cost of incrementing these counters
>> is negligible. These new counters are also displayed to the user.
>>
>> 3. A display statistic is added for the average number of
>> datapath passes per packet. This should be useful for user
>> debugging and understanding of packet processing.
>>
>> 4. The user-visible 'miss' counter is used for successful upcalls,
>> rather than the sum of successful and unsuccessful upcalls. Hence,
>> this becomes what the user historically understands as an OVS 'miss
>> upcall'. The user display is annotated to make this clear as well.
>>
>> 5. The user-visible 'lost' counter remains as failed upcalls, but
>> is annotated to make it clear what the meaning is.
>>
>> 6. The enum pmd_stat_type is annotated to make the usage of the
>> stats counters clear.
>>
>> 7. The subtable lookup stats are renamed to make it clear that they
>> relate to masked lookups.
>>
>> 8. The PMD stats test is updated to handle the new user stats of
>> packets received, packets recirculated and the average number of
>> datapath passes per packet.
>>
>> On top of that, introduce a "-pmd <core>" option to the PMD info
>> commands to filter the output for a single PMD.
>>
>> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
>> Co-authored-by: Darrell Ball <dlu998@gmail.com>
>> Signed-off-by: Darrell Ball <dlu998@gmail.com>
>> ---
>>  lib/automake.mk        |   2 +
>>  lib/dpif-netdev-perf.c |  67 +++++++++
>>  lib/dpif-netdev-perf.h | 151 +++++++++++++++++++++
>>  lib/dpif-netdev.c      | 360 +++++++++++++++++++++----------------------------
>>  tests/pmd.at           |  22 +--
>>  5 files changed, 384 insertions(+), 218 deletions(-)
>>  create mode 100644 lib/dpif-netdev-perf.c
>>  create mode 100644 lib/dpif-netdev-perf.h
>>
>> diff --git a/lib/automake.mk b/lib/automake.mk
>> index 4b38a11..159319f 100644
>> --- a/lib/automake.mk
>> +++ b/lib/automake.mk
>> @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \
>>  	lib/dpdk.h \
>>  	lib/dpif-netdev.c \
>>  	lib/dpif-netdev.h \
>> +	lib/dpif-netdev-perf.c \
>> +	lib/dpif-netdev-perf.h \
>>  	lib/dpif-provider.h \
>>  	lib/dpif.c \
>>  	lib/dpif.h \
>> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
>> new file mode 100644
>> index 0000000..f853fd8
>> --- /dev/null
>> +++ b/lib/dpif-netdev-perf.c
>> @@ -0,0 +1,67 @@
>> +/*
>> + * Copyright (c) 2017 Ericsson AB.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#include <config.h>
>> +
>> +#ifdef DPDK_NETDEV
>> +#include <rte_cycles.h>
>> +#endif
>> +
>> +#include "openvswitch/dynamic-string.h"
>> +#include "openvswitch/vlog.h"
>> +#include "dpif-netdev-perf.h"
>> +#include "timeval.h"
>> +
>> +VLOG_DEFINE_THIS_MODULE(pmd_perf);
>> +
>> +void
>> +pmd_perf_stats_init(struct pmd_perf_stats *s)
>> +{
>> +    memset(s, 0, sizeof(*s));
>> +    s->start_ms = time_msec();
>> +}
>> +
>> +void
>> +pmd_perf_read_counters(struct pmd_perf_stats *s,
>> +                       uint64_t stats[PMD_N_STATS])
>> +{
>> +    uint64_t val;
>> +
>> +    /* This loop subtracts reference values (.zero[*]) from the counters.
>> +     * Since loads and stores are relaxed, it might be possible for a .zero[*]
>> +     * value to be more recent than the current value we're reading from the
>> +     * counter. This is not a big problem, since these numbers are not
>> +     * supposed to be 100% accurate, but we should at least make sure that
>> +     * the result is not negative. */
>> +    for (int i = 0; i < PMD_N_STATS; i++) {
>> +        atomic_read_relaxed(&s->counters.n[i], &val);
>> +        if (val > s->counters.zero[i]) {
>> +            stats[i] = val - s->counters.zero[i];
>> +        } else {
>> +            stats[i] = 0;
>> +        }
>> +    }
>> +}
>> +
>> +void
>> +pmd_perf_stats_clear(struct pmd_perf_stats *s)
>> +{
>> +    s->start_ms = 0; /* Marks start of clearing. */
>> +    for (int i = 0; i < PMD_N_STATS; i++) {
>> +        atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
>> +    }
>> +    s->start_ms = time_msec(); /* Clearing finished. */
>> +}
>> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
>> new file mode 100644
>> index 0000000..41f4e85
>> --- /dev/null
>> +++ b/lib/dpif-netdev-perf.h
>> @@ -0,0 +1,151 @@
>> +/*
>> + * Copyright (c) 2017 Ericsson AB.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#ifndef DPIF_NETDEV_PERF_H
>> +#define DPIF_NETDEV_PERF_H 1
>> +
>> +#include <stdbool.h>
>> +#include <stddef.h>
>> +#include <stdint.h>
>> +#include <string.h>
>> +#include <math.h>
>> +
>> +#include "openvswitch/vlog.h"
>> +#include "ovs-atomic.h"
>> +#include "timeval.h"
>> +#include "unixctl.h"
>> +#include "util.h"
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/* This module encapsulates data structures and functions to maintain PMD
>> + * performance metrics such as packet counters and execution cycles. It
>> + * provides a clean API for dpif-netdev to initialize, update, read and
>> + * reset these metrics.
>> + */
>> +
>> +/* Set of counter types maintained in pmd_perf_stats. */
>> +
>> +enum pmd_stat_type {
>> +    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
>> +    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
>> +    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
>> +    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
>> +                            /* The above statistics account for the total
>> +                             * number of packet passes through the datapath
>> +                             * pipeline and should not be overlapping with each
>> +                             * other. */
>> +    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
>> +                               hits. Each MASKED_HIT hit will have >= 1
>> +                               MASKED_LOOKUP(s). */
>> +    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
>> +                             * interface. */
>> +    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
>> +                             * recirculation. */
>> +    PMD_STAT_SENT_PKTS,     /* Packets that have been sent. */
>> +    PMD_STAT_SENT_BATCHES,  /* Number of batches sent. */
>> +    PMD_CYCLES_POLL_IDLE,   /* Cycles spent unsuccessfully polling. */
>> +    PMD_CYCLES_POLL_BUSY,   /* Cycles spent successfully polling and
>> +                             * processing polled packets. */
>> +    PMD_CYCLES_OVERHEAD,    /* Cycles spent for other tasks. */
>> +    PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
>> +    PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
>> +    PMD_N_STATS
>> +};
>> +
>> +/* Array of PMD counters indexed by enum pmd_stat_type.
>> + * The n[] array contains the actual counter values since initialization
>> + * of the PMD. Counters are atomically updated from the PMD but are
>> + * read and cleared also from other processes. To clear the counters at
>> + * PMD run-time, the current counter values are copied over to the zero[]
>> + * array. To read counters we subtract the zero[] value from n[]. */
>> +
>> +struct pmd_counters {
>> +    atomic_uint64_t n[PMD_N_STATS];  /* Value since _init(). */
>> +    uint64_t zero[PMD_N_STATS];      /* Value at last _clear(). */
>> +};
>> +
>> +/* Container for all performance metrics of a PMD.
>> + * Part of the struct dp_netdev_pmd_thread. */
>> +
>> +struct pmd_perf_stats {
>> +    /* Start of the current performance measurement period.
>> +     * While a new measurement period is being started with
>> +     * ovs-appctl dpif-netdev/pmd-stats-clear, start_ms is set
>> +     * to zero to lock out operations from accessing inconsistent
>> +     * data. */
>> +    uint64_t start_ms;
>> +    /* Start of the current PMD iteration in TSC cycles. */
>> +    uint64_t last_tsc;
>> +    /* Set of PMD counters with their zero offsets. */
>> +    struct pmd_counters counters;
>> +};
>> +
>> +void pmd_perf_stats_init(struct pmd_perf_stats *s);
>> +void pmd_perf_stats_clear(struct pmd_perf_stats *s);
>> +void pmd_perf_read_counters(struct pmd_perf_stats *s,
>> +                            uint64_t stats[PMD_N_STATS]);
>> +
>> +/* PMD performance counters are updated lock-less. For real PMDs
>> + * they are only updated from the PMD thread itself. In the case of the
>> + * NON-PMD they might be updated from multiple threads, but we can live
>> + * with losing a rare update as 100% accuracy is not required.
>> + * However, as counters are read for display from outside the PMD thread
>> + * with e.g. pmd-stats-show, we make sure that the 64-bit read and store
>> + * operations are atomic also on 32-bit systems so that readers cannot
>> + * read garbage. On 64-bit systems this incurs no overhead. */
>> +
>> +static inline void
>> +pmd_perf_update_counter(struct pmd_perf_stats *s,
>> +                        enum pmd_stat_type counter, int delta)
>> +{
>> +    uint64_t tmp;
>> +    atomic_read_relaxed(&s->counters.n[counter], &tmp);
>> +    tmp += delta;
>> +    atomic_store_relaxed(&s->counters.n[counter], tmp);
>> +}
>> +
>> +static inline void
>> +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
>> +{
>> +    if (OVS_UNLIKELY(s->start_ms == 0)) {
>> +        /* Stats not yet fully initialized. */
>> +        return;
>> +    }
>> +    s->last_tsc = now_tsc;
>> +}
>> +
>> +static inline void
>> +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
>> +                       int packets)
>> +{
>> +    uint64_t cycles = now_tsc - s->last_tsc;
>> +
>> +    /* No need for atomic updates as only invoked within PMD thread. */
>> +    if (packets > 0) {
>> +        s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
>> +    } else {
>> +        s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
>> +    }
>> +}
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif /* DPIF_NETDEV_PERF_H */
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index c7d157a..538d5ce 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -44,6 +44,7 @@
>>  #include "csum.h"
>>  #include "dp-packet.h"
>>  #include "dpif.h"
>> +#include "dpif-netdev-perf.h"
>>  #include "dpif-provider.h"
>>  #include "dummy.h"
>>  #include "fat-rwlock.h"
>> @@ -331,25 +332,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
>>                                                      odp_port_t)
>>      OVS_REQUIRES(dp->port_mutex);
>>
>> -enum dp_stat_type {
>> -    DP_STAT_EXACT_HIT,      /* Packets that had an exact match (emc). */
>> -    DP_STAT_MASKED_HIT,     /* Packets that matched in the flow table. */
>> -    DP_STAT_MISS,           /* Packets that did not match. */
>> -    DP_STAT_LOST,           /* Packets not passed up to the client. */
>> -    DP_STAT_LOOKUP_HIT,     /* Number of subtable lookups for flow table
>> -                               hits */
>> -    DP_STAT_SENT_PKTS,      /* Packets that has been sent. */
>> -    DP_STAT_SENT_BATCHES,   /* Number of batches sent. */
>> -    DP_N_STATS
>> -};
>> -
>> -enum pmd_cycles_counter_type {
>> -    PMD_CYCLES_IDLE,            /* Cycles spent idle or unsuccessful polling */
>> -    PMD_CYCLES_PROCESSING,      /* Cycles spent successfully polling and
>> -                                 * processing polled packets */
>> -    PMD_N_CYCLES
>> -};
>> -
>>  enum rxq_cycles_counter_type {
>>      RXQ_CYCLES_PROC_CURR,       /* Cycles spent successfully polling and
>>                                     processing packets during the current
>> @@ -499,21 +481,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions(
>>      const struct dp_netdev_flow *);
>>  static void dp_netdev_actions_free(struct dp_netdev_actions *);
>>
>> -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */
>> -struct dp_netdev_pmd_stats {
>> -    /* Indexed by DP_STAT_*. */
>> -    atomic_ullong n[DP_N_STATS];
>> -};
>> -
>> -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */
>> -struct dp_netdev_pmd_cycles {
>> -    /* Indexed by PMD_CYCLES_*. */
>> -    atomic_ullong n[PMD_N_CYCLES];
>> -};
>> -
>> -static void dp_netdev_count_packet(struct dp_netdev_pmd_thread *,
>> -                                   enum dp_stat_type type, int cnt);
>> -
>>  struct polled_queue {
>>      struct dp_netdev_rxq *rxq;
>>      odp_port_t port_no;
>> @@ -595,12 +562,6 @@ struct dp_netdev_pmd_thread {
>>                                     are stored for each polled rxq. */
>>      long long int rxq_next_cycle_store;
>>
>> -    /* Statistics. */
>> -    struct dp_netdev_pmd_stats stats;
>> -
>> -    /* Cycles counters */
>> -    struct dp_netdev_pmd_cycles cycles;
>> -
>>      /* Current context of the PMD thread. */
>>      struct dp_netdev_pmd_thread_ctx ctx;
>>
>> @@ -638,12 +599,8 @@ struct dp_netdev_pmd_thread {
>>      struct hmap tnl_port_cache;
>>      struct hmap send_port_cache;
>>
>> -    /* Only a pmd thread can write on its own 'cycles' and 'stats'.
>> -     * The main thread keeps 'stats_zero' and 'cycles_zero' as base
>> -     * values and subtracts them from 'stats' and 'cycles' before
>> -     * reporting to the user */
>> -    unsigned long long stats_zero[DP_N_STATS];
>> -    uint64_t cycles_zero[PMD_N_CYCLES];
>> +    /* Keep track of detailed PMD performance statistics. */
>> +    struct pmd_perf_stats perf_stats;
>>
>>      /* Set to true if the pmd thread needs to be reloaded. */
>>      bool need_reload;
>> @@ -833,47 +790,10 @@ enum pmd_info_type {
>>  };
>>
>>  static void
>> -pmd_info_show_stats(struct ds *reply,
>> -                    struct dp_netdev_pmd_thread *pmd,
>> -                    unsigned long long stats[DP_N_STATS],
>> -                    uint64_t cycles[PMD_N_CYCLES])
>> +format_pmd_thread(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
>>  {
>> -    unsigned long long total_packets;
>> -    uint64_t total_cycles = 0;
>> -    double lookups_per_hit = 0, packets_per_batch = 0;
>> -    int i;
>> -
>> -    /* These loops subtracts reference values ('*_zero') from the counters.
>> -     * Since loads and stores are relaxed, it might be possible for a '*_zero'
>> -     * value to be more recent than the current value we're reading from the
>> -     * counter. This is not a big problem, since these numbers are not
>> -     * supposed to be too accurate, but we should at least make sure that
>> -     * the result is not negative. */
>> -    for (i = 0; i < DP_N_STATS; i++) {
>> -        if (stats[i] > pmd->stats_zero[i]) {
>> -            stats[i] -= pmd->stats_zero[i];
>> -        } else {
>> -            stats[i] = 0;
>> -        }
>> -    }
>> -
>> -    /* Sum of all the matched and not matched packets gives the total. */
>> -    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
>> -                    + stats[DP_STAT_MISS];
>> -
>> -    for (i = 0; i < PMD_N_CYCLES; i++) {
>> -        if (cycles[i] > pmd->cycles_zero[i]) {
>> -            cycles[i] -= pmd->cycles_zero[i];
>> -        } else {
>> -            cycles[i] = 0;
>> -        }
>> -
>> -        total_cycles += cycles[i];
>> -    }
>> -
>>      ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID)
>>                         ? "main thread" : "pmd thread");
>> -
>>      if (pmd->numa_id != OVS_NUMA_UNSPEC) {
>>          ds_put_format(reply, " numa_id %d", pmd->numa_id);
>>      }
>> @@ -881,23 +801,52 @@ pmd_info_show_stats(struct ds *reply,
>>          ds_put_format(reply, " core_id %u", pmd->core_id);
>>      }
>>      ds_put_cstr(reply, ":\n");
>> +}
>> +
>> +static void
>> +pmd_info_show_stats(struct ds *reply,
>> +                    struct dp_netdev_pmd_thread *pmd)
>> +{
>> +    uint64_t stats[PMD_N_STATS];
>> +    uint64_t total_cycles, total_packets;
>> +    double passes_per_pkt = 0;
>> +    double lookups_per_hit = 0;
>> +    double packets_per_batch = 0;
>> +
>> +    pmd_perf_read_counters(&pmd->perf_stats, stats);
>> +    total_cycles = stats[PMD_CYCLES_ITER_IDLE]
>> +                   + stats[PMD_CYCLES_ITER_BUSY];
>> +    total_packets = stats[PMD_STAT_RECV];
>> +
>> +    format_pmd_thread(reply, pmd);
>>
>> -    if (stats[DP_STAT_MASKED_HIT] > 0) {
>> -        lookups_per_hit = stats[DP_STAT_LOOKUP_HIT]
>> -                          / (double) stats[DP_STAT_MASKED_HIT];
>> +    if (total_packets > 0) {
>> +        passes_per_pkt = (total_packets + stats[PMD_STAT_RECIRC])
>> +                         / (double) total_packets;
>>      }
>> -    if (stats[DP_STAT_SENT_BATCHES] > 0) {
>> -        packets_per_batch = stats[DP_STAT_SENT_PKTS]
>> -                            / (double) stats[DP_STAT_SENT_BATCHES];
>> +    if (stats[PMD_STAT_MASKED_HIT] > 0) {
>> +        lookups_per_hit = stats[PMD_STAT_MASKED_LOOKUP]
>> +                          / (double) stats[PMD_STAT_MASKED_HIT];
>> +    }
>> +    if (stats[PMD_STAT_SENT_BATCHES] > 0) {
>> +        packets_per_batch = stats[PMD_STAT_SENT_PKTS]
>> +                            / (double) stats[PMD_STAT_SENT_BATCHES];
>>      }
>>
>>      ds_put_format(reply,
>> -                  "\temc hits:%llu\n\tmegaflow hits:%llu\n"
>> -                  "\tavg. subtable lookups per hit:%.2f\n"
>> -                  "\tmiss:%llu\n\tlost:%llu\n"
>> -                  "\tavg. packets per output batch: %.2f\n",
>> -                  stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
>> -                  lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST],
>> +                  "\tpackets received:%"PRIu64"\n"
>> +                  "\tpacket recirculations:%"PRIu64"\n"
>> +                  "\tavg. datapath passes per packet:%.02f\n"
>> +                  "\temc hits:%"PRIu64"\n"
>> +                  "\tmegaflow hits:%"PRIu64"\n"
>> +                  "\tavg. subtable lookups per megaflow hit:%.02f\n"
>> +                  "\tmiss with success upcall:%"PRIu64"\n"
>> +                  "\tmiss with failed upcall:%"PRIu64"\n"
>> +                  "\tavg. packets per output batch:%.02f\n",
>> +                  total_packets, stats[PMD_STAT_RECIRC],
>> +                  passes_per_pkt, stats[PMD_STAT_EXACT_HIT],
>> +                  stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
>> +                  stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
>>                    packets_per_batch);
>>
>>      if (total_cycles == 0) {
>> @@ -907,46 +856,25 @@ pmd_info_show_stats(struct ds *reply,
>>      ds_put_format(reply,
>>                    "\tidle cycles:%"PRIu64" (%.02f%%)\n"
>>                    "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
>> -                  cycles[PMD_CYCLES_IDLE],
>> -                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
>> -                  cycles[PMD_CYCLES_PROCESSING],
>> -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
>> +                  stats[PMD_CYCLES_ITER_IDLE],
>> +                  stats[PMD_CYCLES_ITER_IDLE] / (double) total_cycles * 100,
>> +                  stats[PMD_CYCLES_ITER_BUSY],
>> +                  stats[PMD_CYCLES_ITER_BUSY] / (double) total_cycles * 100);
>>
>>      if (total_packets == 0) {
>>          return;
>>      }
>>
>>      ds_put_format(reply,
>> -                  "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n",
>> -                  total_cycles / (double)total_packets,
>> +                  "\tavg cycles per packet:%.02f (%"PRIu64"/%"PRIu64")\n",
>> +                  total_cycles / (double) total_packets,
>>                    total_cycles, total_packets);
>>
>>      ds_put_format(reply,
>> -                  "\tavg processing cycles per packet: "
>> -                  "%.02f (%"PRIu64"/%llu)\n",
>> -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_packets,
>> -                  cycles[PMD_CYCLES_PROCESSING], total_packets);
>> -}
>> -
>> -static void
>> -pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
>> -                     struct dp_netdev_pmd_thread *pmd,
>> -                     unsigned long long stats[DP_N_STATS],
>> -                     uint64_t cycles[PMD_N_CYCLES])
>> -{
>> -    int i;
>> -
>> -    /* We cannot write 'stats' and 'cycles' (because they're written by other
>> -     * threads) and we shouldn't change 'stats' (because they're used to count
>> -     * datapath stats, which must not be cleared here). Instead, we save the
>> -     * current values and subtract them from the values to be displayed in the
>> -     * future */
>> -    for (i = 0; i < DP_N_STATS; i++) {
>> -        pmd->stats_zero[i] = stats[i];
>> -    }
>> -    for (i = 0; i < PMD_N_CYCLES; i++) {
>> -        pmd->cycles_zero[i] = cycles[i];
>> -    }
>> +                  "\tavg processing cycles per packet:"
>> +                  "%.02f (%"PRIu64"/%"PRIu64")\n",
>> +                  stats[PMD_CYCLES_ITER_BUSY] / (double) total_packets,
>> +                  stats[PMD_CYCLES_ITER_BUSY], total_packets);
>>  }
>>
>>  static int
>> @@ -1106,23 +1034,37 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
>>      struct ds reply = DS_EMPTY_INITIALIZER;
>>      struct dp_netdev_pmd_thread **pmd_list;
>>      struct dp_netdev *dp = NULL;
>> -    size_t n;
>>      enum pmd_info_type type = *(enum pmd_info_type *) aux;
>> +    unsigned int core_id;
>> +    bool filter_on_pmd = false;
>> +    size_t n;
>>
>>      ovs_mutex_lock(&dp_netdev_mutex);
>>
>> -    if (argc == 2) {
>> -        dp = shash_find_data(&dp_netdevs, argv[1]);
>> -    } else if (shash_count(&dp_netdevs) == 1) {
>> -        /* There's only one datapath */
>> -        dp = shash_first(&dp_netdevs)->data;
>> +    while (argc > 0) {
> 
> This causes an OVS crash on any invoked appctl command.
> Should be "argc > 1"
> 
>> +        if (!strcmp(argv[1], "-pmd") && argc >= 2) {
>> +            if (str_to_uint(argv[2], 10, &core_id)) {
>> +                filter_on_pmd = true;
>> +            }
>> +            argc -= 2;
>> +            argv += 2;
>> +        } else {
>> +            dp = shash_find_data(&dp_netdevs, argv[1]);
>> +            argc -= 1;
>> +            argv += 1;
>> +        }
>>      }
>>
>>      if (!dp) {
>> -        ovs_mutex_unlock(&dp_netdev_mutex);
>> -        unixctl_command_reply_error(conn,
>> -                                    "please specify an existing datapath");
>> -        return;
>> +        if (shash_count(&dp_netdevs) == 1) {
>> +            /* There's only one datapath */
>> +            dp = shash_first(&dp_netdevs)->data;
>> +        } else {
>> +            ovs_mutex_unlock(&dp_netdev_mutex);
>> +            unixctl_command_reply_error(conn,
>> +                                        "please specify an existing datapath");
>> +            return;
>> +        }
>>      }
>>
>>      sorted_poll_thread_list(dp, &pmd_list, &n);
>> @@ -1131,26 +1073,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
>>          if (!pmd) {
>>              break;
>>          }
>> -
>> +        if (filter_on_pmd && pmd->core_id != core_id) {
>> +            continue;
>> +        }
>>          if (type == PMD_INFO_SHOW_RXQ) {
>>              pmd_info_show_rxq(&reply, pmd);
>> -        } else {
>> -            unsigned long long stats[DP_N_STATS];
>> -            uint64_t cycles[PMD_N_CYCLES];
>> -
>> -            /* Read current stats and cycle counters */
>> -            for (size_t j = 0; j < ARRAY_SIZE(stats); j++) {
>> -                atomic_read_relaxed(&pmd->stats.n[j], &stats[j]);
>> -            }
>> -            for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) {
>> -                atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]);
>> -            }
>> -
>> -            if (type == PMD_INFO_CLEAR_STATS) {
>> -                pmd_info_clear_stats(&reply, pmd, stats, cycles);
>> -            } else if (type == PMD_INFO_SHOW_STATS) {
>> -                pmd_info_show_stats(&reply, pmd, stats, cycles);
>> -            }
>> +        } else if (type == PMD_INFO_CLEAR_STATS) {
>> +            pmd_perf_stats_clear(&pmd->perf_stats);
>> +        } else if (type == PMD_INFO_SHOW_STATS) {
>> +            pmd_info_show_stats(&reply, pmd);
>>          }
>>      }
>>      free(pmd_list);
>> @@ -1168,14 +1099,14 @@ dpif_netdev_init(void)
>>                              clear_aux = PMD_INFO_CLEAR_STATS,
>>                              poll_aux = PMD_INFO_SHOW_RXQ;
>>
>> -    unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]",
>> -                             0, 1, dpif_netdev_pmd_info,
>> +    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]",
>> +                             0, 2, dpif_netdev_pmd_info,
>>                               (void *)&show_aux);
>> -    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]",
>> -                             0, 1, dpif_netdev_pmd_info,
>> +    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]",
>> +                             0, 2, dpif_netdev_pmd_info,
>>                               (void *)&clear_aux);
>> -    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
>> -                             0, 1, dpif_netdev_pmd_info,
>> +    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]",
>> +                             0, 2, dpif_netdev_pmd_info,
>>                               (void *)&poll_aux);
>>      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
>>                               0, 1, dpif_netdev_pmd_rebalance,
>> @@ -1511,23 +1442,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
>>  {
>>      struct dp_netdev *dp = get_dp_netdev(dpif);
>>      struct dp_netdev_pmd_thread *pmd;
>> +    uint64_t pmd_stats[PMD_N_STATS];
>>
>> -    stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0;
>> +    stats->n_flows = stats->n_hit =
>> +    stats->n_mask_hit = stats->n_missed = stats->n_lost = 0;
>>      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>> -        unsigned long long n;
>>          stats->n_flows += cmap_count(&pmd->flow_table);
>> -
>> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n);
>> -        stats->n_hit += n;
>> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
>> -        stats->n_hit += n;
>> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
>> -        stats->n_missed += n;
>> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
>> -        stats->n_lost += n;
>> +        pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
>> +        stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
>> +        stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT];
>> +        stats->n_missed += pmd_stats[PMD_STAT_MISS];
>> +        stats->n_lost += pmd_stats[PMD_STAT_LOST];
>>      }
>>      stats->n_masks = UINT32_MAX;
>> -    stats->n_mask_hit = UINT64_MAX;
>>
>>      return 0;
>>  }
>> @@ -3209,28 +3136,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd)
>>  /* Stop counting cycles and add them to the counter 'type' */
>>  static inline void
>>  cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>> -                 enum pmd_cycles_counter_type type)
>> +                 enum pmd_stat_type type)
>>      OVS_RELEASES(&cycles_counter_fake_mutex)
>>      OVS_NO_THREAD_SAFETY_ANALYSIS
>>  {
>>      unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles;
>>
>> -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>> +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
>>  }
>>
>>  /* Calculate the intermediate cycle result and add to the counter 'type' */
>>  static inline void
>>  cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
>>                            struct dp_netdev_rxq *rxq,
>> -                          enum pmd_cycles_counter_type type)
>> +                          enum pmd_stat_type type)
>>      OVS_NO_THREAD_SAFETY_ANALYSIS
>>  {
>>      unsigned long long new_cycles = cycles_counter();
>>      unsigned long long interval = new_cycles - pmd->ctx.last_cycles;
>>      pmd->ctx.last_cycles = new_cycles;
>>
>> -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>> -    if (rxq && (type == PMD_CYCLES_PROCESSING)) {
>> +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
>> +    if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
>>          /* Add to the amount of current processing cycles. */
>>          non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
>>      }
>> @@ -3289,8 +3216,8 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd,
>>      netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs);
>>      dp_packet_batch_init(&p->output_pkts);
>>
>> -    dp_netdev_count_packet(pmd, DP_STAT_SENT_PKTS, output_cnt);
>> -    dp_netdev_count_packet(pmd, DP_STAT_SENT_BATCHES, 1);
>> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_PKTS, output_cnt);
>> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_BATCHES, 1);
>>  }
>>
>>  static void
>> @@ -3971,12 +3898,12 @@ dpif_netdev_run(struct dpif *dpif)
>>                                                  port->port_no);
>>                  cycles_count_intermediate(non_pmd, NULL,
>>                                            process_packets
>> -                                          ? PMD_CYCLES_PROCESSING
>> -                                          : PMD_CYCLES_IDLE);
>> +                                          ? PMD_CYCLES_POLL_BUSY
>> +                                          : PMD_CYCLES_POLL_IDLE);
>>              }
>>          }
>>      }
>> -    cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
>> +    cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
>>      pmd_thread_ctx_time_update(non_pmd);
>>      dpif_netdev_xps_revalidate_pmd(non_pmd, false);
>>      ovs_mutex_unlock(&dp->non_pmd_mutex);
>> @@ -4121,6 +4048,7 @@ static void *
>>  pmd_thread_main(void *f_)
>>  {
>>      struct dp_netdev_pmd_thread *pmd = f_;
>> +    struct pmd_perf_stats *s = &pmd->perf_stats;
>>      unsigned int lc = 0;
>>      struct polled_queue *poll_list;
>>      bool exiting;
>> @@ -4156,13 +4084,17 @@ reload:
>>
>>      cycles_count_start(pmd);
>>      for (;;) {
>> +        uint64_t iter_packets = 0;
>> +        pmd_perf_start_iteration(s, pmd->ctx.last_cycles);
>>          for (i = 0; i < poll_cnt; i++) {
>>              process_packets =
>>                  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
>>                                             poll_list[i].port_no);
>>              cycles_count_intermediate(pmd, poll_list[i].rxq,
>> -                                      process_packets ? PMD_CYCLES_PROCESSING
>> -                                                      : PMD_CYCLES_IDLE);
>> +                                      process_packets
>> +                                      ? PMD_CYCLES_POLL_BUSY
>> +                                      : PMD_CYCLES_POLL_IDLE);
>> +            iter_packets += process_packets;
>>          }
>>
>>          if (lc++ > 1024) {
>> @@ -4183,10 +4115,12 @@ reload:
>>              if (reload) {
>>                  break;
>>              }
>> +            cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
>>          }
>> +        pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets);
>>      }
>>
>> -    cycles_count_end(pmd, PMD_CYCLES_IDLE);
>> +    cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
>>
>>      poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>>      exiting = latch_is_set(&pmd->exit_latch);
>> @@ -4638,6 +4572,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>>          emc_cache_init(&pmd->flow_cache);
>>          pmd_alloc_static_tx_qid(pmd);
>>      }
>> +    pmd_perf_stats_init(&pmd->perf_stats);
>>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
>>                  hash_int(core_id, 0));
>>  }
>> @@ -4838,13 +4773,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size,
>>      atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);
>>  }
>>
>> -static void
>> -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
>> -                       enum dp_stat_type type, int cnt)
>> -{
>> -    non_atomic_ullong_add(&pmd->stats.n[type], cnt);
>> -}
>> -
>>  static int
>>  dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
>>                   struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid,
>> @@ -5017,6 +4945,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>>      int i;
>>
>>      atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
>> +    pmd_perf_update_counter(&pmd->perf_stats,
>> +                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
>> +                            cnt);
>>
>>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>>          struct dp_netdev_flow *flow;
>> @@ -5065,18 +4996,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>>          }
>>      }
>>
>> -    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
>> -                           cnt - n_dropped - n_missed);
>> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
>> +                            cnt - n_dropped - n_missed);
>>
>>      return dp_packet_batch_size(packets_);
>>  }
>>
>> -static inline void
>> +static inline int
>>  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>>                       struct dp_packet *packet,
>>                       const struct netdev_flow_key *key,
>> -                     struct ofpbuf *actions, struct ofpbuf *put_actions,
>> -                     int *lost_cnt)
>> +                     struct ofpbuf *actions, struct ofpbuf *put_actions)
>>  {
>>      struct ofpbuf *add_actions;
>>      struct dp_packet_batch b;
>> @@ -5096,8 +5026,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>>                           put_actions);
>>      if (OVS_UNLIKELY(error && error != ENOSPC)) {
>>          dp_packet_delete(packet);
>> -        (*lost_cnt)++;
>> -        return;
>> +        return error;
>>      }
>>
>>      /* The Netlink encoding of datapath flow keys cannot express
>> @@ -5137,6 +5066,9 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>>          ovs_mutex_unlock(&pmd->flow_mutex);
>>          emc_probabilistic_insert(pmd, key, netdev_flow);
>>      }
>> +    /* Only error ENOSPC can reach here. We process the packet but do not
>> +     * install a datapath flow. Treat as successful.
*/ >> + return 0; >> } >> >> static inline void >> @@ -5158,7 +5090,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, >> struct dpcls *cls; >> struct dpcls_rule *rules[PKT_ARRAY_SIZE]; >> struct dp_netdev *dp = pmd->dp; >> - int miss_cnt = 0, lost_cnt = 0; >> + int upcall_ok_cnt = 0, upcall_fail_cnt = 0; >> int lookup_cnt = 0, add_lookup_cnt; >> bool any_miss; >> size_t i; >> @@ -5200,9 +5132,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, >> continue; >> } >> >> - miss_cnt++; >> - handle_packet_upcall(pmd, packet, &keys[i], &actions, >> - &put_actions, &lost_cnt); >> + int error = handle_packet_upcall(pmd, packet, &keys[i], >> + &actions, &put_actions); >> + >> + if (OVS_UNLIKELY(error)) { >> + upcall_fail_cnt++; >> + } else { >> + upcall_ok_cnt++; >> + } >> } >> >> ofpbuf_uninit(&actions); >> @@ -5212,8 +5149,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, >> DP_PACKET_BATCH_FOR_EACH (packet, packets_) { >> if (OVS_UNLIKELY(!rules[i])) { >> dp_packet_delete(packet); >> - lost_cnt++; >> - miss_cnt++; >> + upcall_fail_cnt++; >> } >> } >> } >> @@ -5231,10 +5167,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, >> dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); >> } >> >> - dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); >> - dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); >> - dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt); >> - dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt); >> + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, >> + cnt - upcall_ok_cnt - upcall_fail_cnt); >> + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, >> + lookup_cnt); >> + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS, >> + upcall_ok_cnt); >> + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST, >> + upcall_fail_cnt); >> } >> >> /* Packets enter the datapath from a port (or from recirculation) here. 
>> diff --git a/tests/pmd.at b/tests/pmd.at >> index e39a23a..0356f87 100644 >> --- a/tests/pmd.at >> +++ b/tests/pmd.at >> @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0 >> p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>) >> ]) >> >> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl >> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl >> pmd thread numa_id <cleared> core_id <cleared>: >> + packets received:0 >> + packet recirculations:0 >> + avg. datapath passes per packet:0.00 >> emc hits:0 >> megaflow hits:0 >> - avg. subtable lookups per hit:0.00 >> - miss:0 >> - lost:0 >> + avg. subtable lookups per megaflow hit:0.00 >> + miss(i.e. lookup miss with success upcall):0 >> + lost(i.e. lookup miss with failed upcall):0 >> ]) >> >> ovs-appctl time/stop >> @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl >> recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del> >> ]) >> >> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl >> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl >> pmd thread numa_id <cleared> core_id <cleared>: >> + packets received:20 >> + packet recirculations:0 >> + avg. datapath passes per packet:1.00 >> emc hits:19 >> megaflow hits:0 >> - avg. subtable lookups per hit:0.00 >> - miss:1 >> - lost:0 >> + avg. subtable lookups per megaflow hit:0.00 >> + miss(i.e. lookup miss with success upcall):1 >> + lost(i.e. 
lookup miss with failed upcall):0 >> ]) >> >> OVS_VSWITCHD_STOP >> > > In addition to not updated tests above, I also see failures of > test "1165. ofproto-dpif.at:7478: testing ofproto-dpif - patch ports": > > ./ofproto-dpif.at:7514: ovs-appctl dpif/show > --- - 2018-01-11 15:06:46.324839417 +0300 > +++ /tests/testsuite.dir/at-groups/1165/stdout > @@ -1,4 +1,4 @@ > -dummy@ovs-dummy: hit:13 missed:2 > +dummy@ovs-dummy: hit:0 missed:2 > br0: > br0 65534/100: (dummy-internal) > p2 2/2: (dummy) > > I found the reason why this test fails. It happens because 'stats->n_hit' in 'dpif_netdev_get_stats' should be the sum of PMD_STAT_EXACT_HIT and PMD_STAT_MASKED_HIT. Additionally, no need to set 'stats->n_mask_hit' because it only useful if 'stats->n_masks' set properly. So, following incremental required to make at least unit tests work: -------------------------------------------------------------------- diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c7a4a21..ed14dde 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1054,7 +1054,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], ovs_mutex_lock(&dp_netdev_mutex); - while (argc > 0) { + while (argc > 1) { if (!strcmp(argv[1], "-pmd") && argc >= 2) { if (str_to_uint(argv[2], 10, &core_id)) { filter_on_pmd = true; @@ -1464,11 +1464,12 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) stats->n_flows += cmap_count(&pmd->flow_table); pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; - stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; + stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT]; stats->n_missed += pmd_stats[PMD_STAT_MISS]; stats->n_lost += pmd_stats[PMD_STAT_LOST]; } stats->n_masks = UINT32_MAX; + stats->n_mask_hit = UINT64_MAX; return 0; } diff --git a/tests/pmd.at b/tests/pmd.at index 6e7ba50..80f7414 100644 --- a/tests/pmd.at +++ b/tests/pmd.at @@ -178,8 +178,8 @@ pmd thread numa_id 
<cleared> core_id <cleared>: emc hits:0 megaflow hits:0 avg. subtable lookups per megaflow hit:0.00 - miss(i.e. lookup miss with success upcall):0 - lost(i.e. lookup miss with failed upcall):0 + miss with success upcall:0 + miss with failed upcall:0 ]) ovs-appctl time/stop @@ -208,8 +208,8 @@ pmd thread numa_id <cleared> core_id <cleared>: emc hits:19 megaflow hits:0 avg. subtable lookups per megaflow hit:0.00 - miss(i.e. lookup miss with success upcall):1 - lost(i.e. lookup miss with failed upcall):0 + miss with success upcall:1 + miss with failed upcall:0 ]) OVS_VSWITCHD_STOP --------------------------------------------------------------------
> -----Original Message----- > From: Ilya Maximets [mailto:i.maximets@samsung.com] > Sent: Thursday, 11 January, 2018 13:15 > To: Jan Scheurich <jan.scheurich@ericsson.com>; dev@openvswitch.org > Cc: ktraynor@redhat.com; ian.stokes@intel.com; billy.o.mahony@intel.com; Darrell Ball <dlu998@gmail.com> > Subject: Re: [PATCH v7 1/2] dpif-netdev: Refactor PMD performance into dpif-netdev-perf > > I guess, this version was compile tested only. Yes, unfortunately that's right. It was late last night, sorry! I will send a corrected version, once we have agreed on: a) Whether or not to add a blank after the colons in pmd-stats-show output b) The approach to fix the failing ofproto-dpif test case (see below) BR, Jan > > > > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl > > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl > > pmd thread numa_id <cleared> core_id <cleared>: > > + packets received:0 > > + packet recirculations:0 > > + avg. datapath passes per packet:0.00 > > emc hits:0 > > megaflow hits:0 > > - avg. subtable lookups per hit:0.00 > > - miss:0 > > - lost:0 > > + avg. subtable lookups per megaflow hit:0.00 > > + miss(i.e. lookup miss with success upcall):0 > > + lost(i.e.
lookup miss with failed upcall):0 > > ]) > > > > ovs-appctl time/stop > > @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl > > recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), > actions: <del> > > ]) > > > > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl > > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl > > pmd thread numa_id <cleared> core_id <cleared>: > > + packets received:20 > > + packet recirculations:0 > > + avg. datapath passes per packet:1.00 > > emc hits:19 > > megaflow hits:0 > > - avg. subtable lookups per hit:0.00 > > - miss:1 > > - lost:0 > > + avg. subtable lookups per megaflow hit:0.00 > > + miss(i.e. lookup miss with success upcall):1 > > + lost(i.e. lookup miss with failed upcall):0 > > ]) > > > > OVS_VSWITCHD_STOP > > Tests will be adjusted. > > In addition to not updated tests above, I also see failures of > test "1165. ofproto-dpif.at:7478: testing ofproto-dpif - patch ports": > > ./ofproto-dpif.at:7514: ovs-appctl dpif/show > --- - 2018-01-11 15:06:46.324839417 +0300 > +++ /tests/testsuite.dir/at-groups/1165/stdout > @@ -1,4 +1,4 @@ > -dummy@ovs-dummy: hit:13 missed:2 > +dummy@ovs-dummy: hit:0 missed:2 > br0: > br0 65534/100: (dummy-internal) > p2 2/2: (dummy) The root cause for this is two-fold: 1. The injected packets do not hit the EMC because of random insertion 2. The incorrect reporting of datapath stats in dpif-netdev.c: Ad 1: The unit test is run with the default emc-insert-inv-prob=100, so that it is random whether an EMC entry is created or not. When I change the test to set emc-insert-inv-prob=1, the test passes.
Ad 2: This is the code in ofproto_dpif printing the fetched dp_stats values: dpif_get_dp_stats(backer->dpif, &dp_stats); ds_put_format(ds, "%s: hit:%"PRIu64" missed:%"PRIu64"\n", dpif_name(backer->dpif), dp_stats.n_hit, dp_stats.n_missed); This is the current incorrect implementation in dpif-netdev.c: static int dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) { struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_pmd_thread *pmd; uint64_t pmd_stats[PMD_N_STATS]; stats->n_flows = stats->n_hit = stats->n_mask_hit = stats->n_missed = stats->n_lost = 0; CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { stats->n_flows += cmap_count(&pmd->flow_table); pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; stats->n_missed += pmd_stats[PMD_STAT_MISS]; stats->n_lost += pmd_stats[PMD_STAT_LOST]; } stats->n_masks = UINT32_MAX; return 0; } So the hits reported are only the EMC hits. Megaflow hits are ignored as dpif-netdev reports them as n_mask_hit, which according to the dpif-provider header file (and kernel datapath implementation!) should rather contain our PMD_STAT_MASKED_LOOKUP counter. 
With the following correction on dpif-netdev.c the test passes even with emc-insert-inv-prob=100: diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index edc5e2e..7f51469 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1447,8 +1447,8 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { stats->n_flows += cmap_count(&pmd->flow_table); pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); - stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; - stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT] + pmd_stats[PMD_STAT_MASKED_HIT]; + stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_LOOKUP]; stats->n_missed += pmd_stats[PMD_STAT_MISS]; stats->n_lost += pmd_stats[PMD_STAT_LOST]; } As this is really a bug in dpif-netdev.c, I suggest we fix it in a separate commit. BR, Jan
😊! Came to the same conclusion... I think we should report PMD_STAT_MASKED_LOOKUP as n_mask_hit. According to my understanding it counts exactly what is required. Where do you see the dependency on n_masks (which we don't have easily as we have independent masks per PMD)? I will include a bugfix as a separate commit in the next version. /Jan > -----Original Message----- > From: Ilya Maximets [mailto:i.maximets@samsung.com] > Sent: Thursday, 11 January, 2018 15:10 > > I found the reason why this test fails. It happens because 'stats->n_hit' > in 'dpif_netdev_get_stats' should be the sum of PMD_STAT_EXACT_HIT and > PMD_STAT_MASKED_HIT. > Additionally, no need to set 'stats->n_mask_hit' because it only useful > if 'stats->n_masks' set properly. > > > So, following incremental required to make at least unit tests work: > -------------------------------------------------------------------- > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index c7a4a21..ed14dde 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -1054,7 +1054,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], > > ovs_mutex_lock(&dp_netdev_mutex); > > - while (argc > 0) { > + while (argc > 1) { > if (!strcmp(argv[1], "-pmd") && argc >= 2) { > if (str_to_uint(argv[2], 10, &core_id)) { > filter_on_pmd = true; > @@ -1464,11 +1464,12 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) > stats->n_flows += cmap_count(&pmd->flow_table); > pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); > stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; > - stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; > + stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT]; > stats->n_missed += pmd_stats[PMD_STAT_MISS]; > stats->n_lost += pmd_stats[PMD_STAT_LOST]; > } > stats->n_masks = UINT32_MAX; > + stats->n_mask_hit = UINT64_MAX; > > return 0; > } > diff --git a/tests/pmd.at b/tests/pmd.at > index 6e7ba50..80f7414 100644 > --- a/tests/pmd.at > +++ b/tests/pmd.at > 
@@ -178,8 +178,8 @@ pmd thread numa_id <cleared> core_id <cleared>: > emc hits:0 > megaflow hits:0 > avg. subtable lookups per megaflow hit:0.00 > - miss(i.e. lookup miss with success upcall):0 > - lost(i.e. lookup miss with failed upcall):0 > + miss with success upcall:0 > + miss with failed upcall:0 > ]) > > ovs-appctl time/stop > @@ -208,8 +208,8 @@ pmd thread numa_id <cleared> core_id <cleared>: > emc hits:19 > megaflow hits:0 > avg. subtable lookups per megaflow hit:0.00 > - miss(i.e. lookup miss with success upcall):1 > - lost(i.e. lookup miss with failed upcall):0 > + miss with success upcall:1 > + miss with failed upcall:0 > ]) > > OVS_VSWITCHD_STOP > --------------------------------------------------------------------
On 11.01.2018 17:19, Jan Scheurich wrote: > 😊! > > Came to the same conclusion... > > I think we should report PMD_STAT_MASKED_LOOKUP as n_mask_hit. According to my understanding it count exactly what is required. Where do you see the dependency to n_mask (which we don't have easily as we have independent masks per PMD)? > > I will include a bugfix as separate commit in the next version. It's a bug in your patch #1. Previously 'n_hit' was calculated as sum of 2 counters. kernel datapath treats 'n_hit' as a total number of hits and 'n_mask_hit' as a number of masked hits from 'n_hit'. See datapath/datapath.c: ovs_dp_process_packet(). dpctl utility calculates average number of masked hits per packet as 'stats.n_mask_hit / (stats.n_hit + stats.n_missed)' and only if "stats.n_masks != UINT32_MAX". So, (stats.n_hit + stats.n_missed) is the total numer of packets. See lib/dpctl.c: show_dpif() for details. > > /Jan > >> -----Original Message----- >> From: Ilya Maximets [mailto:i.maximets@samsung.com] >> Sent: Thursday, 11 January, 2018 15:10 >> >> I found the reason why this test fails. It happens because 'stats->n_hit' >> in 'dpif_netdev_get_stats' should be the sum of PMD_STAT_EXACT_HIT and >> PMD_STAT_MASKED_HIT. >> Additionally, no need to set 'stats->n_mask_hit' because it only useful >> if 'stats->n_masks' set properly. 
>> >> >> So, following incremental required to make at least unit tests work: >> -------------------------------------------------------------------- >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index c7a4a21..ed14dde 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -1054,7 +1054,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], >> >> ovs_mutex_lock(&dp_netdev_mutex); >> >> - while (argc > 0) { >> + while (argc > 1) { >> if (!strcmp(argv[1], "-pmd") && argc >= 2) { >> if (str_to_uint(argv[2], 10, &core_id)) { >> filter_on_pmd = true; >> @@ -1464,11 +1464,12 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) >> stats->n_flows += cmap_count(&pmd->flow_table); >> pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); >> stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; >> - stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; >> + stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT]; >> stats->n_missed += pmd_stats[PMD_STAT_MISS]; >> stats->n_lost += pmd_stats[PMD_STAT_LOST]; >> } >> stats->n_masks = UINT32_MAX; >> + stats->n_mask_hit = UINT64_MAX; >> >> return 0; >> } >> diff --git a/tests/pmd.at b/tests/pmd.at >> index 6e7ba50..80f7414 100644 >> --- a/tests/pmd.at >> +++ b/tests/pmd.at >> @@ -178,8 +178,8 @@ pmd thread numa_id <cleared> core_id <cleared>: >> emc hits:0 >> megaflow hits:0 >> avg. subtable lookups per megaflow hit:0.00 >> - miss(i.e. lookup miss with success upcall):0 >> - lost(i.e. lookup miss with failed upcall):0 >> + miss with success upcall:0 >> + miss with failed upcall:0 >> ]) >> >> ovs-appctl time/stop >> @@ -208,8 +208,8 @@ pmd thread numa_id <cleared> core_id <cleared>: >> emc hits:19 >> megaflow hits:0 >> avg. subtable lookups per megaflow hit:0.00 >> - miss(i.e. lookup miss with success upcall):1 >> - lost(i.e. 
lookup miss with failed upcall):0 >> + miss with success upcall:1 >> + miss with failed upcall:0 >> ]) >> >> OVS_VSWITCHD_STOP >> --------------------------------------------------------------------
> > It's a bug in your patch #1. > Previously 'n_hit' was calculated as sum of 2 counters. I see. Must have screwed this up a long time ago.... Completely forgot about this change. > kernel datapath treats 'n_hit' as a total number of hits and 'n_mask_hit' as a number > of masked hits from 'n_hit'. See datapath/datapath.c: ovs_dp_process_packet(). Not sure. Have a look at this: static struct sw_flow *masked_flow_lookup(struct table_instance *ti, const struct sw_flow_key *unmasked, const struct sw_flow_mask *mask, u32 *n_mask_hit) { struct sw_flow *flow; struct hlist_head *head; u32 hash; struct sw_flow_key masked_key; ovs_flow_mask_key(&masked_key, unmasked, false, mask); hash = flow_hash(&masked_key, &mask->range); head = find_bucket(ti, hash); (*n_mask_hit)++; hlist_for_each_entry_rcu(flow, head, flow_table.node[ti->node_ver]) { if (flow->mask == mask && flow->flow_table.hash == hash && flow_cmp_masked_key(flow, &masked_key, &mask->range)) return flow; } return NULL; } The n_mask_hit is incremented for every mask (subtable) lookup, no matter if it hits or not. So it really corresponds to our MASKED_LOOKUP counter. Also in line with the specification in dpif.h: /* Statistics for a dpif as a whole. */ struct dpif_dp_stats { uint64_t n_hit; /* Number of flow table matches. */ uint64_t n_missed; /* Number of flow table misses. */ uint64_t n_lost; /* Number of misses not sent to userspace. */ uint64_t n_flows; /* Number of flows present. */ uint64_t n_mask_hit; /* Number of mega flow masks visited for flow table matches. */ uint32_t n_masks; /* Number of mega flow masks. */ }; > > dpctl utility calculates average number of masked hits per packet as > 'stats.n_mask_hit / (stats.n_hit + stats.n_missed)' and only if "stats.n_masks != UINT32_MAX". > So, (stats.n_hit + stats.n_missed) is the total number of packets. > > See lib/dpctl.c: show_dpif() for details.
Given the above definition of n_mask_hit, what they print in show_dpif() is the average number of mask/subtable lookups per datapath pass (not per Megaflow hit as we do in pmd-stats-show). If the number of EMC hits or misses is large this will not provide the same information. /Jan
diff --git a/lib/automake.mk b/lib/automake.mk index 4b38a11..159319f 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpdk.h \ lib/dpif-netdev.c \ lib/dpif-netdev.h \ + lib/dpif-netdev-perf.c \ + lib/dpif-netdev-perf.h \ lib/dpif-provider.h \ lib/dpif.c \ lib/dpif.h \ diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c new file mode 100644 index 0000000..f853fd8 --- /dev/null +++ b/lib/dpif-netdev-perf.c @@ -0,0 +1,67 @@ +/* + * Copyright (c) 2017 Ericsson AB. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include <config.h> + +#ifdef DPDK_NETDEV +#include <rte_cycles.h> +#endif + +#include "openvswitch/dynamic-string.h" +#include "openvswitch/vlog.h" +#include "dpif-netdev-perf.h" +#include "timeval.h" + +VLOG_DEFINE_THIS_MODULE(pmd_perf); + +void +pmd_perf_stats_init(struct pmd_perf_stats *s) +{ + memset(s, 0 , sizeof(*s)); + s->start_ms = time_msec(); +} + +void +pmd_perf_read_counters(struct pmd_perf_stats *s, + uint64_t stats[PMD_N_STATS]) +{ + uint64_t val; + + /* These loops subtracts reference values (.zero[*]) from the counters. + * Since loads and stores are relaxed, it might be possible for a .zero[*] + * value to be more recent than the current value we're reading from the + * counter. This is not a big problem, since these numbers are not + * supposed to be 100% accurate, but we should at least make sure that + * the result is not negative. 
*/ + for (int i = 0; i < PMD_N_STATS; i++) { + atomic_read_relaxed(&s->counters.n[i], &val); + if (val > s->counters.zero[i]) { + stats[i] = val - s->counters.zero[i]; + } else { + stats[i] = 0; + } + } +} + +void +pmd_perf_stats_clear(struct pmd_perf_stats *s) +{ + s->start_ms = 0; /* Marks start of clearing. */ + for (int i = 0; i < PMD_N_STATS; i++) { + atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); + } + s->start_ms = time_msec(); /* Clearing finished. */ +} diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h new file mode 100644 index 0000000..41f4e85 --- /dev/null +++ b/lib/dpif-netdev-perf.h @@ -0,0 +1,151 @@ +/* + * Copyright (c) 2017 Ericsson AB. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PERF_H +#define DPIF_NETDEV_PERF_H 1 + +#include <stdbool.h> +#include <stddef.h> +#include <stdint.h> +#include <string.h> +#include <math.h> + +#include "openvswitch/vlog.h" +#include "ovs-atomic.h" +#include "timeval.h" +#include "unixctl.h" +#include "util.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* This module encapsulates data structures and functions to maintain PMD + * performance metrics such as packet counters, execution cycles. It + * provides a clean API for dpif-netdev to initialize, update and read and + * reset these metrics. + */ + +/* Set of counter types maintained in pmd_perf_stats. */ + +enum pmd_stat_type { + PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). 
*/ + PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ + PMD_STAT_MISS, /* Packets that did not match and upcall was ok. */ + PMD_STAT_LOST, /* Packets that did not match and upcall failed. */ + /* The above statistics account for the total + * number of packet passes through the datapath + * pipeline and should not be overlapping with each + * other. */ + PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table + hits. Each MASKED_HIT hit will have >= 1 + MASKED_LOOKUP(s). */ + PMD_STAT_RECV, /* Packets entering the datapath pipeline from an + * interface. */ + PMD_STAT_RECIRC, /* Packets reentering the datapath pipeline due to + * recirculation. */ + PMD_STAT_SENT_PKTS, /* Packets that have been sent. */ + PMD_STAT_SENT_BATCHES, /* Number of batches sent. */ + PMD_CYCLES_POLL_IDLE, /* Cycles spent unsuccessful polling. */ + PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and + * processing polled packets. */ + PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */ + PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ + PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ + PMD_N_STATS +}; + +/* Array of PMD counters indexed by enum pmd_stat_type. + * The n[] array contains the actual counter values since initialization + * of the PMD. Counters are atomically updated from the PMD but are + * read and cleared also from other processes. To clear the counters at + * PMD run-time, the current counter values are copied over to the zero[] + * array. To read counters we subtract zero[] value from n[]. */ + +struct pmd_counters { + atomic_uint64_t n[PMD_N_STATS]; /* Value since _init(). */ + uint64_t zero[PMD_N_STATS]; /* Value at last _clear(). */ +}; + +/* Container for all performance metrics of a PMD. + * Part of the struct dp_netdev_pmd_thread. */ + +struct pmd_perf_stats { + /* Start of the current performance measurement period. 
+ * While a new measurement period is being started with
+ * ovs-appctl dpif-netdev/pmd-stats-clear, start_ms is set
+ * to zero to lock out operations from accessing inconsistent
+ * data. */
+    uint64_t start_ms;
+    /* Start of the current PMD iteration in TSC cycles. */
+    uint64_t last_tsc;
+    /* Set of PMD counters with their zero offsets. */
+    struct pmd_counters counters;
+};
+
+void pmd_perf_stats_init(struct pmd_perf_stats *s);
+void pmd_perf_stats_clear(struct pmd_perf_stats *s);
+void pmd_perf_read_counters(struct pmd_perf_stats *s,
+                            uint64_t stats[PMD_N_STATS]);
+
+/* PMD performance counters are updated lock-less. For real PMDs
+ * they are only updated from the PMD thread itself. In the case of the
+ * NON-PMD they might be updated from multiple threads, but we can live
+ * with losing a rare update as 100% accuracy is not required.
+ * However, as counters are read for display from outside the PMD thread
+ * with e.g. pmd-stats-show, we make sure that the 64-bit read and store
+ * operations are atomic also on 32-bit systems so that readers cannot
+ * read garbage. On 64-bit systems this incurs no overhead. */
+
+static inline void
+pmd_perf_update_counter(struct pmd_perf_stats *s,
+                        enum pmd_stat_type counter, int delta)
+{
+    uint64_t tmp;
+    atomic_read_relaxed(&s->counters.n[counter], &tmp);
+    tmp += delta;
+    atomic_store_relaxed(&s->counters.n[counter], tmp);
+}
+
+static inline void
+pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
+{
+    if (OVS_UNLIKELY(s->start_ms == 0)) {
+        /* Stats not yet fully initialized. */
+        return;
+    }
+    s->last_tsc = now_tsc;
+}
+
+static inline void
+pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
+                       int packets)
+{
+    uint64_t cycles = now_tsc - s->last_tsc;
+
+    /* No need for atomic updates as only invoked within PMD thread. */
+    if (packets > 0) {
+        s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
+    } else {
+        s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
+    }
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* DPIF_NETDEV_PERF_H */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c7d157a..538d5ce 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -44,6 +44,7 @@
 #include "csum.h"
 #include "dp-packet.h"
 #include "dpif.h"
+#include "dpif-netdev-perf.h"
 #include "dpif-provider.h"
 #include "dummy.h"
 #include "fat-rwlock.h"
@@ -331,25 +332,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
                                                     odp_port_t)
     OVS_REQUIRES(dp->port_mutex);
 
-enum dp_stat_type {
-    DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc). */
-    DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
-    DP_STAT_MISS,               /* Packets that did not match. */
-    DP_STAT_LOST,               /* Packets not passed up to the client. */
-    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
-                                   hits */
-    DP_STAT_SENT_PKTS,          /* Packets that has been sent. */
-    DP_STAT_SENT_BATCHES,       /* Number of batches sent. */
-    DP_N_STATS
-};
-
-enum pmd_cycles_counter_type {
-    PMD_CYCLES_IDLE,            /* Cycles spent idle or unsuccessful polling */
-    PMD_CYCLES_PROCESSING,      /* Cycles spent successfully polling and
-                                 * processing polled packets */
-    PMD_N_CYCLES
-};
-
 enum rxq_cycles_counter_type {
     RXQ_CYCLES_PROC_CURR,       /* Cycles spent successfully polling and
                                    processing packets during the current
@@ -499,21 +481,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions(
     const struct dp_netdev_flow *);
 static void dp_netdev_actions_free(struct dp_netdev_actions *);
 
-/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */
-struct dp_netdev_pmd_stats {
-    /* Indexed by DP_STAT_*. */
-    atomic_ullong n[DP_N_STATS];
-};
-
-/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */
-struct dp_netdev_pmd_cycles {
-    /* Indexed by PMD_CYCLES_*. */
-    atomic_ullong n[PMD_N_CYCLES];
-};
-
-static void dp_netdev_count_packet(struct dp_netdev_pmd_thread *,
-                                   enum dp_stat_type type, int cnt);
-
 struct polled_queue {
     struct dp_netdev_rxq *rxq;
     odp_port_t port_no;
@@ -595,12 +562,6 @@ struct dp_netdev_pmd_thread {
                                      are stored for each polled rxq. */
     long long int rxq_next_cycle_store;
 
-    /* Statistics. */
-    struct dp_netdev_pmd_stats stats;
-
-    /* Cycles counters */
-    struct dp_netdev_pmd_cycles cycles;
-
     /* Current context of the PMD thread. */
     struct dp_netdev_pmd_thread_ctx ctx;
 
@@ -638,12 +599,8 @@ struct dp_netdev_pmd_thread {
     struct hmap tnl_port_cache;
     struct hmap send_port_cache;
 
-    /* Only a pmd thread can write on its own 'cycles' and 'stats'.
-     * The main thread keeps 'stats_zero' and 'cycles_zero' as base
-     * values and subtracts them from 'stats' and 'cycles' before
-     * reporting to the user */
-    unsigned long long stats_zero[DP_N_STATS];
-    uint64_t cycles_zero[PMD_N_CYCLES];
+    /* Keep track of detailed PMD performance statistics. */
+    struct pmd_perf_stats perf_stats;
 
     /* Set to true if the pmd thread needs to be reloaded. */
     bool need_reload;
@@ -833,47 +790,10 @@ enum pmd_info_type {
 };
 
 static void
-pmd_info_show_stats(struct ds *reply,
-                    struct dp_netdev_pmd_thread *pmd,
-                    unsigned long long stats[DP_N_STATS],
-                    uint64_t cycles[PMD_N_CYCLES])
+format_pmd_thread(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
 {
-    unsigned long long total_packets;
-    uint64_t total_cycles = 0;
-    double lookups_per_hit = 0, packets_per_batch = 0;
-    int i;
-
-    /* These loops subtracts reference values ('*_zero') from the counters.
-     * Since loads and stores are relaxed, it might be possible for a '*_zero'
-     * value to be more recent than the current value we're reading from the
-     * counter. This is not a big problem, since these numbers are not
-     * supposed to be too accurate, but we should at least make sure that
-     * the result is not negative. */
-    for (i = 0; i < DP_N_STATS; i++) {
-        if (stats[i] > pmd->stats_zero[i]) {
-            stats[i] -= pmd->stats_zero[i];
-        } else {
-            stats[i] = 0;
-        }
-    }
-
-    /* Sum of all the matched and not matched packets gives the total. */
-    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
-                    + stats[DP_STAT_MISS];
-
-    for (i = 0; i < PMD_N_CYCLES; i++) {
-        if (cycles[i] > pmd->cycles_zero[i]) {
-            cycles[i] -= pmd->cycles_zero[i];
-        } else {
-            cycles[i] = 0;
-        }
-
-        total_cycles += cycles[i];
-    }
-
     ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID)
                         ? "main thread" : "pmd thread");
-
     if (pmd->numa_id != OVS_NUMA_UNSPEC) {
         ds_put_format(reply, " numa_id %d", pmd->numa_id);
     }
@@ -881,23 +801,52 @@ pmd_info_show_stats(struct ds *reply,
         ds_put_format(reply, " core_id %u", pmd->core_id);
     }
     ds_put_cstr(reply, ":\n");
+}
+
+static void
+pmd_info_show_stats(struct ds *reply,
+                    struct dp_netdev_pmd_thread *pmd)
+{
+    uint64_t stats[PMD_N_STATS];
+    uint64_t total_cycles, total_packets;
+    double passes_per_pkt = 0;
+    double lookups_per_hit = 0;
+    double packets_per_batch = 0;
+
+    pmd_perf_read_counters(&pmd->perf_stats, stats);
+    total_cycles = stats[PMD_CYCLES_ITER_IDLE]
+                   + stats[PMD_CYCLES_ITER_BUSY];
+    total_packets = stats[PMD_STAT_RECV];
+
+    format_pmd_thread(reply, pmd);
 
-    if (stats[DP_STAT_MASKED_HIT] > 0) {
-        lookups_per_hit = stats[DP_STAT_LOOKUP_HIT]
-                          / (double) stats[DP_STAT_MASKED_HIT];
+    if (total_packets > 0) {
+        passes_per_pkt = (total_packets + stats[PMD_STAT_RECIRC])
+                         / (double) total_packets;
     }
-    if (stats[DP_STAT_SENT_BATCHES] > 0) {
-        packets_per_batch = stats[DP_STAT_SENT_PKTS]
-                            / (double) stats[DP_STAT_SENT_BATCHES];
+    if (stats[PMD_STAT_MASKED_HIT] > 0) {
+        lookups_per_hit = stats[PMD_STAT_MASKED_LOOKUP]
+                          / (double) stats[PMD_STAT_MASKED_HIT];
+    }
+    if (stats[PMD_STAT_SENT_BATCHES] > 0) {
+        packets_per_batch = stats[PMD_STAT_SENT_PKTS]
+                            / (double) stats[PMD_STAT_SENT_BATCHES];
     }
 
     ds_put_format(reply,
-                  "\temc hits:%llu\n\tmegaflow hits:%llu\n"
-                  "\tavg. subtable lookups per hit:%.2f\n"
-                  "\tmiss:%llu\n\tlost:%llu\n"
-                  "\tavg. packets per output batch: %.2f\n",
-                  stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
-                  lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST],
+                  "\tpackets received:%"PRIu64"\n"
+                  "\tpacket recirculations:%"PRIu64"\n"
+                  "\tavg. datapath passes per packet:%.02f\n"
+                  "\temc hits:%"PRIu64"\n"
+                  "\tmegaflow hits:%"PRIu64"\n"
+                  "\tavg. subtable lookups per megaflow hit:%.02f\n"
+                  "\tmiss with success upcall:%"PRIu64"\n"
+                  "\tmiss with failed upcall:%"PRIu64"\n"
+                  "\tavg. packets per output batch:%.02f\n",
+                  total_packets, stats[PMD_STAT_RECIRC],
+                  passes_per_pkt, stats[PMD_STAT_EXACT_HIT],
+                  stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
+                  stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
                   packets_per_batch);
 
     if (total_cycles == 0) {
@@ -907,46 +856,25 @@ pmd_info_show_stats(struct ds *reply,
     ds_put_format(reply,
                   "\tidle cycles:%"PRIu64" (%.02f%%)\n"
                   "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
-                  cycles[PMD_CYCLES_IDLE],
-                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
-                  cycles[PMD_CYCLES_PROCESSING],
-                  cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
+                  stats[PMD_CYCLES_ITER_IDLE],
+                  stats[PMD_CYCLES_ITER_IDLE] / (double) total_cycles * 100,
+                  stats[PMD_CYCLES_ITER_BUSY],
+                  stats[PMD_CYCLES_ITER_BUSY] / (double) total_cycles * 100);
 
     if (total_packets == 0) {
         return;
     }
 
     ds_put_format(reply,
-                  "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n",
-                  total_cycles / (double)total_packets,
+                  "\tavg cycles per packet:%.02f (%"PRIu64"/%"PRIu64")\n",
+                  total_cycles / (double) total_packets,
                   total_cycles, total_packets);
 
     ds_put_format(reply,
-                  "\tavg processing cycles per packet: "
-                  "%.02f (%"PRIu64"/%llu)\n",
-                  cycles[PMD_CYCLES_PROCESSING] / (double)total_packets,
-                  cycles[PMD_CYCLES_PROCESSING], total_packets);
-}
-
-static void
-pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
-                     struct dp_netdev_pmd_thread *pmd,
-                     unsigned long long stats[DP_N_STATS],
-                     uint64_t cycles[PMD_N_CYCLES])
-{
-    int i;
-
-    /* We cannot write 'stats' and 'cycles' (because they're written by other
-     * threads) and we shouldn't change 'stats' (because they're used to count
-     * datapath stats, which must not be cleared here). Instead, we save the
-     * current values and subtract them from the values to be displayed in the
-     * future */
-    for (i = 0; i < DP_N_STATS; i++) {
-        pmd->stats_zero[i] = stats[i];
-    }
-    for (i = 0; i < PMD_N_CYCLES; i++) {
-        pmd->cycles_zero[i] = cycles[i];
-    }
+                  "\tavg processing cycles per packet:"
+                  "%.02f (%"PRIu64"/%"PRIu64")\n",
+                  stats[PMD_CYCLES_ITER_BUSY] / (double) total_packets,
+                  stats[PMD_CYCLES_ITER_BUSY], total_packets);
 }
 
 static int
@@ -1106,23 +1034,37 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
     struct ds reply = DS_EMPTY_INITIALIZER;
     struct dp_netdev_pmd_thread **pmd_list;
     struct dp_netdev *dp = NULL;
-    size_t n;
     enum pmd_info_type type = *(enum pmd_info_type *) aux;
+    unsigned int core_id;
+    bool filter_on_pmd = false;
+    size_t n;
 
     ovs_mutex_lock(&dp_netdev_mutex);
 
-    if (argc == 2) {
-        dp = shash_find_data(&dp_netdevs, argv[1]);
-    } else if (shash_count(&dp_netdevs) == 1) {
-        /* There's only one datapath */
-        dp = shash_first(&dp_netdevs)->data;
+    while (argc > 1) {
+        if (!strcmp(argv[1], "-pmd") && argc >= 3) {
+            if (str_to_uint(argv[2], 10, &core_id)) {
+                filter_on_pmd = true;
+            }
+            argc -= 2;
+            argv += 2;
+        } else {
+            dp = shash_find_data(&dp_netdevs, argv[1]);
+            argc -= 1;
+            argv += 1;
+        }
     }
 
     if (!dp) {
-        ovs_mutex_unlock(&dp_netdev_mutex);
-        unixctl_command_reply_error(conn,
-                                    "please specify an existing datapath");
-        return;
+        if (shash_count(&dp_netdevs) == 1) {
+            /* There's only one datapath */
+            dp = shash_first(&dp_netdevs)->data;
+        } else {
+            ovs_mutex_unlock(&dp_netdev_mutex);
+            unixctl_command_reply_error(conn,
+                                        "please specify an existing datapath");
+            return;
+        }
     }
 
     sorted_poll_thread_list(dp, &pmd_list, &n);
@@ -1131,26 +1073,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
         if (!pmd) {
             break;
         }
-
+        if (filter_on_pmd && pmd->core_id != core_id) {
+            continue;
+        }
         if (type == PMD_INFO_SHOW_RXQ) {
             pmd_info_show_rxq(&reply, pmd);
-        } else {
-            unsigned long long stats[DP_N_STATS];
-            uint64_t cycles[PMD_N_CYCLES];
-
-            /* Read current stats and cycle counters */
-            for (size_t j = 0; j < ARRAY_SIZE(stats); j++) {
-                atomic_read_relaxed(&pmd->stats.n[j], &stats[j]);
-            }
-            for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) {
-                atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]);
-            }
-
-            if (type == PMD_INFO_CLEAR_STATS) {
-                pmd_info_clear_stats(&reply, pmd, stats, cycles);
-            } else if (type == PMD_INFO_SHOW_STATS) {
-                pmd_info_show_stats(&reply, pmd, stats, cycles);
-            }
+        } else if (type == PMD_INFO_CLEAR_STATS) {
+            pmd_perf_stats_clear(&pmd->perf_stats);
+        } else if (type == PMD_INFO_SHOW_STATS) {
+            pmd_info_show_stats(&reply, pmd);
         }
     }
     free(pmd_list);
@@ -1168,14 +1099,14 @@ dpif_netdev_init(void)
                               clear_aux = PMD_INFO_CLEAR_STATS,
                               poll_aux = PMD_INFO_SHOW_RXQ;
 
-    unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]",
-                             0, 1, dpif_netdev_pmd_info,
+    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]",
+                             0, 2, dpif_netdev_pmd_info,
                              (void *)&show_aux);
-    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]",
-                             0, 1, dpif_netdev_pmd_info,
+    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]",
+                             0, 2, dpif_netdev_pmd_info,
                              (void *)&clear_aux);
-    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
-                             0, 1, dpif_netdev_pmd_info,
+    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]",
+                             0, 2, dpif_netdev_pmd_info,
                              (void *)&poll_aux);
     unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1,
                              dpif_netdev_pmd_rebalance,
@@ -1511,23 +1442,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
 {
     struct dp_netdev *dp = get_dp_netdev(dpif);
     struct dp_netdev_pmd_thread *pmd;
+    uint64_t pmd_stats[PMD_N_STATS];
 
-    stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0;
+    stats->n_flows = stats->n_hit =
+        stats->n_mask_hit = stats->n_missed = stats->n_lost = 0;
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
-        unsigned long long n;
         stats->n_flows += cmap_count(&pmd->flow_table);
-
-        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n);
-        stats->n_hit += n;
-        atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
-        stats->n_hit += n;
-        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
-        stats->n_missed += n;
-        atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
-        stats->n_lost += n;
+        pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
+        stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
+        stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT];
+        stats->n_missed += pmd_stats[PMD_STAT_MISS];
+        stats->n_lost += pmd_stats[PMD_STAT_LOST];
     }
     stats->n_masks = UINT32_MAX;
-    stats->n_mask_hit = UINT64_MAX;
 
     return 0;
 }
@@ -3209,28 +3136,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd)
 /* Stop counting cycles and add them to the counter 'type' */
 static inline void
 cycles_count_end(struct dp_netdev_pmd_thread *pmd,
-                 enum pmd_cycles_counter_type type)
+                 enum pmd_stat_type type)
     OVS_RELEASES(&cycles_counter_fake_mutex)
     OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles;
 
-    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
+    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
 }
 
 /* Calculate the intermediate cycle result and add to the counter 'type' */
 static inline void
 cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
                           struct dp_netdev_rxq *rxq,
-                          enum pmd_cycles_counter_type type)
+                          enum pmd_stat_type type)
     OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     unsigned long long new_cycles = cycles_counter();
     unsigned long long interval = new_cycles - pmd->ctx.last_cycles;
     pmd->ctx.last_cycles = new_cycles;
 
-    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
-    if (rxq && (type == PMD_CYCLES_PROCESSING)) {
+    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
+    if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
         /* Add to the amount of current processing cycles. */
         non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
     }
@@ -3289,8 +3216,8 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd,
     netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs);
     dp_packet_batch_init(&p->output_pkts);
 
-    dp_netdev_count_packet(pmd, DP_STAT_SENT_PKTS, output_cnt);
-    dp_netdev_count_packet(pmd, DP_STAT_SENT_BATCHES, 1);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_PKTS, output_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_BATCHES, 1);
 }
 
 static void
@@ -3971,12 +3898,12 @@ dpif_netdev_run(struct dpif *dpif)
                                                            port->port_no);
                     cycles_count_intermediate(non_pmd, NULL,
                                               process_packets
-                                              ? PMD_CYCLES_PROCESSING
-                                              : PMD_CYCLES_IDLE);
+                                              ? PMD_CYCLES_POLL_BUSY
+                                              : PMD_CYCLES_POLL_IDLE);
                 }
             }
         }
-        cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
+        cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
         pmd_thread_ctx_time_update(non_pmd);
         dpif_netdev_xps_revalidate_pmd(non_pmd, false);
         ovs_mutex_unlock(&dp->non_pmd_mutex);
@@ -4121,6 +4048,7 @@ static void *
 pmd_thread_main(void *f_)
 {
     struct dp_netdev_pmd_thread *pmd = f_;
+    struct pmd_perf_stats *s = &pmd->perf_stats;
     unsigned int lc = 0;
     struct polled_queue *poll_list;
     bool exiting;
@@ -4156,13 +4084,17 @@ reload:
 
     cycles_count_start(pmd);
     for (;;) {
+        uint64_t iter_packets = 0;
+        pmd_perf_start_iteration(s, pmd->ctx.last_cycles);
         for (i = 0; i < poll_cnt; i++) {
             process_packets =
                 dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
                                            poll_list[i].port_no);
             cycles_count_intermediate(pmd, poll_list[i].rxq,
-                                      process_packets ? PMD_CYCLES_PROCESSING
-                                                      : PMD_CYCLES_IDLE);
+                                      process_packets
+                                      ? PMD_CYCLES_POLL_BUSY
+                                      : PMD_CYCLES_POLL_IDLE);
+            iter_packets += process_packets;
         }
 
         if (lc++ > 1024) {
@@ -4183,10 +4115,12 @@ reload:
             if (reload) {
                 break;
             }
+            cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
         }
+        pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets);
     }
 
-    cycles_count_end(pmd, PMD_CYCLES_IDLE);
+    cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
 
     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
     exiting = latch_is_set(&pmd->exit_latch);
@@ -4638,6 +4572,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
         emc_cache_init(&pmd->flow_cache);
         pmd_alloc_static_tx_qid(pmd);
     }
+    pmd_perf_stats_init(&pmd->perf_stats);
     cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
                 hash_int(core_id, 0));
 }
@@ -4838,13 +4773,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size,
     atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);
 }
 
-static void
-dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
-                       enum dp_stat_type type, int cnt)
-{
-    non_atomic_ullong_add(&pmd->stats.n[type], cnt);
-}
-
 static int
 dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
                  struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid,
@@ -5017,6 +4945,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
     int i;
 
     atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
+    pmd_perf_update_counter(&pmd->perf_stats,
+                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
+                            cnt);
 
     DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
         struct dp_netdev_flow *flow;
@@ -5065,18 +4996,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
         }
     }
 
-    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
-                           cnt - n_dropped - n_missed);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
+                            cnt - n_dropped - n_missed);
 
     return dp_packet_batch_size(packets_);
 }
 
-static inline void
+static inline int
 handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                      struct dp_packet *packet,
                      const struct netdev_flow_key *key,
-                     struct ofpbuf *actions, struct ofpbuf *put_actions,
-                     int *lost_cnt)
+                     struct ofpbuf *actions, struct ofpbuf *put_actions)
 {
     struct ofpbuf *add_actions;
     struct dp_packet_batch b;
@@ -5096,8 +5026,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                              put_actions);
     if (OVS_UNLIKELY(error && error != ENOSPC)) {
         dp_packet_delete(packet);
-        (*lost_cnt)++;
-        return;
+        return error;
     }
 
     /* The Netlink encoding of datapath flow keys cannot express
@@ -5137,6 +5066,9 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
         ovs_mutex_unlock(&pmd->flow_mutex);
         emc_probabilistic_insert(pmd, key, netdev_flow);
     }
+    /* Only error ENOSPC can reach here. We process the packet but do not
+     * install a datapath flow. Treat as successful. */
+    return 0;
 }
 
 static inline void
@@ -5158,7 +5090,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     struct dpcls *cls;
     struct dpcls_rule *rules[PKT_ARRAY_SIZE];
     struct dp_netdev *dp = pmd->dp;
-    int miss_cnt = 0, lost_cnt = 0;
+    int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
     int lookup_cnt = 0, add_lookup_cnt;
     bool any_miss;
     size_t i;
@@ -5200,9 +5132,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                 continue;
             }
 
-            miss_cnt++;
-            handle_packet_upcall(pmd, packet, &keys[i], &actions,
-                                 &put_actions, &lost_cnt);
+            int error = handle_packet_upcall(pmd, packet, &keys[i],
+                                             &actions, &put_actions);
+
+            if (OVS_UNLIKELY(error)) {
+                upcall_fail_cnt++;
+            } else {
+                upcall_ok_cnt++;
+            }
         }
 
         ofpbuf_uninit(&actions);
@@ -5212,8 +5149,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
         DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
             if (OVS_UNLIKELY(!rules[i])) {
                 dp_packet_delete(packet);
-                lost_cnt++;
-                miss_cnt++;
+                upcall_fail_cnt++;
             }
         }
     }
@@ -5231,10 +5167,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
         dp_netdev_queue_batches(packet, flow, &keys[i].mf,
                                 batches, n_batches);
     }
-    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
-    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
-    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
-    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
+                            cnt - upcall_ok_cnt - upcall_fail_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP,
+                            lookup_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS,
+                            upcall_ok_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
+                            upcall_fail_cnt);
 }
 
 /* Packets enter the datapath from a port (or from recirculation) here.
diff --git a/tests/pmd.at b/tests/pmd.at
index e39a23a..0356f87 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0
                 p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
 ])
 
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
 pmd thread numa_id <cleared> core_id <cleared>:
+  packets received:0
+  packet recirculations:0
+  avg. datapath passes per packet:0.00
   emc hits:0
   megaflow hits:0
-  avg. subtable lookups per hit:0.00
-  miss:0
-  lost:0
+  avg. subtable lookups per megaflow hit:0.00
+  miss with success upcall:0
+  miss with failed upcall:0
 ])
 
 ovs-appctl time/stop
@@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
 recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
 ])
 
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
 pmd thread numa_id <cleared> core_id <cleared>:
+  packets received:20
+  packet recirculations:0
+  avg. datapath passes per packet:1.00
   emc hits:19
   megaflow hits:0
-  avg. subtable lookups per hit:0.00
-  miss:1
-  lost:0
+  avg. subtable lookups per megaflow hit:0.00
+  miss with success upcall:1
+  miss with failed upcall:0
 ])
 
 OVS_VSWITCHD_STOP