From patchwork Fri Jan 12 00:39:02 2018
From: Jan Scheurich
To: dev@openvswitch.org
Cc: i.maximets@samsung.com
Date: Fri, 12 Jan 2018 01:39:02 +0100
Message-Id: <1515717543-31903-2-git-send-email-jan.scheurich@ericsson.com>
In-Reply-To: <1515717543-31903-1-git-send-email-jan.scheurich@ericsson.com>
Subject: [ovs-dev] [PATCH v9 1/2] dpif-netdev: Refactor PMD performance into dpif-netdev-perf

Add module dpif-netdev-perf to host all PMD performance-related data
structures and functions in dpif-netdev.

Refactor the PMD stats handling in dpif-netdev and delegate as much as
possible to the new module, using clean interfaces to shield dpif-netdev
from the implementation details. Accordingly, all PMD statistics members
are moved from the main struct dp_netdev_pmd_thread into a dedicated
member of type struct pmd_perf_stats.

Include Darrell's prior refactoring of PMD stats contained in
[PATCH v5,2/3] dpif-netdev: Refactor some pmd stats:

1. The cycles per packet counts are now based on packets received rather
   than on packet passes through the datapath.

2. Packet counters are now kept for packets received and packets
   recirculated. These are kept as separate counters for maintainability
   reasons. The cost of incrementing these counters is negligible. These
   new counters are also displayed to the user.

3. A display statistic is added for the average number of datapath passes
   per packet. This should be useful for user debugging and for
   understanding packet processing.

4. The user-visible 'miss' counter is used for successful upcalls, rather
   than the sum of successful and unsuccessful upcalls. Hence, it now
   matches what users have historically understood as an OVS 'miss
   upcall'. The user display is annotated to make this clear as well.

5. The user-visible 'lost' counter remains as failed upcalls, but is
   annotated to make its meaning clear.

6. The enum pmd_stat_type is annotated to make the usage of the stats
   counters clear.

7. The subtable lookup stats are renamed to make it clear that they
   relate to masked lookups.

8. The PMD stats test is updated to handle the new user stats of packets
   received, packets recirculated and average number of datapath passes
   per packet.

On top of that, introduce a "-pmd core" option to the PMD info commands
to filter the output for a single PMD.

Make the pmd-stats-show output a bit more readable by adding a space
between the colon and the value.
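The derived values described in items 1 to 3, together with the
zero-offset clearing that pmd_perf_stats_clear() performs, can be
illustrated with the minimal single-threaded sketch below. It is not part
of the patch: the demo_* names, the reduced counter set and the use of
plain uint64_t instead of atomic_uint64_t with relaxed loads/stores are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of enum pmd_stat_type, for illustration only. */
enum demo_stat {
    DEMO_RECV,          /* Packets received from an interface. */
    DEMO_RECIRC,        /* Packets re-entering the pipeline. */
    DEMO_MASKED_HIT,    /* Megaflow (dpcls) hits. */
    DEMO_MASKED_LOOKUP, /* Subtable lookups done for megaflow hits. */
    DEMO_CYCLES_BUSY,   /* Cycles of iterations that received packets. */
    DEMO_CYCLES_IDLE,   /* Cycles of iterations that received nothing. */
    DEMO_N_STATS
};

/* Assumption: plain uint64_t keeps the sketch single-threaded; the patch
 * uses atomic_uint64_t with relaxed reads and stores for n[]. */
struct demo_counters {
    uint64_t n[DEMO_N_STATS];    /* Running totals since init. */
    uint64_t zero[DEMO_N_STATS]; /* Snapshot taken at the last clear. */
};

static void
demo_update(struct demo_counters *c, enum demo_stat s, uint64_t delta)
{
    assert(s < DEMO_N_STATS);
    c->n[s] += delta;
}

/* "Clearing" only snapshots the running totals; n[] is never reset, so
 * the writer thread's counters are left untouched. */
static void
demo_clear(struct demo_counters *c)
{
    for (int i = 0; i < DEMO_N_STATS; i++) {
        c->zero[i] = c->n[i];
    }
}

/* Reports n[] - zero[], clamped at zero like pmd_perf_read_counters(). */
static void
demo_read(const struct demo_counters *c, uint64_t out[DEMO_N_STATS])
{
    for (int i = 0; i < DEMO_N_STATS; i++) {
        out[i] = c->n[i] > c->zero[i] ? c->n[i] - c->zero[i] : 0;
    }
}

struct demo_derived {
    double passes_per_pkt;  /* (received + recirculated) / received */
    double lookups_per_hit; /* masked lookups / megaflow hits */
    double cycles_per_pkt;  /* (busy + idle iteration cycles) / received */
};

/* Derived statistics, each guarded against division by zero. */
static struct demo_derived
demo_derive(const uint64_t st[DEMO_N_STATS])
{
    struct demo_derived d = {0.0, 0.0, 0.0};
    uint64_t recv = st[DEMO_RECV];
    uint64_t cycles = st[DEMO_CYCLES_BUSY] + st[DEMO_CYCLES_IDLE];

    if (recv > 0) {
        d.passes_per_pkt = (recv + st[DEMO_RECIRC]) / (double) recv;
        d.cycles_per_pkt = cycles / (double) recv;
    }
    if (st[DEMO_MASKED_HIT] > 0) {
        d.lookups_per_hit = st[DEMO_MASKED_LOOKUP]
                            / (double) st[DEMO_MASKED_HIT];
    }
    return d;
}
```

For example, 20 received packets with 10 recirculations yield 1.50
datapath passes per packet, and 4000 iteration cycles over those 20
packets yield 200 cycles per packet. Because the per-packet figures are
divided by received packets rather than by datapath passes, recirculation
no longer deflates the cycles-per-packet value. A subsequent clear makes
every reported value restart from zero while the running totals keep
counting.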
Signed-off-by: Jan Scheurich
Co-authored-by: Darrell Ball
Signed-off-by: Darrell Ball
Acked-by: Billy O'Mahony
---
 lib/automake.mk        |   2 +
 lib/dpif-netdev-perf.c |  60 +++++++++
 lib/dpif-netdev-perf.h | 140 +++++++++++++++++++
 lib/dpif-netdev.c      | 358 ++++++++++++++++++++-----------------------------
 tests/pmd.at           |  30 +++--
 5 files changed, 369 insertions(+), 221 deletions(-)
 create mode 100644 lib/dpif-netdev-perf.c
 create mode 100644 lib/dpif-netdev-perf.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 4b38a11..159319f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \
 	lib/dpdk.h \
 	lib/dpif-netdev.c \
 	lib/dpif-netdev.h \
+	lib/dpif-netdev-perf.c \
+	lib/dpif-netdev-perf.h \
 	lib/dpif-provider.h \
 	lib/dpif.c \
 	lib/dpif.h \
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
new file mode 100644
index 0000000..f06991a
--- /dev/null
+++ b/lib/dpif-netdev-perf.c
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2017 Ericsson AB.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/vlog.h"
+#include "dpif-netdev-perf.h"
+#include "timeval.h"
+
+VLOG_DEFINE_THIS_MODULE(pmd_perf);
+
+void
+pmd_perf_stats_init(struct pmd_perf_stats *s)
+{
+    memset(s, 0, sizeof(*s));
+}
+
+void
+pmd_perf_read_counters(struct pmd_perf_stats *s,
+                       uint64_t stats[PMD_N_STATS])
+{
+    uint64_t val;
+
+    /* This loop subtracts reference values (.zero[*]) from the counters.
+     * Since loads and stores are relaxed, it might be possible for a .zero[*]
+     * value to be more recent than the current value we're reading from the
+     * counter. This is not a big problem, since these numbers are not
+     * supposed to be 100% accurate, but we should at least make sure that
+     * the result is not negative. */
+    for (int i = 0; i < PMD_N_STATS; i++) {
+        atomic_read_relaxed(&s->counters.n[i], &val);
+        if (val > s->counters.zero[i]) {
+            stats[i] = val - s->counters.zero[i];
+        } else {
+            stats[i] = 0;
+        }
+    }
+}
+
+void
+pmd_perf_stats_clear(struct pmd_perf_stats *s)
+{
+    for (int i = 0; i < PMD_N_STATS; i++) {
+        atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
+    }
+}
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
new file mode 100644
index 0000000..53d60d3
--- /dev/null
+++ b/lib/dpif-netdev-perf.h
@@ -0,0 +1,140 @@
+/*
+ * Copyright (c) 2017 Ericsson AB.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef DPIF_NETDEV_PERF_H
+#define DPIF_NETDEV_PERF_H 1
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "openvswitch/vlog.h"
+#include "ovs-atomic.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* This module encapsulates data structures and functions to maintain PMD
+ * performance metrics such as packet counters and execution cycles. It
+ * provides a clean API for dpif-netdev to initialize, update, read and
+ * reset these metrics.
+ */
+
+/* Set of counter types maintained in pmd_perf_stats. */
+
+enum pmd_stat_type {
+    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
+    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
+    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
+    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
+                            /* The above statistics account for the total
+                             * number of packet passes through the datapath
+                             * pipeline and should not overlap with each
+                             * other. */
+    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
+                               hits. Each MASKED_HIT hit will have >= 1
+                               MASKED_LOOKUP(s). */
+    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
+                             * interface. */
+    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
+                             * recirculation. */
+    PMD_STAT_SENT_PKTS,     /* Packets that have been sent. */
+    PMD_STAT_SENT_BATCHES,  /* Number of batches sent. */
+    PMD_CYCLES_POLL_IDLE,   /* Cycles spent in unsuccessful polling. */
+    PMD_CYCLES_POLL_BUSY,   /* Cycles spent successfully polling and
+                             * processing polled packets. */
+    PMD_CYCLES_OVERHEAD,    /* Cycles spent for other tasks. */
+    PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
+    PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
+    PMD_N_STATS
+};
+
+/* Array of PMD counters indexed by enum pmd_stat_type.
+ * The n[] array contains the actual counter values since initialization
+ * of the PMD.
+ * Counters are atomically updated from the PMD but are also read and
+ * cleared from other threads. To clear the counters at PMD run-time, the
+ * current counter values are copied over to the zero[] array. To read the
+ * counters, the zero[] values are subtracted from n[]. */
+
+struct pmd_counters {
+    atomic_uint64_t n[PMD_N_STATS];     /* Value since _init(). */
+    uint64_t zero[PMD_N_STATS];         /* Value at last _clear(). */
+};
+
+/* Container for all performance metrics of a PMD.
+ * Part of the struct dp_netdev_pmd_thread. */
+
+struct pmd_perf_stats {
+    /* Start of the current PMD iteration in TSC cycles. */
+    uint64_t last_tsc;
+    /* Set of PMD counters with their zero offsets. */
+    struct pmd_counters counters;
+};
+
+void pmd_perf_stats_init(struct pmd_perf_stats *s);
+void pmd_perf_stats_clear(struct pmd_perf_stats *s);
+void pmd_perf_read_counters(struct pmd_perf_stats *s,
+                            uint64_t stats[PMD_N_STATS]);
+
+/* PMD performance counters are updated lock-less. For real PMDs
+ * they are only updated from the PMD thread itself. In the case of the
+ * NON-PMD they might be updated from multiple threads, but we can live
+ * with losing a rare update as 100% accuracy is not required.
+ * However, as counters are read for display from outside the PMD thread
+ * with e.g. pmd-stats-show, we make sure that the 64-bit read and store
+ * operations are atomic also on 32-bit systems so that readers cannot
+ * read garbage. On 64-bit systems this incurs no overhead.
+ */
+
+static inline void
+pmd_perf_update_counter(struct pmd_perf_stats *s,
+                        enum pmd_stat_type counter, int delta)
+{
+    uint64_t tmp;
+    atomic_read_relaxed(&s->counters.n[counter], &tmp);
+    tmp += delta;
+    atomic_store_relaxed(&s->counters.n[counter], tmp);
+}
+
+static inline void
+pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
+{
+    s->last_tsc = now_tsc;
+}
+
+static inline void
+pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
+                       int rx_packets)
+{
+    uint64_t cycles = now_tsc - s->last_tsc;
+
+    if (rx_packets > 0) {
+        pmd_perf_update_counter(s, PMD_CYCLES_ITER_BUSY, cycles);
+    } else {
+        pmd_perf_update_counter(s, PMD_CYCLES_ITER_IDLE, cycles);
+    }
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* DPIF_NETDEV_PERF_H */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c7d157a..82d29bb 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -44,6 +44,7 @@
 #include "csum.h"
 #include "dp-packet.h"
 #include "dpif.h"
+#include "dpif-netdev-perf.h"
 #include "dpif-provider.h"
 #include "dummy.h"
 #include "fat-rwlock.h"
@@ -331,25 +332,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
                                                     odp_port_t)
     OVS_REQUIRES(dp->port_mutex);
 
-enum dp_stat_type {
-    DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc). */
-    DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
-    DP_STAT_MISS,               /* Packets that did not match. */
-    DP_STAT_LOST,               /* Packets not passed up to the client. */
-    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
-                                   hits */
-    DP_STAT_SENT_PKTS,          /* Packets that has been sent. */
-    DP_STAT_SENT_BATCHES,       /* Number of batches sent.
*/ - DP_N_STATS -}; - -enum pmd_cycles_counter_type { - PMD_CYCLES_IDLE, /* Cycles spent idle or unsuccessful polling */ - PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and - * processing polled packets */ - PMD_N_CYCLES -}; - enum rxq_cycles_counter_type { RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and processing packets during the current @@ -499,21 +481,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions( const struct dp_netdev_flow *); static void dp_netdev_actions_free(struct dp_netdev_actions *); -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */ -struct dp_netdev_pmd_stats { - /* Indexed by DP_STAT_*. */ - atomic_ullong n[DP_N_STATS]; -}; - -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */ -struct dp_netdev_pmd_cycles { - /* Indexed by PMD_CYCLES_*. */ - atomic_ullong n[PMD_N_CYCLES]; -}; - -static void dp_netdev_count_packet(struct dp_netdev_pmd_thread *, - enum dp_stat_type type, int cnt); - struct polled_queue { struct dp_netdev_rxq *rxq; odp_port_t port_no; @@ -595,12 +562,6 @@ struct dp_netdev_pmd_thread { are stored for each polled rxq. */ long long int rxq_next_cycle_store; - /* Statistics. */ - struct dp_netdev_pmd_stats stats; - - /* Cycles counters */ - struct dp_netdev_pmd_cycles cycles; - /* Current context of the PMD thread. */ struct dp_netdev_pmd_thread_ctx ctx; @@ -638,12 +599,8 @@ struct dp_netdev_pmd_thread { struct hmap tnl_port_cache; struct hmap send_port_cache; - /* Only a pmd thread can write on its own 'cycles' and 'stats'. - * The main thread keeps 'stats_zero' and 'cycles_zero' as base - * values and subtracts them from 'stats' and 'cycles' before - * reporting to the user */ - unsigned long long stats_zero[DP_N_STATS]; - uint64_t cycles_zero[PMD_N_CYCLES]; + /* Keep track of detailed PMD performance statistics. */ + struct pmd_perf_stats perf_stats; /* Set to true if the pmd thread needs to be reloaded. 
*/ bool need_reload; @@ -833,47 +790,10 @@ enum pmd_info_type { }; static void -pmd_info_show_stats(struct ds *reply, - struct dp_netdev_pmd_thread *pmd, - unsigned long long stats[DP_N_STATS], - uint64_t cycles[PMD_N_CYCLES]) +format_pmd_thread(struct ds *reply, struct dp_netdev_pmd_thread *pmd) { - unsigned long long total_packets; - uint64_t total_cycles = 0; - double lookups_per_hit = 0, packets_per_batch = 0; - int i; - - /* These loops subtracts reference values ('*_zero') from the counters. - * Since loads and stores are relaxed, it might be possible for a '*_zero' - * value to be more recent than the current value we're reading from the - * counter. This is not a big problem, since these numbers are not - * supposed to be too accurate, but we should at least make sure that - * the result is not negative. */ - for (i = 0; i < DP_N_STATS; i++) { - if (stats[i] > pmd->stats_zero[i]) { - stats[i] -= pmd->stats_zero[i]; - } else { - stats[i] = 0; - } - } - - /* Sum of all the matched and not matched packets gives the total. */ - total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT] - + stats[DP_STAT_MISS]; - - for (i = 0; i < PMD_N_CYCLES; i++) { - if (cycles[i] > pmd->cycles_zero[i]) { - cycles[i] -= pmd->cycles_zero[i]; - } else { - cycles[i] = 0; - } - - total_cycles += cycles[i]; - } - ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID) ? 
"main thread" : "pmd thread"); - if (pmd->numa_id != OVS_NUMA_UNSPEC) { ds_put_format(reply, " numa_id %d", pmd->numa_id); } @@ -881,23 +801,52 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, " core_id %u", pmd->core_id); } ds_put_cstr(reply, ":\n"); +} + +static void +pmd_info_show_stats(struct ds *reply, + struct dp_netdev_pmd_thread *pmd) +{ + uint64_t stats[PMD_N_STATS]; + uint64_t total_cycles, total_packets; + double passes_per_pkt = 0; + double lookups_per_hit = 0; + double packets_per_batch = 0; + + pmd_perf_read_counters(&pmd->perf_stats, stats); + total_cycles = stats[PMD_CYCLES_ITER_IDLE] + + stats[PMD_CYCLES_ITER_BUSY]; + total_packets = stats[PMD_STAT_RECV]; + + format_pmd_thread(reply, pmd); - if (stats[DP_STAT_MASKED_HIT] > 0) { - lookups_per_hit = stats[DP_STAT_LOOKUP_HIT] - / (double) stats[DP_STAT_MASKED_HIT]; + if (total_packets > 0) { + passes_per_pkt = (total_packets + stats[PMD_STAT_RECIRC]) + / (double) total_packets; } - if (stats[DP_STAT_SENT_BATCHES] > 0) { - packets_per_batch = stats[DP_STAT_SENT_PKTS] - / (double) stats[DP_STAT_SENT_BATCHES]; + if (stats[PMD_STAT_MASKED_HIT] > 0) { + lookups_per_hit = stats[PMD_STAT_MASKED_LOOKUP] + / (double) stats[PMD_STAT_MASKED_HIT]; + } + if (stats[PMD_STAT_SENT_BATCHES] > 0) { + packets_per_batch = stats[PMD_STAT_SENT_PKTS] + / (double) stats[PMD_STAT_SENT_BATCHES]; } ds_put_format(reply, - "\temc hits:%llu\n\tmegaflow hits:%llu\n" - "\tavg. subtable lookups per hit:%.2f\n" - "\tmiss:%llu\n\tlost:%llu\n" - "\tavg. packets per output batch: %.2f\n", - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT], - lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST], + "\tpackets received: %"PRIu64"\n" + "\tpacket recirculations: %"PRIu64"\n" + "\tavg. datapath passes per packet: %.02f\n" + "\temc hits: %"PRIu64"\n" + "\tmegaflow hits: %"PRIu64"\n" + "\tavg. 
subtable lookups per megaflow hit: %.02f\n" + "\tmiss with success upcall: %"PRIu64"\n" + "\tmiss with failed upcall: %"PRIu64"\n" + "\tavg. packets per output batch: %.02f\n", + total_packets, stats[PMD_STAT_RECIRC], + passes_per_pkt, stats[PMD_STAT_EXACT_HIT], + stats[PMD_STAT_MASKED_HIT], lookups_per_hit, + stats[PMD_STAT_MISS], stats[PMD_STAT_LOST], packets_per_batch); if (total_cycles == 0) { @@ -905,48 +854,27 @@ pmd_info_show_stats(struct ds *reply, } ds_put_format(reply, - "\tidle cycles:%"PRIu64" (%.02f%%)\n" - "\tprocessing cycles:%"PRIu64" (%.02f%%)\n", - cycles[PMD_CYCLES_IDLE], - cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, - cycles[PMD_CYCLES_PROCESSING], - cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100); + "\tidle cycles: %"PRIu64" (%.02f%%)\n" + "\tprocessing cycles: %"PRIu64" (%.02f%%)\n", + stats[PMD_CYCLES_ITER_IDLE], + stats[PMD_CYCLES_ITER_IDLE] / (double) total_cycles * 100, + stats[PMD_CYCLES_ITER_BUSY], + stats[PMD_CYCLES_ITER_BUSY] / (double) total_cycles * 100); if (total_packets == 0) { return; } ds_put_format(reply, - "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n", - total_cycles / (double)total_packets, + "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n", + total_cycles / (double) total_packets, total_cycles, total_packets); ds_put_format(reply, "\tavg processing cycles per packet: " - "%.02f (%"PRIu64"/%llu)\n", - cycles[PMD_CYCLES_PROCESSING] / (double)total_packets, - cycles[PMD_CYCLES_PROCESSING], total_packets); -} - -static void -pmd_info_clear_stats(struct ds *reply OVS_UNUSED, - struct dp_netdev_pmd_thread *pmd, - unsigned long long stats[DP_N_STATS], - uint64_t cycles[PMD_N_CYCLES]) -{ - int i; - - /* We cannot write 'stats' and 'cycles' (because they're written by other - * threads) and we shouldn't change 'stats' (because they're used to count - * datapath stats, which must not be cleared here). 
Instead, we save the - * current values and subtract them from the values to be displayed in the - * future */ - for (i = 0; i < DP_N_STATS; i++) { - pmd->stats_zero[i] = stats[i]; - } - for (i = 0; i < PMD_N_CYCLES; i++) { - pmd->cycles_zero[i] = cycles[i]; - } + "%.02f (%"PRIu64"/%"PRIu64")\n", + stats[PMD_CYCLES_ITER_BUSY] / (double) total_packets, + stats[PMD_CYCLES_ITER_BUSY], total_packets); } static int @@ -1106,23 +1034,37 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], struct ds reply = DS_EMPTY_INITIALIZER; struct dp_netdev_pmd_thread **pmd_list; struct dp_netdev *dp = NULL; - size_t n; enum pmd_info_type type = *(enum pmd_info_type *) aux; + unsigned int core_id; + bool filter_on_pmd = false; + size_t n; ovs_mutex_lock(&dp_netdev_mutex); - if (argc == 2) { - dp = shash_find_data(&dp_netdevs, argv[1]); - } else if (shash_count(&dp_netdevs) == 1) { - /* There's only one datapath */ - dp = shash_first(&dp_netdevs)->data; + while (argc > 1) { + if (!strcmp(argv[1], "-pmd") && argc >= 3) { + if (str_to_uint(argv[2], 10, &core_id)) { + filter_on_pmd = true; + } + argc -= 2; + argv += 2; + } else { + dp = shash_find_data(&dp_netdevs, argv[1]); + argc -= 1; + argv += 1; + } } if (!dp) { - ovs_mutex_unlock(&dp_netdev_mutex); - unixctl_command_reply_error(conn, - "please specify an existing datapath"); - return; + if (shash_count(&dp_netdevs) == 1) { + /* There's only one datapath */ + dp = shash_first(&dp_netdevs)->data; + } else { + ovs_mutex_unlock(&dp_netdev_mutex); + unixctl_command_reply_error(conn, + "please specify an existing datapath"); + return; + } } sorted_poll_thread_list(dp, &pmd_list, &n); @@ -1131,26 +1073,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], if (!pmd) { break; } - + if (filter_on_pmd && pmd->core_id != core_id) { + continue; + } if (type == PMD_INFO_SHOW_RXQ) { pmd_info_show_rxq(&reply, pmd); - } else { - unsigned long long stats[DP_N_STATS]; - uint64_t 
cycles[PMD_N_CYCLES]; - - /* Read current stats and cycle counters */ - for (size_t j = 0; j < ARRAY_SIZE(stats); j++) { - atomic_read_relaxed(&pmd->stats.n[j], &stats[j]); - } - for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) { - atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]); - } - - if (type == PMD_INFO_CLEAR_STATS) { - pmd_info_clear_stats(&reply, pmd, stats, cycles); - } else if (type == PMD_INFO_SHOW_STATS) { - pmd_info_show_stats(&reply, pmd, stats, cycles); - } + } else if (type == PMD_INFO_CLEAR_STATS) { + pmd_perf_stats_clear(&pmd->perf_stats); + } else if (type == PMD_INFO_SHOW_STATS) { + pmd_info_show_stats(&reply, pmd); } } free(pmd_list); @@ -1168,14 +1099,14 @@ dpif_netdev_init(void) clear_aux = PMD_INFO_CLEAR_STATS, poll_aux = PMD_INFO_SHOW_RXQ; - unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]", + 0, 3, dpif_netdev_pmd_info, (void *)&show_aux); - unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]", + 0, 3, dpif_netdev_pmd_info, (void *)&clear_aux); - unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]", + 0, 3, dpif_netdev_pmd_info, (void *)&poll_aux); unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, @@ -1511,20 +1442,16 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) { struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_pmd_thread *pmd; + uint64_t pmd_stats[PMD_N_STATS]; stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0; CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { - unsigned long long n; stats->n_flows += cmap_count(&pmd->flow_table); - - 
atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n); - stats->n_hit += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n); - stats->n_hit += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n); - stats->n_missed += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n); - stats->n_lost += n; + pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; + stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT]; + stats->n_missed += pmd_stats[PMD_STAT_MISS]; + stats->n_lost += pmd_stats[PMD_STAT_LOST]; } stats->n_masks = UINT32_MAX; stats->n_mask_hit = UINT64_MAX; @@ -3209,28 +3136,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd) /* Stop counting cycles and add them to the counter 'type' */ static inline void cycles_count_end(struct dp_netdev_pmd_thread *pmd, - enum pmd_cycles_counter_type type) + enum pmd_stat_type type) OVS_RELEASES(&cycles_counter_fake_mutex) OVS_NO_THREAD_SAFETY_ANALYSIS { unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles; - non_atomic_ullong_add(&pmd->cycles.n[type], interval); + pmd_perf_update_counter(&pmd->perf_stats, type, interval); } /* Calculate the intermediate cycle result and add to the counter 'type' */ static inline void cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_rxq *rxq, - enum pmd_cycles_counter_type type) + enum pmd_stat_type type) OVS_NO_THREAD_SAFETY_ANALYSIS { unsigned long long new_cycles = cycles_counter(); unsigned long long interval = new_cycles - pmd->ctx.last_cycles; pmd->ctx.last_cycles = new_cycles; - non_atomic_ullong_add(&pmd->cycles.n[type], interval); - if (rxq && (type == PMD_CYCLES_PROCESSING)) { + pmd_perf_update_counter(&pmd->perf_stats, type, interval); + if (rxq && (type == PMD_CYCLES_POLL_BUSY)) { /* Add to the amount of current processing cycles. 
*/ non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); } @@ -3289,8 +3216,8 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs); dp_packet_batch_init(&p->output_pkts); - dp_netdev_count_packet(pmd, DP_STAT_SENT_PKTS, output_cnt); - dp_netdev_count_packet(pmd, DP_STAT_SENT_BATCHES, 1); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_PKTS, output_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SENT_BATCHES, 1); } static void @@ -3971,12 +3898,12 @@ dpif_netdev_run(struct dpif *dpif) port->port_no); cycles_count_intermediate(non_pmd, NULL, process_packets - ? PMD_CYCLES_PROCESSING - : PMD_CYCLES_IDLE); + ? PMD_CYCLES_POLL_BUSY + : PMD_CYCLES_POLL_IDLE); } } } - cycles_count_end(non_pmd, PMD_CYCLES_IDLE); + cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE); pmd_thread_ctx_time_update(non_pmd); dpif_netdev_xps_revalidate_pmd(non_pmd, false); ovs_mutex_unlock(&dp->non_pmd_mutex); @@ -4121,6 +4048,7 @@ static void * pmd_thread_main(void *f_) { struct dp_netdev_pmd_thread *pmd = f_; + struct pmd_perf_stats *s = &pmd->perf_stats; unsigned int lc = 0; struct polled_queue *poll_list; bool exiting; @@ -4156,13 +4084,17 @@ reload: cycles_count_start(pmd); for (;;) { + uint64_t iter_packets = 0; + pmd_perf_start_iteration(s, pmd->ctx.last_cycles); for (i = 0; i < poll_cnt; i++) { process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, poll_list[i].port_no); cycles_count_intermediate(pmd, poll_list[i].rxq, - process_packets ? PMD_CYCLES_PROCESSING - : PMD_CYCLES_IDLE); + process_packets + ? 
PMD_CYCLES_POLL_BUSY + : PMD_CYCLES_POLL_IDLE); + iter_packets += process_packets; } if (lc++ > 1024) { @@ -4183,10 +4115,12 @@ reload: if (reload) { break; } + cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); } + pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets); } - cycles_count_end(pmd, PMD_CYCLES_IDLE); + cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); exiting = latch_is_set(&pmd->exit_latch); @@ -4638,6 +4572,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, emc_cache_init(&pmd->flow_cache); pmd_alloc_static_tx_qid(pmd); } + pmd_perf_stats_init(&pmd->perf_stats); cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node), hash_int(core_id, 0)); } @@ -4838,13 +4773,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size, atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags); } -static void -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd, - enum dp_stat_type type, int cnt) -{ - non_atomic_ullong_add(&pmd->stats.n[type], cnt); -} - static int dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid, @@ -5017,6 +4945,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, int i; atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); + pmd_perf_update_counter(&pmd->perf_stats, + md_is_valid ? 
PMD_STAT_RECIRC : PMD_STAT_RECV, + cnt); DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { struct dp_netdev_flow *flow; @@ -5065,18 +4996,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, } } - dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, - cnt - n_dropped - n_missed); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, + cnt - n_dropped - n_missed); return dp_packet_batch_size(packets_); } -static inline void +static inline int handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet, const struct netdev_flow_key *key, - struct ofpbuf *actions, struct ofpbuf *put_actions, - int *lost_cnt) + struct ofpbuf *actions, struct ofpbuf *put_actions) { struct ofpbuf *add_actions; struct dp_packet_batch b; @@ -5096,8 +5026,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, put_actions); if (OVS_UNLIKELY(error && error != ENOSPC)) { dp_packet_delete(packet); - (*lost_cnt)++; - return; + return error; } /* The Netlink encoding of datapath flow keys cannot express @@ -5137,6 +5066,9 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } + /* Only error ENOSPC can reach here. We process the packet but do not + * install a datapath flow. Treat as successful. 
*/ + return 0; } static inline void @@ -5158,7 +5090,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, struct dpcls *cls; struct dpcls_rule *rules[PKT_ARRAY_SIZE]; struct dp_netdev *dp = pmd->dp; - int miss_cnt = 0, lost_cnt = 0; + int upcall_ok_cnt = 0, upcall_fail_cnt = 0; int lookup_cnt = 0, add_lookup_cnt; bool any_miss; size_t i; @@ -5200,9 +5132,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, continue; } - miss_cnt++; - handle_packet_upcall(pmd, packet, &keys[i], &actions, - &put_actions, &lost_cnt); + int error = handle_packet_upcall(pmd, packet, &keys[i], + &actions, &put_actions); + + if (OVS_UNLIKELY(error)) { + upcall_fail_cnt++; + } else { + upcall_ok_cnt++; + } } ofpbuf_uninit(&actions); @@ -5212,8 +5149,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, DP_PACKET_BATCH_FOR_EACH (packet, packets_) { if (OVS_UNLIKELY(!rules[i])) { dp_packet_delete(packet); - lost_cnt++; - miss_cnt++; + upcall_fail_cnt++; } } } @@ -5231,10 +5167,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); } - dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); - dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); - dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt); - dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, + cnt - upcall_ok_cnt - upcall_fail_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, + lookup_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS, + upcall_ok_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST, + upcall_fail_cnt); } /* Packets enter the datapath from a port (or from recirculation) here. 
diff --git a/tests/pmd.at b/tests/pmd.at index fcb007c..83d60f8 100644 --- a/tests/pmd.at +++ b/tests/pmd.at @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0 p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=, requested_rx_queues=4, requested_tx_queues=) ]) -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl pmd thread numa_id core_id : - emc hits:0 - megaflow hits:0 - avg. subtable lookups per hit:0.00 - miss:0 - lost:0 + packets received: 0 + packet recirculations: 0 + avg. datapath passes per packet: 0.00 + emc hits: 0 + megaflow hits: 0 + avg. subtable lookups per megaflow hit: 0.00 + miss with success upcall: 0 + miss with failed upcall: 0 ]) ovs-appctl time/stop @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: ]) -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl pmd thread numa_id core_id : - emc hits:19 - megaflow hits:0 - avg. subtable lookups per hit:0.00 - miss:1 - lost:0 + packets received: 20 + packet recirculations: 0 + avg. datapath passes per packet: 1.00 + emc hits: 19 + megaflow hits: 0 + avg. 
subtable lookups per megaflow hit: 0.00 + miss with success upcall: 1 + miss with failed upcall: 0 ]) OVS_VSWITCHD_STOP
From patchwork Fri Jan 12 00:39:03 2018
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 859974
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: dev@openvswitch.org
Cc: i.maximets@samsung.com
Date: Fri, 12 Jan 2018 01:39:03 +0100
Message-Id: <1515717543-31903-3-git-send-email-jan.scheurich@ericsson.com>
In-Reply-To: <1515717543-31903-1-git-send-email-jan.scheurich@ericsson.com>
References: <1515717543-31903-1-git-send-email-jan.scheurich@ericsson.com>
Subject: [ovs-dev] [PATCH v9 2/2] dpif-netdev: Refactor cycle counting

Simplify the historically grown TSC cycle counting in PMD threads.
Cycles are currently counted for the following purposes:

1. Measure PMD utilization

PMD utilization is defined as the ratio of cycles spent in busy iterations (at least one packet received or sent) over the total number of cycles.

This is already done in pmd_perf_start_iteration() and pmd_perf_end_iteration(). These functions use the TSC timestamp saved for the current iteration at start_iteration() and the current TSC at end_iteration(). They do not depend on intermediate cycle accounting.

2. Measure the processing load per RX queue

This comprises the cycles spent on polling and processing packets received from the rx queue. It also includes the cycles spent on delayed sending of these packets to tx queues (with time-based batching).

The previous scheme used cycles_count_start(), cycles_count_intermediate() and cycles_count_end(). It was originally introduced to simplify cycle counting and to save calls to rte_get_tsc_cycles(), but it mostly obscured what was being measured.

Replace it with a nestable cycle_timer whose start and stop functions enclose the code segment to be timed. The timed code may contain arbitrarily nested cycle_timers. The duration of the nested timers is excluded from the outer timer.

The caller must ensure that each call to cycle_timer_start() is followed by a call to cycle_timer_stop(). Failure to do so will lead to an assertion failure or a memory leak.

The new cycle_timer is used to measure the processing cycles per rx queue. This is not yet strictly necessary, but a subsequent commit will make use of it.

All cycle count functions and data are relocated to module dpif-netdev-perf.
Signed-off-by: Jan Scheurich Acked-by: Billy O'Mahony --- lib/dpif-netdev-perf.h | 110 ++++++++++++++++++++++++++++++++++++++++---- lib/dpif-netdev.c | 122 ++++++++++++++----------------------------------- 2 files changed, 135 insertions(+), 97 deletions(-) diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index 53d60d3..5993c25 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -23,6 +23,11 @@ #include #include +#ifdef DPDK_NETDEV +#include +#include +#endif + #include "openvswitch/vlog.h" #include "ovs-atomic.h" #include "timeval.h" @@ -59,10 +64,6 @@ enum pmd_stat_type { * recirculation. */ PMD_STAT_SENT_PKTS, /* Packets that have been sent. */ PMD_STAT_SENT_BATCHES, /* Number of batches sent. */ - PMD_CYCLES_POLL_IDLE, /* Cycles spent unsuccessful polling. */ - PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and - * processing polled packets. */ - PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */ PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ PMD_N_STATS @@ -85,11 +86,95 @@ struct pmd_counters { struct pmd_perf_stats { /* Start of the current PMD iteration in TSC cycles.*/ + uint64_t start_it_tsc; + /* Latest TSC time stamp taken in PMD. */ uint64_t last_tsc; + /* If non-NULL, outermost cycle timer currently running in PMD. */ + struct cycle_timer *cur_timer; /* Set of PMD counters with their zero offsets. */ struct pmd_counters counters; }; +/* Support for accurate timing of PMD execution on TSC clock cycle level. + * These functions are intended to be invoked in the context of pmd threads. */ + +/* Read the TSC cycle register and cache it. Any function not requiring clock + * cycle accuracy should read the cached value using cycles_counter_get() to + * avoid the overhead of reading the TSC register. 
*/ + +static inline uint64_t +cycles_counter_update(struct pmd_perf_stats *s) +{ +#ifdef DPDK_NETDEV + return s->last_tsc = rte_get_tsc_cycles(); +#else + return s->last_tsc = 0; +#endif +} + +static inline uint64_t +cycles_counter_get(struct pmd_perf_stats *s) +{ + return s->last_tsc; +} + +/* A nestable timer for measuring execution time in TSC cycles. + * + * Usage: + * struct cycle_timer timer; + * + * cycle_timer_start(&pmd->perf_stats, &timer); + * + * uint64_t cycles = cycle_timer_stop(&pmd->perf_stats, &timer); + * + * The caller must guarantee that a call to cycle_timer_start() is always + * paired with a call to cycle_timer_stop(). + * + * It is possible to have nested cycle timers within the timed code. The + * execution time measured by the nested timers is excluded from the time + * measured by the embracing timer. + */ + +struct cycle_timer { + uint64_t start; + uint64_t suspended; + struct cycle_timer *interrupted; +}; + +static inline void +cycle_timer_start(struct pmd_perf_stats *s, + struct cycle_timer *timer) +{ + struct cycle_timer *cur_timer = s->cur_timer; + uint64_t now = cycles_counter_update(s); + + if (cur_timer) { + cur_timer->suspended = now; + } + timer->interrupted = cur_timer; + timer->start = now; + timer->suspended = 0; + s->cur_timer = timer; +} + +static inline uint64_t +cycle_timer_stop(struct pmd_perf_stats *s, + struct cycle_timer *timer) +{ + /* Assert that this is the current cycle timer. */ + ovs_assert(s->cur_timer == timer); + uint64_t now = cycles_counter_update(s); + struct cycle_timer *intr_timer = timer->interrupted; + + if (intr_timer) { + /* Adjust the start offset by the suspended cycles. */ + intr_timer->start += now - intr_timer->suspended; + } + /* Restore suspended timer, if any.
*/ + s->cur_timer = intr_timer; + return now - timer->start; +} + void pmd_perf_stats_init(struct pmd_perf_stats *s); void pmd_perf_stats_clear(struct pmd_perf_stats *s); void pmd_perf_read_counters(struct pmd_perf_stats *s, @@ -115,16 +200,23 @@ pmd_perf_update_counter(struct pmd_perf_stats *s, } static inline void -pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) +pmd_perf_start_iteration(struct pmd_perf_stats *s) { - s->last_tsc = now_tsc; + if (OVS_LIKELY(s->last_tsc)) { + /* We assume here that last_tsc was updated immediately prior at + * the end of the previous iteration, or just before the first + * iteration. */ + s->start_it_tsc = s->last_tsc; + } else { + /* In case last_tsc has never been set before. */ + s->start_it_tsc = cycles_counter_update(s); + } } static inline void -pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, - int rx_packets) +pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets) { - uint64_t cycles = now_tsc - s->last_tsc; + uint64_t cycles = cycles_counter_update(s) - s->start_it_tsc; if (rx_packets > 0) { pmd_perf_update_counter(s, PMD_CYCLES_ITER_BUSY, cycles); diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 82d29bb..e371d11 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -32,10 +32,6 @@ #include #include -#ifdef DPDK_NETDEV -#include -#endif - #include "bitmap.h" #include "cmap.h" #include "conntrack.h" @@ -509,8 +505,6 @@ struct tx_port { struct dp_netdev_pmd_thread_ctx { /* Latest measured time. See 'pmd_thread_ctx_time_update()'. */ long long now; - /* Used to count cycles. See 'cycles_count_end()' */ - unsigned long long last_cycles; }; /* PMD: Poll modes drivers. 
PMD accesses devices via polling to eliminate @@ -3111,64 +3105,20 @@ dp_netdev_actions_free(struct dp_netdev_actions *actions) free(actions); } -static inline unsigned long long -cycles_counter(void) -{ -#ifdef DPDK_NETDEV - return rte_get_tsc_cycles(); -#else - return 0; -#endif -} - -/* Fake mutex to make sure that the calls to cycles_count_* are balanced */ -extern struct ovs_mutex cycles_counter_fake_mutex; - -/* Start counting cycles. Must be followed by 'cycles_count_end()' */ -static inline void -cycles_count_start(struct dp_netdev_pmd_thread *pmd) - OVS_ACQUIRES(&cycles_counter_fake_mutex) - OVS_NO_THREAD_SAFETY_ANALYSIS -{ - pmd->ctx.last_cycles = cycles_counter(); -} - -/* Stop counting cycles and add them to the counter 'type' */ -static inline void -cycles_count_end(struct dp_netdev_pmd_thread *pmd, - enum pmd_stat_type type) - OVS_RELEASES(&cycles_counter_fake_mutex) - OVS_NO_THREAD_SAFETY_ANALYSIS -{ - unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles; - - pmd_perf_update_counter(&pmd->perf_stats, type, interval); -} - -/* Calculate the intermediate cycle result and add to the counter 'type' */ -static inline void -cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, - struct dp_netdev_rxq *rxq, - enum pmd_stat_type type) - OVS_NO_THREAD_SAFETY_ANALYSIS +static void +dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, + enum rxq_cycles_counter_type type, + unsigned long long cycles) { - unsigned long long new_cycles = cycles_counter(); - unsigned long long interval = new_cycles - pmd->ctx.last_cycles; - pmd->ctx.last_cycles = new_cycles; - - pmd_perf_update_counter(&pmd->perf_stats, type, interval); - if (rxq && (type == PMD_CYCLES_POLL_BUSY)) { - /* Add to the amount of current processing cycles. 
*/ - non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); - } + atomic_store_relaxed(&rx->cycles[type], cycles); } static void -dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, +dp_netdev_rxq_add_cycles(struct dp_netdev_rxq *rx, enum rxq_cycles_counter_type type, unsigned long long cycles) { - atomic_store_relaxed(&rx->cycles[type], cycles); + non_atomic_ullong_add(&rx->cycles[type], cycles); } static uint64_t @@ -3234,27 +3184,40 @@ dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd) static int dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, - struct netdev_rxq *rx, + struct dp_netdev_rxq *rxq, odp_port_t port_no) { struct dp_packet_batch batch; + struct cycle_timer timer; int error; int batch_cnt = 0; + /* Measure duration for polling and processing rx burst. */ + cycle_timer_start(&pmd->perf_stats, &timer); dp_packet_batch_init(&batch); - error = netdev_rxq_recv(rx, &batch); + error = netdev_rxq_recv(rxq->rx, &batch); if (!error) { + /* At least one packet received. */ *recirc_depth_get() = 0; pmd_thread_ctx_time_update(pmd); batch_cnt = batch.count; dp_netdev_input(pmd, &batch, port_no); dp_netdev_pmd_flush_output_packets(pmd); - } else if (error != EAGAIN && error != EOPNOTSUPP) { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); - VLOG_ERR_RL(&rl, "error receiving data from %s: %s", - netdev_rxq_get_name(rx), ovs_strerror(error)); + /* Assign processing cycles to rx queue. */ + uint64_t cycles = cycle_timer_stop(&pmd->perf_stats, &timer); + dp_netdev_rxq_add_cycles(rxq, RXQ_CYCLES_PROC_CURR, cycles); + + } else { + /* Discard cycles. 
*/ + cycle_timer_stop(&pmd->perf_stats, &timer); + if (error != EAGAIN && error != EOPNOTSUPP) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + + VLOG_ERR_RL(&rl, "error receiving data from %s: %s", + netdev_rxq_get_name(rxq->rx), ovs_strerror(error)); + } } return batch_cnt; @@ -3880,30 +3843,22 @@ dpif_netdev_run(struct dpif *dpif) struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_pmd_thread *non_pmd; uint64_t new_tnl_seq; - int process_packets = 0; ovs_mutex_lock(&dp->port_mutex); non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID); if (non_pmd) { ovs_mutex_lock(&dp->non_pmd_mutex); - cycles_count_start(non_pmd); HMAP_FOR_EACH (port, node, &dp->ports) { if (!netdev_is_pmd(port->netdev)) { int i; for (i = 0; i < port->n_rxq; i++) { - process_packets = - dp_netdev_process_rxq_port(non_pmd, - port->rxqs[i].rx, - port->port_no); - cycles_count_intermediate(non_pmd, NULL, - process_packets - ? PMD_CYCLES_POLL_BUSY - : PMD_CYCLES_POLL_IDLE); + dp_netdev_process_rxq_port(non_pmd, + &port->rxqs[i], + port->port_no); } } } - cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE); pmd_thread_ctx_time_update(non_pmd); dpif_netdev_xps_revalidate_pmd(non_pmd, false); ovs_mutex_unlock(&dp->non_pmd_mutex); @@ -4082,18 +4037,14 @@ reload: lc = UINT_MAX; } - cycles_count_start(pmd); + cycles_counter_update(s); for (;;) { uint64_t iter_packets = 0; - pmd_perf_start_iteration(s, pmd->ctx.last_cycles); + pmd_perf_start_iteration(s); for (i = 0; i < poll_cnt; i++) { process_packets = - dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, + dp_netdev_process_rxq_port(pmd, poll_list[i].rxq, poll_list[i].port_no); - cycles_count_intermediate(pmd, poll_list[i].rxq, - process_packets - ? 
PMD_CYCLES_POLL_BUSY - : PMD_CYCLES_POLL_IDLE); iter_packets += process_packets; } @@ -4115,13 +4066,10 @@ reload: if (reload) { break; } - cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); } - pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets); + pmd_perf_end_iteration(s, iter_packets); } - cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); - poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); exiting = latch_is_set(&pmd->exit_latch); /* Signal here to make sure the pmd finishes @@ -5066,9 +5014,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } - /* Only error ENOSPC can reach here. We process the packet but do not - * install a datapath flow. Treat as successful. */ - return 0; + return error; } static inline void